S. Almohaimeed, S. Almohaimeed, D. Turgut, and L. Bölöni. Closest Positive Cluster Loss: Improving the Generalization of Implicit Hate Speech Classifiers Across Social Media Datasets. In Accepted to appear at IEEE ICC 2025, June 2025.
Flagging hate speech in social media messages has important societal benefits. While large language models have become increasingly able to identify hate speech with high accuracy, they come with a significant computational cost. Thus, there is a need for simpler models that detect hateful or abusive content by classifying an embedding of the text. Such models perform very well on explicitly abusive content, but struggle with the classification of implicit hate. Furthermore, it has been found that their performance decreases significantly in cross-dataset experiments. In this paper, we propose Closest Positive Cluster (CPC), an auxiliary loss that increases the generalizability of embedding-based explicit and implicit hate classifiers in cross-dataset scenarios. Through experiments spanning ten different hate speech datasets, we found that the CPC loss increased model performance by 0.17 to 7.4% when added to the binary cross-entropy loss during training. The experiments also investigated whether models trained on specific hate speech datasets generalize better to other datasets.
@inproceedings{Almohaimeed-2025-ICC, author = "S. Almohaimeed and S. Almohaimeed and D. Turgut and L. B{\"o}l{\"o}ni", title = "Closest Positive Cluster Loss: Improving the Generalization of Implicit Hate Speech Classifiers Across Social Media Datasets", booktitle = "Accepted to appear at IEEE ICC 2025", year = "2025", month = "June", abstract = {Flagging hate speech in social media messages has important societal benefits. While large language models have become increasingly able to identify hate speech with high accuracy, they come with a significant computational cost. Thus, there is a need for simpler models that detect hateful or abusive content by classifying an embedding of the text. Such models perform very well on explicitly abusive content, but struggle with the classification of implicit hate. Furthermore, it has been found that their performance decreases significantly in cross-dataset experiments. In this paper, we propose Closest Positive Cluster (CPC), an auxiliary loss that increases the generalizability of embedding-based explicit and implicit hate classifiers in cross-dataset scenarios. Through experiments spanning ten different hate speech datasets, we found that the CPC loss increased model performance by 0.17 to 7.4% when added to the binary cross-entropy loss during training. The experiments also investigated whether models trained on specific hate speech datasets generalize better to other datasets.}, }
Generated by bib2html.pl (written by Patrick Riley, Lotzi Boloni ) on Sat Jan 18, 2025 11:48:03