Hate Speech Detection Using Ensemble Approach and Embedding Technique

Bira Alam; Fatima Abbas; Nafees Ayub

doi:10.64060/ICPP.02

Authors

Bira Alam, Riphah International University, Faisalabad, Pakistan. ORCID: https://orcid.org/0009-0004-9065-3984
Fatima Abbas, Riphah International University, Faisalabad, Pakistan. ORCID: https://orcid.org/0009-0000-2405-7392
Nafees Ayub, Government College University, Faisalabad, Pakistan. ORCID: https://orcid.org/0000-0003-1200-3308

Keywords:

Hate Speech Detection, GloVe, Ensemble Learning, Word Embeddings, Ethos Dataset, Machine Learning

Abstract

The rise of hate speech on the internet is a major concern for internet security and societal coexistence, and it calls for efficient automated detection tools. Although many machine learning strategies have been proposed, achieving high accuracy with limited data remains a challenge. This paper introduces an ensemble framework for hate speech detection, evaluated on the Ethos Binary dataset. Several machine learning classifiers (K-Nearest Neighbor, Naive Bayes, Logistic Regression, and Decision Tree) are used with pre-trained GloVe word embeddings, which supply semantic representations of the text. The models are trained and tested under various hyperparameter configurations to check robustness. The experimental findings indicate that the Decision Tree classifier outperforms the other models, with a precision of 87%, recall of 93%, F1-score of 90%, and overall accuracy of 91%. The results suggest that an ensemble learning approach with embedding techniques can substantially improve hate speech detection performance. This work contributes to viable and scalable solutions for moderating harmful content online.
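
As a reading aid for the metrics quoted in the abstract, the following minimal Python sketch shows how precision, recall, and F1 relate to one another. It assumes the reported values (87, 93, 90) are percentages for the hate-speech class; the abstract does not state this explicitly. The function names and the toy labels are illustrative only and are not taken from the paper.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


# Consistency check on the abstract's figures, read as fractions
# (assumption: 87 and 93 mean 0.87 and 0.93).
reported_f1 = f1_from(0.87, 0.93)  # roughly 0.899, i.e. about 90%

# Toy labels (hypothetical, 1 = hate speech, 0 = not hate speech).
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1]
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_true, toy_pred)
```

Under that reading, the harmonic mean of 0.87 and 0.93 is about 0.899, which agrees with the reported F1-score of 90 after rounding.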
