Defending Digital Discourse: Developing a Toxic Comment Classifier for Fostering Healthy Online Communities

International Journal of Computer Science and Engineering
© 2024 by SSRG - IJCSE Journal
Volume 11 Issue 10
Year of Publication : 2024
Authors : Naimul Hasan Shadesh, Jahangir Hussen, Zannatul Ferdous

How to Cite?

Naimul Hasan Shadesh, Jahangir Hussen, Zannatul Ferdous, "Defending Digital Discourse: Developing a Toxic Comment Classifier for Fostering Healthy Online Communities," SSRG International Journal of Computer Science and Engineering, vol. 11, no. 10, pp. 29-39, 2024. Crossref, https://doi.org/10.14445/23488387/IJCSE-V11I10P104

Abstract:

Trolls and abusive users often infiltrate online communities and disrupt the healthy interactions that members could otherwise have, drawing them into excessive engagement in the virtual space. This work develops models for the automatic detection and classification of toxic comments. The study is executed in four steps. The first step is data preparation, in which the data is loaded and preprocessed. The second step is Exploratory Data Analysis (EDA), which describes the toxic labels in the data and how they vary. The text is then standardized using text preprocessing techniques such as lowercasing and punctuation removal before model training. For model training, logistic regression and Naive Bayes models are used to predict each toxicity category. Accuracy exceeded 96% in every category: 96.9% for toxic comments, 97.2% for severe toxicity, 97.7% for obscenity, 98.9% for threats, 97.1% for insults, and 96.9% for identity hate. The models proved robust, and the whole workflow took only 2 minutes and 58.24 seconds, which indicates its efficiency and scalability.
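The preprocessing and per-category training described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a small hand-written multinomial Naive Bayes with add-one smoothing stands in for whatever library the paper used, and the label identifiers in `LABELS` and the helper names (`normalize`, `BinaryNaiveBayes`, `train_per_label`) are hypothetical choices made for illustration.

```python
import math
import string
from collections import Counter

# Hypothetical identifiers for the six categories named in the abstract.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase, remove ASCII punctuation, and collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing for one yes/no label."""

    def fit(self, docs, flags):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = {0: 0, 1: 0}
        for doc, flag in zip(docs, flags):
            self.counts[flag].update(normalize(doc).split())
            self.n_docs[flag] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def _log_score(self, tokens, cls):
        # Smoothed denominator: tokens seen in this class plus vocabulary size.
        total = sum(self.counts[cls].values()) + len(self.vocab)
        prior = (self.n_docs[cls] + 1) / (sum(self.n_docs.values()) + 2)
        score = math.log(prior)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[cls][tok] + 1) / total)
        return score

    def predict(self, doc):
        tokens = normalize(doc).split()
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


def train_per_label(docs, label_rows):
    """Fit one independent binary model per category.

    label_rows is a list of dicts mapping each name in LABELS to 0 or 1.
    """
    return {
        name: BinaryNaiveBayes().fit(docs, [row[name] for row in label_rows])
        for name in LABELS
    }
```

Training one binary model per category treats the task as six independent yes/no decisions, so a single comment can be flagged under several categories at once. A logistic regression model could be substituted for `BinaryNaiveBayes` behind the same `fit`/`predict` interface.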

Keywords:

Dataset, Exploratory data analysis, Text preprocessing, CNN, Logistic regression, Naive Bayes, Model training, Evaluation, Accuracy, Threat detection, Insult detection, Identity hate detection, Efficiency.
