
Social media platforms such as Facebook, Twitter, and YouTube are vital channels for expressing ideas and connecting with others. Alongside this increased connectivity, they have also given rise to negative behaviors, particularly cyberbullying. While extensive research has been conducted on high-resource languages such as English, resources remain scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly in terms of language modeling. This study addresses this gap by developing BullyFilterNeT, a cyberbullying text identification system tailored for social media texts, with Bengali as a test case. BullyFilterNeT overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To support a comprehensive evaluation, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are used in the classification models: three statistical models (SVM, SGD, Libsvm) and four deep learning models (CNN, VDCNN, LSTM, GRU). Additionally, the study employs six transformer-based language models (mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT) to overcome the limitations of the earlier models. The BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness for cyberbullying text identification in Bengali.
FOS: Computer and information sciences, Computer engineering. Computer hardware, Artificial intelligence, Offensive Language, Cybercrime and Dark Web Activities, NLP, Online Harassment, Cyberbullying, Systems engineering, TK7885-7895, TA168, Identification (biology), Characterization and Detection of Android Malware, Engineering, Automated Detection of Hate Speech and Offensive Language, transformers models, natural language processing, Biology, Transformer, large language modeling, Botany, deep learning, Voltage, Computer science, Electrical engineering, Physical Sciences, Signal Processing, Information Systems
