Publication · Article · 2020

A Multilingual Evaluation for Online Hate Speech Detection

Serena Villata; Sara Tonelli; Michele Corazza; Elena Cabrio; Stefano Menini
Open Access · English
  • Published: 25 May 2020
  • Publisher: HAL CCSD
  • Country: France
Abstract
The increasing popularity of social media platforms like Twitter and Facebook has led to a rise in hate speech and aggressive speech on these platforms. Despite the many approaches recently proposed in Natural Language Processing for detecting these forms of abusive language, identifying hate speech at scale remains an unsolved problem. In this paper, we propose a robust neural architecture that is shown to perform satisfactorily across different languages, namely English, Italian and German. We present an extensive analysis of the experimental results obtained over the three languages...
Subjects
Free-text keywords: [SCCO.COMP] Cognitive science/Computer science, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Feature selection, Natural language processing, Computer science, Social network, Voice activity detection, Popularity, Social media, Architecture, Long short-term memory, Artificial intelligence, German, Language