publication . Preprint . 2020

Gender Coreference and Bias Evaluation at WMT 2020

Kocmi, Tom; Limisiewicz, Tomasz; Stanovsky, Gabriel
Open Access English
  • Published: 12 Oct 2020
Abstract
Gender bias in machine translation can manifest when choosing gender inflections based on spurious gender correlations, for example, always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and are deployed within commercial systems. Our work presents the largest evidence for the phenomenon in more than 19 systems submitted to the WMT over four diverse target languages: Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite which examines gender coreference and bias when translating from English to languages with grammatical gender. We extend WinoMT to handle two ...
Subjects
free text keywords: Computer Science - Computation and Language
Funded by
EC| Bergamot
Project
Bergamot
Browser-based Multilingual Translation
  • Funder: European Commission (EC)
  • Project Code: 825303
  • Funding stream: H2020 | RIA
Communities
CLARIN
Digital Humanities and Cultural Heritage (DH-CH) communities: CLARIN

Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Tom Kocmi, Philipp Koehn, Nikola Ljubešić, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, and Marcos Zampieri. 2020. Findings of the 2020 conference on machine translation (wmt20). In Proceedings of the Fifth Conference on Machine Translation, Volume 2: Shared Task Papers. Association for Computational Linguistics.

Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 644-648, Atlanta, Georgia. Association for Computational Linguistics.

Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.

Matt Post. 2018. A call for clarity in reporting bleu scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186-191, Belgium, Brussels. Association for Computational Linguistics.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301.

Gabriel Stanovsky, Noah A. Smith, and Luke Zettlemoyer. 2019. Evaluating gender bias in machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679-1684, Florence, Italy. Association for Computational Linguistics.

Jana Straková, Milan Straka, and Jan Hajič. 2014. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13-18, Baltimore, Maryland. Association for Computational Linguistics.

Ryszard Tuora and Łukasz Kobyliński. 2019. Integrating Polish language tools and resources in Spacy. In Proceedings of PP-RAI 2019 Conference, pages 210-214, Wrocław. Department of Systems and Computer Networks, Faculty of Electronics, Wroclaw University of Science and Technology.

Marcin Woliński and Witold Kieraś. 2016. The on-line version of grammatical dictionary of polish. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2589-2594, Portorož, Slovenia. European Language Resources Association (ELRA).

Marcin Woliński. 2014. Morfeusz reloaded. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, pages 1106-1111, Reykjavík, Iceland. European Language Resources Association (ELRA).

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018b. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15-20, New Orleans, Louisiana. Association for Computational Linguistics.
