publication . Article . 2021

Revisiting Multi-Domain Machine Translation

Pham, Minh Quang; Crego, Josep-Maria; Yvon, François;
Open Access English
  • Published: 12 Feb 2021
  • Publisher: Zenodo
  • Country: France
Abstract
When building machine translation systems, one often needs to make the best out of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has attracted a lot of recent work, which falls under the general umbrella of transfer learning. In this study, we revisit multi-domain machine translation, with the aim of formulating the motivations for developing such systems and the associated expectations with respect to performance. Our experiments with a large sample of multi-domain systems show that most of these expectations are hardly met and suggest that furthe...
Subjects
free text keywords: Neural Machine Translation, Multi-domain MT, Domain Adaptation, Machine Translation, [INFO]Computer Science [cs], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Related Organizations
Funded by
EC| ANITA
Project
ANITA
Advanced tools for fighting oNline Illegal TrAfficking
  • Funder: European Commission (EC)
  • Project Code: 787061
  • Funding stream: H2020 | RIA
77 references, page 1 of 6

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen, and Yonghui Wu. 2019. Massively multilingual neural machine translation in the wild: Findings and challenges. arXiv e-prints, abs/1907.05019. [OpenAIRE]

Amittai Axelrod, Xiaodong He, and Jianfeng Gao. 2011. Domain adaptation via pseudo in-domain data selection. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 355-362. Edinburgh, United Kingdom.

Pratyush Banerjee, Jinhua Du, Baoli Li, Sudip Kumar Naskar, Andy Way, and Josef van Genabith. 2010. Combining multi-domain statistical machine translation models using automatic classifiers. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas, AMTA 2010. Denver, CO, USA.

Ankur Bapna and Orhan Firat. 2019. Simple, scalable adaptation for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP, pages 1538-1548, Hong Kong, China. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/D19-1165 [OpenAIRE]

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. 2010. A theory of learning from different domains. Machine Learning, 79(1): 151-175. DOI: https://doi.org/10.1007/s10994-009-5152-4

Nicola Bertoldi and Marcello Federico. 2009. Domain adaptation for statistical machine translation with monolingual resources. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 182-189, Athens, Greece. Association for Computational Linguistics. DOI: https://doi.org/10.3115/1626431.1626468

John Blitzer. 2007. Domain Adaptation of Natural Language Processing Systems. Ph.D. thesis, School of Computer Science, University of Pennsylvania.

Denny Britz, Quoc Le, and Reid Pryzant. 2017. Effective domain mixing for neural machine translation. In Proceedings of the Second Conference on Machine Translation, pages 118-126, Copenhagen, Denmark. Association for Computational Linguistics. DOI: https://doi.org/10.18653/v1/W17-4712

Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. Wit3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), pages 261-268. Trento, Italy.

Wenhu Chen, Evgeny Matusov, Shahram Khadivi, and Jan-Thorsten Peter. 2016. Guided alignment training for topic-aware neural machine translation. In Proceedings of the Twelfth Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2016. Austin, Texas. [OpenAIRE]

Chenhui Chu and Raj Dabre. 2018. Multilingual and multi-domain adaptation for neural machine translation. In Proceedings of the 24th Annual Meeting of the Association for Natural Language Processing, NLP 2018, pages 909-912, Okayama, Japan. [OpenAIRE]

Chenhui Chu and Rui Wang. 2018. A survey of domain adaptation for neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, pages 1304-1319. [OpenAIRE]

Jonathan H. Clark, Alon Lavie, and Chris Dyer. 2012. One system, many domains: Open-domain statistical machine translation via feature augmentation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, AMTA 2012. San Diego, CA.
