Adapting Multilingual Neural Machine Translation to Unseen Languages

Publication type: Conference object, Article, Preprint, Other literature type
Date: 01 Jan 2019
Embargo end date: 01 Jan 2019
Language: English
Publisher: Zenodo
Journal: CoRR, volume abs/1910.13998

Authors: Surafel Melaku Lakew; Alina Karakanta; Marcello Federico; Matteo Negri; Marco Turchi

DOI: 10.5281/zenodo.3525486, 10.48550/arxiv.1910.13998, 10.5281/zenodo.3525485

arXiv: 1910.13998


Abstract

Multilingual Neural Machine Translation (MNMT) for low-resource languages (LRLs) can be enhanced by the presence of related high-resource languages (HRLs), but the relatedness of HRLs usually relies on predefined linguistic assumptions about language similarity. Recently, adapting MNMT to an LRL has been shown to greatly improve performance. In this work, we explore the problem of adapting an MNMT model to an unseen LRL using data selection and model adaptation. To improve NMT for LRLs, we employ perplexity to select the HRL data that are most similar to the LRL on the basis of language distance. We extensively explore data selection in popular multilingual NMT settings, namely (zero-shot) translation and adaptation from a multilingual pre-trained model, in both directions (LRL→en and en→LRL). We further show that dynamically adapting the model's vocabulary yields a more favourable segmentation for the LRL than direct adaptation does. Experiments show reduced training time and significant performance gains over LRL baselines: +13.0 BLEU even with zero LRL data, and up to +17.0 BLEU when a pre-trained multilingual model is dynamically adapted with related data selection. Our method outperforms current approaches, such as massively multilingual models and data augmentation, on four LRLs.

Accepted at the 16th International Workshop on Spoken Language Translation (IWSLT), November 2019.
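The abstract says HRL data are selected by perplexity but does not name the language model or the selection granularity. As a minimal sketch, assuming a toy add-one-smoothed character bigram model as a stand-in (not the authors' setup), a model trained on LRL text can score candidate HRL text: lower perplexity suggests a closer language. The names `CharBigramLM`, `rank_related_languages` and `select_sentences` are hypothetical, and the sketch shows scoring both whole HRL corpora and individual sentences since the abstract leaves the granularity open.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def char_bigrams(sentence: str) -> List[str]:
    # Pad with boundary markers so sentence starts and ends are modelled too.
    padded = "^" + sentence + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class CharBigramLM:
    """Toy add-one smoothed character bigram LM (hypothetical stand-in)."""

    def __init__(self, sentences: Sequence[str]):
        self.bigrams: Counter = Counter()
        self.unigrams: Counter = Counter()
        for s in sentences:
            for bg in char_bigrams(s):
                self.bigrams[bg] += 1
                self.unigrams[bg[0]] += 1  # count of bg[0] as a history
        # Characters seen in training plus one slot for unseen characters.
        self.vocab_size = len({c for bg in self.bigrams for c in bg}) + 1

    def perplexity(self, sentences: Sequence[str]) -> float:
        """Per-bigram perplexity over a non-empty list of sentences."""
        log_prob, n = 0.0, 0
        for s in sentences:
            for bg in char_bigrams(s):
                num = self.bigrams[bg] + 1
                den = self.unigrams[bg[0]] + self.vocab_size
                log_prob += math.log(num / den)
                n += 1
        return math.exp(-log_prob / n)


def rank_related_languages(lrl: Sequence[str],
                           hrl_corpora: Dict[str, List[str]]) -> List[str]:
    # Lower perplexity under the LRL model = assumed closer to the LRL.
    lm = CharBigramLM(lrl)
    return sorted(hrl_corpora, key=lambda lang: lm.perplexity(hrl_corpora[lang]))


def select_sentences(lrl: Sequence[str], hrl: Sequence[str], k: int) -> List[str]:
    # Keep the k HRL sentences the LRL model finds least surprising.
    lm = CharBigramLM(lrl)
    return sorted(hrl, key=lambda s: lm.perplexity([s]))[:k]
```

For example, with `lrl = ["ciao come stai", "ciao a tutti"]`, calling `rank_related_languages(lrl, {"en": ["the quick brown fox"], "it": ["ciao come va", "come stai tu"]})` ranks `"it"` ahead of `"en"`. A real system would use a much stronger model (for instance a subword or neural LM) and far more data; the bigram model only keeps the example small.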
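The abstract contrasts dynamic vocabulary adaptation with direct adaptation but does not describe the mechanics. A minimal sketch of one plausible approach, assuming the child model receives a new subword vocabulary learned on the LRL data: embedding rows are copied from the parent model for tokens both vocabularies share, and tokens new to the child get small random vectors. The function `adapt_vocabulary`, its parameters, and the Gaussian initialization are all hypothetical choices for illustration, not the paper's stated procedure.

```python
import random
from typing import Dict, List, Sequence, Tuple


def adapt_vocabulary(
    parent_vocab: Sequence[str],
    parent_emb: Sequence[Sequence[float]],
    child_vocab: Sequence[str],
    init_std: float = 0.01,
    seed: int = 0,
) -> Tuple[List[List[float]], int]:
    """Build a child embedding table, reusing parent rows for shared tokens.

    Returns the new table (one row per child token, in child order) and the
    number of rows copied from the parent.
    """
    dim = len(parent_emb[0])
    rng = random.Random(seed)  # seeded so the random rows are reproducible
    parent_index: Dict[str, int] = {tok: i for i, tok in enumerate(parent_vocab)}
    child_emb: List[List[float]] = []
    reused = 0
    for tok in child_vocab:
        i = parent_index.get(tok)
        if i is not None:
            # Shared token: keep what the parent model already learned.
            child_emb.append(list(parent_emb[i]))
            reused += 1
        else:
            # New LRL token: small random initialization (assumed scheme).
            child_emb.append([rng.gauss(0.0, init_std) for _ in range(dim)])
    return child_emb, reused
```

With a parent vocabulary `["<unk>", "the", "ciao", "ing"]` and a child vocabulary `["<unk>", "ciao", "selam"]`, two rows are copied and one is freshly initialized. Direct adaptation, by contrast, would keep the parent vocabulary as-is, which may segment LRL words into many short pieces.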

Related Organizations

Fondazione Bruno Kessler, Italy
University of Trento, Italy

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)

Impact by BIP!

Selected citations: 1
Popularity: Average
Influence: Average
Impulse: Average