Publication · Conference object · Preprint · 2019

Towards one-shot learning for rare-word translation with external experts

Ngoc-Quan Pham; Jan Niehues; Alex Waibel
Open Access
  • Published: 29 Jun 2019
  • Publisher: Association for Computational Linguistics
Abstract
Neural machine translation (NMT) has significantly improved the quality of automatic translation models. One of the main challenges in current systems is the translation of rare words. We present a generic approach to address this weakness by having external models annotate the training data as Experts, and control the model-expert interaction with a pointer network and reinforcement learning. Our experiments using phrase-based models to simulate Experts to complement neural machine translation models show that the model can be trained to copy the annotations into the output consistently. We demonstrate the benefit of our proposed framework in out-of-domain translation scenarios with only lexical resources, improving by more than 1.0 BLEU point in both translation directions, English to Spanish and German to English.
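
The copy mechanism the abstract describes can be pictured with a short sketch. Below is a minimal, illustrative PyTorch version of a pointer-style copy gate in the spirit of the pointer/copy mechanisms from the NMT literature. It is not the authors' code: all names (CopyGate, hidden_dim, annotation_ids) are assumptions, and the reinforcement-learning training of the gate is omitted. At each decoding step the gate mixes the NMT model's own vocabulary distribution with a distribution over expert-annotated tokens:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CopyGate(nn.Module):
        """Mixes the decoder's vocabulary distribution with a copy
        distribution over expert-annotated source tokens."""
        def __init__(self, hidden_dim: int, vocab_size: int):
            super().__init__()
            self.generator = nn.Linear(hidden_dim, vocab_size)  # standard NMT output layer
            self.gate = nn.Linear(hidden_dim, 1)                # scalar copy-vs-generate switch

        def forward(self, dec_state, attn_weights, annotation_ids):
            # dec_state:      (batch, hidden_dim), decoder state at this time step
            # attn_weights:   (batch, src_len), attention over annotation tokens
            # annotation_ids: (batch, src_len), vocabulary ids of those tokens
            p_gen = torch.sigmoid(self.gate(dec_state))              # (batch, 1)
            gen_dist = F.softmax(self.generator(dec_state), dim=-1)  # (batch, vocab)
            # Scatter attention mass onto the vocabulary slots of the annotated tokens.
            copy_dist = torch.zeros_like(gen_dist).scatter_add(1, annotation_ids, attn_weights)
            return p_gen * gen_dist + (1.0 - p_gen) * copy_dist

Training this gate so that the model copies the Experts' annotations consistently, as the abstract puts it, is the part the paper addresses with reinforcement learning.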
Comment: 2nd Workshop on Neural Machine Translation and Generation, ACL 2018
Subjects
Free-text keywords: Computer Science - Computation and Language, BLEU, Computer science, Automatic translation, One-shot learning, Phrase, Machine translation, Reinforcement learning, Natural language processing, Artificial intelligence, Training set