
Non-autoregressive models generate target words in parallel, achieving faster decoding at the cost of translation accuracy. A promising approach to remedy the flawed translations of non-autoregressive models is to train a conditional masked translation model (CMTM) and refine the generated results over several iterations. Unfortunately, such an approach hardly considers the \textit{sequential dependency} among target words, which inevitably degrades translation quality. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we apply a left-to-right mask to the same decoder of the CMTM and induce it to autoregressively review whether each word generated by the CMTM should be replaced or kept. The experimental results (WMT14 En$\leftrightarrow$De and WMT16 En$\leftrightarrow$Ro) demonstrate that our model requires dramatically less training computation than the typical CMTM, while outperforming several state-of-the-art non-autoregressive models by over 1 BLEU. With knowledge distillation, our model even surpasses a typical left-to-right Transformer while significantly speeding up decoding.
Accepted to COLING 2020.
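Since the abstract only sketches the mechanism, the following is a minimal, hypothetical Python sketch of how iterative mask-predict decoding with such a self-review pass could look. The functions `cmtm_logits` and `review_keep_prob` are toy stubs, not the paper's code: a real implementation would run the shared Transformer decoder with a bidirectional attention mask for the first and a causal left-to-right mask for the second, both conditioned on the source sentence.

```python
# Hypothetical sketch: mask-predict refinement with a self-review re-masking step.
# The decoder stubs below return random logits so the loop is runnable end to end.
import torch

VOCAB, LEN, ITERS = 100, 8, 3
MASK_ID = 0

def cmtm_logits(tokens: torch.Tensor) -> torch.Tensor:
    # Stub for the CMTM decoder: bidirectional attention, predicts every
    # position in parallel. A real model would also attend to the source.
    torch.manual_seed(int(tokens.sum()))  # deterministic toy output
    return torch.randn(LEN, VOCAB)

def review_keep_prob(tokens: torch.Tensor) -> torch.Tensor:
    # Stub for the self-review pass: conceptually the same decoder run with a
    # left-to-right (causal) mask, scoring whether each token should be kept.
    torch.manual_seed(int(tokens.sum()) + 1)
    return torch.sigmoid(torch.randn(LEN))

tokens = torch.full((LEN,), MASK_ID)           # start fully masked
for t in range(ITERS):
    _, preds = cmtm_logits(tokens).softmax(-1).max(-1)  # parallel predictions
    keep = review_keep_prob(preds)                      # review scores per token
    # Re-mask the tokens the reviewer trusts least, fewer on each iteration.
    n_mask = int(LEN * (ITERS - t - 1) / ITERS)
    tokens = preds.clone()
    if n_mask > 0:
        worst = keep.topk(n_mask, largest=False).indices
        tokens[worst] = MASK_ID
print(tokens.tolist())
```

In standard mask-predict, the re-masking schedule is driven by the CMTM's own prediction confidence; the sketch instead lets the (assumed) review scores decide which tokens survive each round, which is one plausible way the left-to-right review could plug into the refinement loop.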
Keywords: Neural Machine Translation, Machine Translation, Autoregressive model, Transformer, Natural Language Processing, Language Modeling, Decoding methods, Computation and Language (cs.CL)
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
