Attention-based Memory Selection Recurrent Network for Language Modeling
Liu, Da-Rong; Chuang, Shun-Po; Lee, Hung-yi
Subject: Computer Science - Computation and Language
Recurrent neural networks (RNNs) have achieved great success in language modeling. However, since RNNs have a fixed-size memory, they cannot store all the information about the words they have seen earlier in the sentence, and thus the useful long-term informa...
Table 1. Statistics of the three data sets used in the following experiments.

Corpus  Lang  train  dev   test  |s|    |v|
PT      Eng   40K    3K    4K    21.1   9999
SB      Eng   945K   10K   5.2K  10.39  47283
GW      Chi   531K   165K  260K  10.79  5123

|s| denotes the average number of words per sentence. |v| denotes the size of the vocabulary. PT denotes the Penn Treebank corpus, SB the Switchboard corpus, and GW the Gigaword corpus.
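As an illustration of the last two columns of Table 1 (average sentence length and vocabulary size), the following minimal sketch computes both quantities for a tokenized corpus. The function name `corpus_statistics`, the toy corpus, and the whitespace tokenization are assumptions made for illustration; the preprocessing actually applied to the three corpora (for example, vocabulary truncation or unknown-word handling) is not described here and is not reproduced.

```python
def corpus_statistics(sentences):
    """Return (average words per sentence, vocabulary size).

    `sentences` is a list of token lists. Hypothetical helper for
    illustration only; not the preprocessing used in the paper.
    """
    if not sentences:
        raise ValueError("corpus must contain at least one sentence")
    # Average sentence length: total token count divided by sentence count.
    total_words = sum(len(sentence) for sentence in sentences)
    # Vocabulary size: number of distinct token types in the corpus.
    vocabulary = {word for sentence in sentences for word in sentence}
    return total_words / len(sentences), len(vocabulary)


# Toy corpus (assumed data), tokenized by whitespace.
toy_corpus = [
    "the cat sat on the mat".split(),
    "the dog barked".split(),
]

avg_len, vocab_size = corpus_statistics(toy_corpus)
print(avg_len, vocab_size)  # prints: 4.5 7
```

On the toy corpus there are 9 tokens in 2 sentences, giving an average length of 4.5, and 7 distinct word types.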