Publication · Conference object · Preprint · 2016

On training bi-directional neural network language model with noise contrastive estimation

Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu
Open Access
  • Published: 19 Feb 2016
  • Publisher: IEEE
Although the uni-directional recurrent neural network language model (RNNLM) has been very successful, it is hard to train a bi-directional RNNLM properly due to the generative nature of language models. In this work, we propose to train the bi-directional RNNLM with noise contrastive estimation (NCE), since the properties of NCE training help the model achieve sentence-level normalization. Experiments are conducted on two hand-crafted tasks on the PTB data set: a rescoring task and a sanity test. Although, regretfully, the model trained by NCE did not outperform the baseline uni-directional NNLM, it is shown that the NCE-trained bi-directional NNLM behaves well in the...
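To make the NCE objective mentioned in the abstract concrete, the following is a minimal sketch of the per-word NCE loss in its standard form (with k noise samples drawn from a noise distribution q). The function name, argument names, and values are illustrative assumptions, not the authors' implementation; the paper's bi-directional model and any sentence-level details are not reproduced here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nce_loss(target_score, noise_scores, log_q_target, log_q_noise, k):
    """Standard per-word NCE loss (illustrative sketch).

    target_score   -- unnormalized model score s(w | context) for the true word
    noise_scores   -- list of k unnormalized scores for sampled noise words
    log_q_target   -- log q(w) of the true word under the noise distribution
    log_q_noise    -- list of log q(w~) for each sampled noise word
    k              -- number of noise samples per true word
    """
    # Probability that the true word was generated by the data distribution
    loss = -math.log(sigmoid(target_score - math.log(k) - log_q_target))
    # Probability that each noise word was generated by the noise distribution
    for score, log_q in zip(noise_scores, log_q_noise):
        loss -= math.log(sigmoid(-(score - math.log(k) - log_q)))
    return loss
```

Because the classifier compares unnormalized model scores against k·q(w), NCE never computes the softmax partition function, which is what makes it attractive for training models whose normalization (here, sentence-level) is otherwise intractable.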
Free text keywords: Computer Science - Computation and Language, Normalization (statistics), Generative grammar, Speech recognition, Language model, Natural language, Maximum likelihood, Context model, Artificial intelligence, Recurrent neural network, Computer science, Artificial neural network