Publication · Preprint · 2012

On the difficulty of training Recurrent Neural Networks

Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua
Open Access · English
Published: 21 Nov 2012
Abstract
Comment: Improved description of the exploding gradient problem, and a description and analysis of the vanishing gradient problem.
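
A widely cited contribution of this paper is a gradient norm clipping rule for the exploding gradient problem: whenever the gradient's norm exceeds a threshold, rescale the gradient to that threshold before applying the update. Below is a minimal NumPy sketch of that rule; the function name and the threshold value are illustrative choices, not taken from the paper.

import numpy as np

def clip_gradient_norm(grad, threshold=1.0):
    # Gradient norm clipping: if ||g|| > threshold,
    # rescale g to g * threshold / ||g||; otherwise leave it unchanged.
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

# Example: a gradient of norm 5.0 is rescaled down to norm 1.0.
g = np.array([3.0, 4.0])
print(clip_gradient_norm(g, threshold=1.0))  # [0.6 0.8]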
Subjects
ACM Computing Classification System: Mathematics of Computing / Numerical Analysis
Free-text keywords: Computer Science - Learning
References (19 total; page 1 of 2 shown)

Atiya, A. F. and Parlos, A. G. (2000). New results on recurrent network training: Unifying the algorithms and accelerating convergence. IEEE Transactions on Neural Networks, 11, 697–709.

Bengio, Y., Frasconi, P., and Simard, P. (1993). The problem of learning long-term dependencies in recurrent networks. Pages 1183–1195, San Francisco. IEEE Press. (Invited paper.)

Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.

Boulanger-Lewandowski, N., Bengio, Y., and Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the Twenty-ninth International Conference on Machine Learning (ICML'12). ACM.

Doya, K. (1993). Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks, 1, 75–80.

Doya, K. and Yoshizawa, S. (1991). Adaptive synchronization of neural and physical oscillators. In J. E. Moody, S. J. Hanson, and R. Lippmann, editors, NIPS, pages 109–116. Morgan Kaufmann.

Duchi, J. C., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.

Elman, J. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Lukosevicius, M. and Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149.

Mikolov, T. (2012). Statistical Language Models based on Neural Networks. Ph.D. thesis, Brno University of Technology.

Mikolov, T., Sutskever, I., Deoras, A., Le, H.-S., Kombrink, S., and Cernocky, J. (2012). Subword language modeling with neural networks. Preprint (http://www.fit.vutbr.cz/~imikolov/rnnlm/char.pdf).

Moreira, M. and Fiesler, E. (1995). Neural networks with adaptive learning rate and momentum terms. Technical Report Idiap-RR-04-1995, IDIAP, Martigny, Switzerland.

Pascanu, R. and Jaeger, H. (2011). A neurodynamical model for working memory. Neural Networks, 24, 199–207.

Williams, R. J. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 270–280.
