Publication · Preprint · 2018

Stable Recurrent Models

Miller, John; Hardt, Moritz
Open Access · English
Published: 25 May 2018
Abstract
Stability is a fundamental property of dynamical systems, yet to this date it has had little bearing on the practice of recurrent neural networks. In this work, we conduct a thorough investigation of stable recurrent models. Theoretically, we prove stable recurrent neural networks are well approximated by feed-forward networks for the purpose of both inference and training by gradient descent. Empirically, we demonstrate stable recurrent models often perform as well as their unstable counterparts on benchmark sequence tasks. Taken together, these findings shed light on the effective power of recurrent networks and suggest much of sequence learning happens, or can be made to happen, in the stable regime. Moreover, our results help to explain why in many cases practitioners succeed in replacing recurrent models by feed-forward models.
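The feed-forward approximation claimed in the abstract rests on a simple mechanism: a stable recurrence is contractive, so the influence of inputs from the distant past decays geometrically, and a model that reads only a fixed-length suffix of the sequence (a feed-forward computation) tracks the full recurrent computation closely. Below is a minimal NumPy sketch of this effect; the dimensions, the 0.9 stability radius, and the window length 30 are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch (illustrative, not the paper's experiments): compare a stable tanh
# recurrence run over a whole sequence against the same recurrence run only
# over the last k inputs, i.e. a fixed-depth feed-forward computation.
rng = np.random.default_rng(0)
d, m, T, k = 8, 4, 200, 30      # assumed state dim, input dim, length, window

W = rng.normal(size=(d, d))
u, s, vt = np.linalg.svd(W)
W = u @ np.diag(np.minimum(s, 0.9)) @ vt   # clip singular values: ||W||_2 <= 0.9
U = rng.normal(size=(d, m))
xs = rng.normal(size=(T, m))

def final_state(inputs):
    h = np.zeros(d)
    for x in inputs:
        h = np.tanh(W @ h + U @ x)         # h_t = tanh(W h_{t-1} + U x_t)
    return h

full = final_state(xs)          # recurrent model over all T inputs
window = final_state(xs[-k:])   # "feed-forward": only the last k inputs

# Because tanh is 1-Lipschitz and ||W||_2 < 1, the gap shrinks like 0.9^k.
print(np.linalg.norm(full - window))
```

Growing the window k drives the gap toward zero at a geometric rate set by ‖W‖₂; for an unstable recurrence (‖W‖₂ > 1), no fixed window suffices in general.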
Subjects
free text keywords: Computer Science - Machine Learning, Statistics - Machine Learning

For the tanh recurrence $h_t = \tanh(W h_{t-1} + U x_t)$, the Jacobian of the state with respect to the previous state is
$$\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\left(\tanh'(W h_{t-1} + U x_t)\right) W.$$
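Since $|\tanh'(z)| \le 1$ for all $z$, the spectral norm of this Jacobian is at most ‖W‖₂, so keeping ‖W‖₂ < 1 makes the recurrence a contraction. A short NumPy sketch of the bound follows; the 0.95 projection radius is an illustrative assumption, and clipping singular values is one standard way to enforce the constraint rather than necessarily the authors' exact procedure.

```python
import numpy as np

# Sketch: form the Jacobian diag(tanh'(W h + U x)) W at a random point and
# verify its spectral norm is bounded by ||W||_2, which we keep below 1.
rng = np.random.default_rng(0)
d, m = 4, 3                                  # assumed state and input dims

W = rng.normal(size=(d, d))
u, s, vt = np.linalg.svd(W)
W = u @ np.diag(np.minimum(s, 0.95)) @ vt    # project so ||W||_2 <= 0.95
U = rng.normal(size=(d, m))

h, x = rng.normal(size=d), rng.normal(size=m)
pre = W @ h + U @ x
J = np.diag(1.0 - np.tanh(pre) ** 2) @ W     # tanh'(z) = 1 - tanh(z)^2

print(np.linalg.norm(J, 2), "<=", np.linalg.norm(W, 2))  # bound holds
```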

