# Full-Capacity Unitary Recurrent Neural Networks

- Published: 31 Oct 2016

[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157-166, 1994.

[2] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, eds, A field guide to dynamical recurrent neural networks. IEEE Press, 2001.

[3] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training Recurrent Neural Networks. arXiv:1211.5063, Nov. 2012.

[4] A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120, Dec. 2013.

[5] Q. V. Le, N. Jaitly, and G. E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv:1504.00941, Apr. 2015.

[6] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.

[7] K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv:1409.1259, 2014.

[8] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv:1512.03385, Dec. 2015.

[9] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu. Recurrent models of visual attention. In Advances in Neural Information Processing Systems (NIPS), pp. 2204-2212, 2014.

[10] M. Arjovsky, A. Shah, and Y. Bengio. Unitary Evolution Recurrent Neural Networks. In International Conference on Machine Learning (ICML), Jun. 2016.

[11] A. S. Householder. Unitary triangularization of a nonsymmetric matrix. Journal of the ACM, 5(4):339-342, 1958.

[12] R. Gilmore. Lie groups, physics, and geometry: an introduction for physicists, engineers and chemists. Cambridge University Press, 2008.

[13] A. Sard. The measure of the critical values of differentiable maps. Bulletin of the American Mathematical Society, 48(12):883-890, 1942.

[14] H. D. Tagare. Notes on optimization on Stiefel manifolds. Technical report, Yale University, 2011.

[15] T. Tieleman and G. Hinton. Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude, 2012. COURSERA: Neural Networks for Machine Learning.
