Publication · Preprint · 2016

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Salimans, Tim; Kingma, Diederik P.
Open Access · English
Published: 25 Feb 2016
Abstract
We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited.
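To make the reparameterization concrete, the sketch below (an illustrative example, not code from the paper; the names `v`, `g`, and `weight_norm_forward` are chosen here for exposition) writes the effective weight vector as w = (g / ||v||) v, so the scalar g carries the length of w while v determines only its direction.

```python
import numpy as np

def weight_norm_forward(x, v, g, b):
    """Forward pass of one weight-normalized linear unit.

    The effective weight vector is w = (g / ||v||) * v, so g controls the
    norm of w while v only sets its direction.
    x: (batch, n_in), v: (n_in,), g and b: scalars for a single output unit.
    """
    w = (g / np.linalg.norm(v)) * v   # reparameterized weight vector
    return x @ w + b                  # pre-activation y = w^T x + b

# Toy usage: one output unit with 5 inputs and a minibatch of 4 examples.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))
v = rng.normal(size=5)   # direction parameters (trainable)
g, b = 1.0, 0.0          # scale and bias (trainable)
print(weight_norm_forward(x, v, g, b).shape)  # -> (4,)
```

Training then proceeds by taking gradients with respect to g and v (rather than w directly); because the normalization depends only on the parameters, not on the minibatch statistics, no dependency between examples is introduced.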
Subjects
Free-text keywords: Computer Science - Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
References (24 total, page 1 of 2)

Amari, S. Neural learning in structured parameter spaces - natural Riemannian gradient. In Advances in Neural Information Processing Systems, pp. 127-133. MIT Press, 1997.

Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Bergstra, James, Goodfellow, Ian J., Bergeron, Arnaud, Bouchard, Nicolas, and Bengio, Yoshua. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.

Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253-279, June 2013.

Glorot, Xavier and Bengio, Yoshua. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pp. 249-256, 2010.

Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron. Deep learning. Book in preparation for MIT Press, 2016.

Goodfellow, Ian J, Warde-Farley, David, Mirza, Mehdi, Courville, Aaron, and Bengio, Yoshua. Maxout networks. In ICML, 2013.

Gregor, Karol, Danihelka, Ivo, Graves, Alex, and Wierstra, Daan. DRAW: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623, 2015.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.

Hochreiter, Sepp and Schmidhuber, Jürgen. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.

Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Kingma, Diederik P and Welling, Max. Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2014.

Krähenbühl, Philipp, Doersch, Carl, Donahue, Jeff, and Darrell, Trevor. Data-dependent initializations of convolutional neural networks. arXiv preprint arXiv:1511.06856, 2015.

Krizhevsky, Alex and Hinton, Geoffrey. Learning multiple layers of features from tiny images, 2009.

Lee, Chen-Yu, Xie, Saining, Gallagher, Patrick, Zhang, Zhengyou, and Tu, Zhuowen. Deeply-supervised nets. In Deep Learning and Representation Learning Workshop, NIPS, 2014.
