publication . Preprint . 2016

Learning to learn by gradient descent by gradient descent

Andrychowicz, Marcin; Denil, Misha; Gomez, Sergio; Hoffman, Matthew W.; Pfau, David; Schaul, Tom; Shillingford, Brendan; de Freitas, Nando
Open Access English
  • Published: 14 Jun 2016
The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural netwo...
free text keywords: Computer Science - Neural and Evolutionary Computing, Computer Science - Learning