F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1-106, 2012.
Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu. Advances in optimizing recurrent networks. In International Conference on Acoustics, Speech and Signal Processing, pages 8624-8628. IEEE, 2013. [OpenAIRE]
F. Bobolas. brain-neurons, 2009. URL https://www.flickr.com/photos/fbobolas/ 3822222947. Creative Commons Attribution-ShareAlike 2.0 Generic.
N. E. Cotter and P. R. Conwell. Fixed-weight networks can learn. In International Joint Conference on Neural Networks, pages 553-559, 1990.
C. Daniel, J. Taylor, and S. Nowozin. Learning step size controllers for robust neural network training. In Association for the Advancement of Artificial Intelligence, 2016.
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, pages 248-255. IEEE, 2009.
D. L. Donoho. Compressed sensing. Transactions on Information Theory, 52(4):1289-1306, 2006.
J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
L. A. Feldkamp and G. V. Puskorius. A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification. Proceedings of the IEEE, 86(11):2259-2277, 1998.
L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. arXiv Report 1508.06576, 2015.
A. Graves, G. Wayne, and I. Danihkela. Neural Turing machines. arXiv Report 1410.5401, 2014.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.
S. Hochreiter, A. S. Younger, and P. R. Conwell. Learning to learn using gradient descent. In International Conference on Artificial Neural Networks, pages 87-94. Springer, 2001. [OpenAIRE]
D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.