
SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

Bordes, Antoine; Bottou, Léon; Gallinari, Patrick
Open Access English
  • Published: 01 Jul 2009
  • Publisher: HAL CCSD
  • Country: France
Abstract
The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of second-order information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires fewer iterations to achieve the same accuracy. This algorithm won the "Wild Track" of the first PASCAL Large Scale Learning Challenge (Sonnenburg et al., 2008).
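The design described above, a first-order stochastic update rescaled by second-order information, with the rescaling maintained on its own, less frequent schedule, can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' reference implementation: the model is an L2-regularised linear SVM, a diagonal matrix B rescales each coordinate of the SGD step, and B is refreshed only every `skip` updates from a secant-style curvature estimate. All names (sgdqn_sketch, hinge_grad, lam, t0, skip) and the exact refresh rule are illustrative assumptions.

    import numpy as np

    def hinge_grad(w, x, y, lam):
        # Gradient of lam/2 * ||w||^2 + max(0, 1 - y * <w, x>) on one example.
        g = lam * w
        if y * x.dot(w) < 1.0:
            g = g - y * x
        return g

    def sgdqn_sketch(X, y, lam=1e-4, t0=1e4, skip=16, epochs=5, seed=0):
        # Hedged sketch of a diagonally rescaled SGD with an independently
        # scheduled rescaling update; not the paper's exact update rule.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        B = np.full(d, 1.0 / lam)      # diagonal rescaling, initialised to 1/lambda
        t, count = 0, skip
        for _ in range(epochs):
            for i in rng.permutation(n):
                x, yi = X[i], y[i]
                g = hinge_grad(w, x, yi, lam)
                # Per-example weight update, rescaled coordinate-wise by B.
                w_new = w - (1.0 / (t + t0)) * B * g
                count -= 1
                if count <= 0:
                    # Independently scheduled refresh of B: a secant-style
                    # estimate of per-coordinate inverse curvature, kept bounded.
                    dg = hinge_grad(w_new, x, yi, lam) - g
                    dw = w_new - w
                    ratio = np.divide(dw, dg, out=B.copy(), where=np.abs(dg) > 1e-12)
                    B = np.clip(0.5 * (B + ratio), 1e-2 / lam, 1e2 / lam)
                    count = skip
                w, t = w_new, t + 1
        return w

Calling sgdqn_sketch(X, y) on a dense feature matrix X (n examples by d features) with labels y in {-1, +1} returns a weight vector; keeping the curvature refresh on a separate, sparser schedule is what keeps the per-example cost close to that of plain first-order SGD.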
Subjects
ACM Computing Classification System: Mathematics of Computing / Numerical Analysis
Free text keywords: [INFO.INFO-LG] Computer Science [cs] / Machine Learning [cs.LG]
Funded by
NSF | ITR: Collaborative Research: New Directions in Predictive Learning: Rigorous Learning Machines
Project
  • Funder: National Science Foundation (NSF)
  • Project Code: 0325463
  • Funding stream: Directorate for Computer & Information Science & Engineering | Division of Computing and Communication Foundations
References (page 1 of 2)

S.-I. Amari, H. Park, and K. Fukumizu. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 12:1409, 2000.

S. Becker and Y. Le Cun. Improving the convergence of back-propagation learning with second-order methods. In Proc. 1988 Connectionist Models Summer School, pages 29-37. Morgan Kaufmann, 1989.

A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classifiers with online and active learning. J. Machine Learning Research, 6:1579-1619, September 2005.

L. Bottou. Online algorithms and stochastic approximations. In David Saad, editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK, 1998.

L. Bottou. Stochastic gradient descent on toy problems, 2007. http://leon.bottou.org/projects/sgd.

L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Adv. in Neural Information Processing Systems, volume 20. MIT Press, 2008.

L. Bottou and Y. LeCun. On-line learning for very large datasets. Applied Stochastic Models in Business and Industry, 21(2):137-151, 2005.

X. Driancourt. Optimisation par descente de gradient stochastique de systèmes modulaires combinant réseaux de neurones et programmation dynamique [Stochastic gradient descent optimization of modular systems combining neural networks and dynamic programming]. PhD thesis, Université Paris XI, Orsay, France, 1994.

V. Fabian. Asymptotically efficient stochastic approximation; the RM case. Annals of Statistics, 1(3):486-495, 1973.

C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proc. 25th Intl. Conf. on Machine Learning (ICML'08), pages 408-415. Omnipress, 2008.

Y. Le Cun, L. Bottou, G. Orr, and K.-R. Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science LNCS 1524. Springer-Verlag, 1998.

D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. J. Machine Learning Research, 5:361-397, 2004.

N. Murata and S.-I. Amari. Statistical analysis of learning dynamics. Signal Processing, 74(1):3-28, 1999.

J. Nocedal. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35:773-782, 1980.

N. Schraudolph. Local gain adaptation in stochastic gradient descent. In Proc. of the 9th Intl. Conf. on Artificial Neural Networks, pages 569-574, 1999.
