All 4 versions:

Neurocomputing
Article . 2008 . Peer-reviewed
License: Elsevier TDM
Data sources: Crossref

Part of book or chapter of book . 2006 . Peer-reviewed
https://doi.org/10.1007/118408...
Data sources: Crossref

DBLP
Conference object . 2017
Data sources: DBLP

DBLP
Article . 2017
Data sources: DBLP

Natural Conjugate Gradient Training of Multilayer Perceptrons

Authors: Ana M. González; José R. Dorronsoro


Abstract

Natural gradient (NG) descent, arguably the fastest on-line method for multilayer perceptron (MLP) training, exploits the "natural" Riemannian metric that the Fisher information matrix defines in the MLP weight space. It also accelerates ordinary gradient descent in a batch setting, but there the Fisher matrix essentially coincides with the Gauss-Newton approximation of the Hessian of the MLP square error function, and NG is thus related to the Levenberg-Marquardt (LM) method, which may explain its speed-up with respect to standard gradient descent. However, even this comparison is disadvantageous for NG descent, as it should have linear convergence in a Riemannian weight space compared to the superlinear convergence of the LM method in the Euclidean weight space. This suggests that it may be interesting to consider superlinear methods for MLP training in a Riemannian setting. In this work we shall discuss how to introduce a natural conjugate gradient (CG) method for MLP training. While a fully Riemannian formulation would result in an extremely costly procedure, we shall make some simplifying assumptions that should give descent directions with properties similar to those of standard CG descent. Moreover, we will also show numerically that natural CG may lead to faster convergence to better minima, although at a greater cost than that of standard CG; this extra cost may nevertheless be alleviated using a diagonal natural CG variant.
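
As a rough illustration of the idea the abstract sketches (the full text is not open access here), below is a minimal NumPy sketch of a diagonal natural conjugate gradient step for a toy one-hidden-layer MLP with square error. This is not the authors' algorithm: the network size, damping constant, fixed step size (standing in for a proper line search) and the Polak-Ribière coefficient taken in the Fisher metric are all illustrative assumptions. The diagonal Fisher entries use the Gauss-Newton identification mentioned in the abstract, F_ii ≈ mean over examples of (∂out/∂w_i)².

```python
# Sketch (not the authors' code): nonlinear CG whose gradients are
# preconditioned by a *diagonal* Gauss-Newton/Fisher approximation,
# in the spirit of the diagonal natural CG variant the abstract mentions.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a one-hidden-layer MLP (sizes are arbitrary).
X = rng.normal(size=(200, 3))
y = np.sin(X.sum(axis=1, keepdims=True))

def unpack(w, n_in=3, n_hid=8):
    """Split the flat weight vector into the two layer matrices."""
    k = n_in * n_hid
    return w[:k].reshape(n_in, n_hid), w[k:].reshape(n_hid, 1)

def loss_and_grad(w):
    """Square error, its gradient, and a diagonal Fisher approximation."""
    W1, W2 = unpack(w)
    h = np.tanh(X @ W1)                  # hidden activations
    out = h @ W2                         # network output
    err = out - y
    # Backpropagated gradient of 0.5 * mean squared error.
    g2 = h.T @ err
    g1 = X.T @ ((err @ W2.T) * (1.0 - h**2))
    grad = np.concatenate([g1.ravel(), g2.ravel()]) / len(X)
    # Per-example output derivatives, squared and averaged:
    # F_ii ~ mean_n (d out_n / d w_i)^2, plus LM-style damping.
    j2 = h                                                      # d out / d W2
    j1 = X[:, :, None] * (((1.0 - h**2) * W2.T)[:, None, :])    # d out / d W1
    fisher_diag = np.concatenate([
        (j1**2).mean(axis=0).ravel(),
        (j2**2).mean(axis=0).ravel(),
    ]) + 1e-3
    return 0.5 * float((err**2).mean()), grad, fisher_diag

w = rng.normal(scale=0.5, size=3 * 8 + 8)
g_prev = ng_prev = d = None
for step in range(500):
    loss, g, f = loss_and_grad(w)
    ng = g / f                           # "natural" (preconditioned) gradient
    if d is None:
        d = -ng
    else:
        # Polak-Ribiere+ coefficient, with inner products in the Fisher metric.
        beta = max(0.0, g @ (ng - ng_prev) / (g_prev @ ng_prev))
        d = -ng + beta * d
    g_prev, ng_prev = g, ng
    w += 0.05 * d                        # fixed small step, for brevity
    if step % 100 == 0:
        print(f"step {step:3d}  loss {loss:.5f}")
```

A full natural CG method would work with the dense Fisher matrix, whose inversion at every step is what the abstract calls an extremely costly procedure; keeping only the diagonal makes the per-step overhead linear in the number of weights, which appears to be the cost saving attributed to the diagonal variant.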

Impact by BIP!
  • Selected citations: 14. These citations are derived from selected sources; this is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
  • Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
  • Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
  • Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.