Versions (4):
ZENODO. Doctoral thesis, 2024. License: CC BY. Data source: ZENODO
ZENODO. Thesis, 2024. License: CC BY. Data source: Datacite
ZENODO. Thesis, 2024. License: CC BY. Data source: Datacite
DBLP. Doctoral thesis, 2025. Data source: DBLP

Investigating a Second-Order Optimization Strategy for Neural Networks

Author: Bernhard Bermeitinger

Abstract

In summary, this cumulative dissertation investigates the application of the conjugate gradient method (CG) to the optimization of artificial neural networks (NNs) and compares it with widespread first-order optimization methods, in particular stochastic gradient descent (SGD). The research results presented in the included papers show that CG can effectively optimize both smaller and very large networks. However, machine precision can cause problems in 32-bit computations; the best results are only achieved with 64-bit floating-point numbers. The research also emphasizes the importance of the initialization of the NN parameters and shows that an initialization via singular value decomposition leads to considerably lower error values. Surprisingly, shallower NNs achieve better results than deep NNs with a comparable number of trainable parameters, regardless of the particular NN that generated the artificial data. It also turns out that shallow, wide NNs, in both Transformer and CNN architectures, often perform better than their deeper counterparts. Overall, the research results recommend a re-evaluation of the previous preference for extremely deep NNs and emphasize the potential of CG as an optimization method.
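The abstract credits an initialization via singular value decomposition (SVD) with considerably lower error values. The dissertation's exact scheme is not reproduced here; as a minimal sketch under that assumption, one common SVD-based (orthogonal) initialization keeps only the orthogonal factors of a random matrix, so every singular value of the weight matrix starts at 1 (the function name and shapes below are illustrative):

```python
import numpy as np

def svd_orthogonal_init(fan_in, fan_out, seed=None):
    """Sketch of an SVD-based (orthogonal) weight initialization.

    The exact scheme used in the dissertation may differ; this variant
    discards the singular values of a random Gaussian matrix and keeps
    only its orthogonal factors, so the result has all singular values
    equal to 1 (up to floating-point error).
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((fan_in, fan_out))
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    return u @ vt

w = svd_orthogonal_init(256, 128, seed=0)
```

Because the spectrum is flat at 1, forward activations and backward gradients are neither amplified nor damped by this layer at initialization, which is the usual motivation for spectrum-controlled schemes.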

In summary, this cumulative dissertation investigates the application of the conjugate gradient method (CG) to the optimization of artificial neural networks (NNs) and compares this method with common first-order optimization methods, especially stochastic gradient descent (SGD). The presented research results show that CG can effectively optimize both small and very large networks. However, the default machine precision of 32 bits can lead to problems; the best results are only achieved in 64-bit computations. The research also emphasizes the importance of the initialization of the NNs' trainable parameters and shows that an initialization using singular value decomposition (SVD) leads to drastically lower error values. Surprisingly, shallow but wide NNs, in both Transformer and CNN architectures, often perform better than their deeper counterparts. Overall, the research results recommend a re-evaluation of the previous preference for extremely deep NNs and emphasize the potential of CG as an optimization method.
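The second-order strategy compared against SGD above can be sketched as a nonlinear conjugate gradient loop. This is an illustrative Polak-Ribiere variant with a backtracking line search, run on a toy convex objective in 64-bit floats; the dissertation's actual training setup (network objective, batching, restarts, line-search details) is not reproduced, and all names below are stand-ins:

```python
import numpy as np

def cg_minimize(f, grad, x0, iters=200):
    """Nonlinear conjugate gradient (Polak-Ribiere+) with an Armijo
    backtracking line search. Illustrative sketch only."""
    x = np.asarray(x0, dtype=np.float64)  # 64-bit floats, as the results recommend
    g = grad(x)
    d = -g  # first direction: steepest descent
    for _ in range(iters):
        gd = g @ d
        if gd >= 0:  # not a descent direction: restart with steepest descent
            d, gd = -g, -(g @ g)
        # Armijo backtracking line search along d
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + 1e-4 * t * gd and t > 1e-12:
            t *= 0.5
        x = x + t * d
        g_new = grad(x)
        gg = g @ g
        if gg == 0.0:  # exact stationary point
            break
        # Polak-Ribiere coefficient, clipped at zero (automatic restart)
        beta = max(0.0, g_new @ (g_new - g) / gg)
        d = -g_new + beta * d
        g = g_new
    return x

# toy strictly convex quadratic as a stand-in objective
A = np.diag(np.arange(1.0, 11.0))
b = np.ones(10)
f = lambda x: 0.5 * x @ (A @ x) - b @ x
grad = lambda x: A @ x - b
x_opt = cg_minimize(f, grad, np.zeros(10))
```

On a quadratic like this, the conjugate directions let the method make progress along poorly scaled axes that plain gradient descent traverses slowly, which is the intuition behind comparing CG against SGD.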

Countries
Germany, Switzerland
Keywords

ddc:004, ddc:510, 551, 530

  • BIP! impact indicators (provided by BIP!)
    selected citations: 0. These citations are derived from selected sources; an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Open Access route: Green