A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization

Name: A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization
Creator: Cristian Daniel Alecsa
Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), 0211 other engineering and technologies, FOS: Mathematics, 02 engineering and technology, Mathematics - Optimization and Control, Machine Learning (cs.LG)

Journal of Global Optimization

Article . 2025 . Peer-reviewed

License: CC BY

Data sources: Crossref

arXiv.org e-Print Archive

Preprint . 2021

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2021

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Article

Data sources: DBLP

Publication: Article, Preprint

Date: 16 Jul 2025 (embargo end date: 01 Jan 2021)

Language: English

Publisher: Springer Science and Business Media LLC

Journal: Journal of Global Optimization, volume 93, pages 113-173 (ISSN: 0925-5001, eISSN: 1573-2916)

Authors: Cristian Daniel Alecsa

doi: 10.1007/s10898-025-01518-0 , 10.48550/arxiv.2110.08531

arXiv: 2110.08531

Abstract

Adaptive optimization algorithms are a key pillar behind the rise of the machine learning field. The optimization literature has devoted numerous studies to accelerated gradient methods, but only recently have adaptive iterative techniques been analyzed from a theoretical point of view. In the present paper we introduce new adaptive algorithms endowed with momentum terms for stochastic non-convex optimization problems. Our purpose is to show a deep connection between accelerated methods endowed with different inertial steps and AMSGrad-type momentum methods. Our methodology is based on the framework of stochastic and possibly non-convex objective mappings, along with some assumptions that are often used in the investigation of adaptive algorithms. In addition to discussing the finite-time horizon analysis in relation to a certain final iteration and the almost sure convergence to stationary points, we also examine the worst-case iteration complexity. This is followed by an estimate for the expectation of the squared Euclidean norm of the gradient. Various computational simulations for the training of neural networks support the theoretical analysis. For future research, we emphasize that our work admits multiple possible extensions, among which we mention the investigation of non-smooth objective functions and the theoretical analysis of a more general formulation that encompasses our adaptive optimizers in a stochastic framework.
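
For orientation, the AMSGrad-type methods mentioned in the abstract extend the standard AMSGrad update, which keeps a running maximum of the second-moment estimate so that the effective step size never increases. The following is a minimal NumPy sketch of that baseline update only; it is not the new algorithms of the paper, which add further momentum steps and shifted updates. The function name `amsgrad`, the toy objective, the noise level, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np


def amsgrad(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
            steps=2000, seed=0):
    """Baseline AMSGrad on a stochastic gradient oracle (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)      # first-moment (momentum) estimate
    v = np.zeros_like(x)      # second-moment estimate
    v_hat = np.zeros_like(x)  # running maximum of v
    for _ in range(steps):
        g = grad_fn(x, rng)                    # stochastic gradient sample
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)           # keeps the step size non-increasing
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x


def noisy_grad(x, rng):
    # Hypothetical test objective f(x) = x^2 + 3 sin^2(x). It is non-convex
    # because f''(x) = 2 + 6 cos(2x) changes sign, and its only stationary
    # point is x = 0. Gaussian noise stands in for minibatch sampling.
    return 2.0 * x + 3.0 * np.sin(2.0 * x) + 0.1 * rng.standard_normal(x.shape)


x_final = amsgrad(noisy_grad, x0=[2.0])
print("final iterate:", x_final)
```

On this toy problem the iterate should settle near the stationary point at the origin. The fixed seed makes the run reproducible.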

Related Organizations

Romanian Institute of Science and Technology
Romania
Technical University of Cluj-Napoca
Romania

Related research products (4)

- pytorch (software on GitHub), relation: IsRelatedTo
- Padam (software on GitHub), relation: IsRelatedTo
- AAMMSU (software on GitHub), relation: IsRelatedTo
- Ranger-Deep-Learning-Optimizer (software on GitHub), relation: IsRelatedTo

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.