Recent works have shown that high-probability metrics for stochastic gradient descent (SGD) are informative and, in some cases, advantageous over the commonly adopted mean-square-error-based metrics. In this work we provide a formal framework, based on the theory of large deviations, for the study of general high-probability bounds for SGD. The framework allows for generic (not necessarily bounded) gradient noise satisfying mild technical assumptions, and permits the noise distribution to depend on the current iterate. Under these assumptions, we derive an upper large-deviations bound for SGD with strongly convex functions. The corresponding rate function captures analytical dependence on the noise distribution and other problem parameters. This contrasts with conventional mean-square-error analysis, which captures noise dependence only through the variance and reflects neither the effect of higher-order moments nor the interplay between the noise geometry and the shape of the cost function. We also derive exact large-deviation rates for the case of a quadratic objective function and show that the resulting rate function matches the one from the general upper bound, establishing the tightness of the latter. Numerical examples illustrate and corroborate the theoretical findings.
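The quantity a large-deviations analysis characterizes is the tail probability that the SGD iterate stays far from the optimum after many steps. As a hypothetical illustration only (not the paper's method or its rate function), the following minimal Monte Carlo sketch estimates such a tail probability for SGD on a one-dimensional quadratic with additive Gaussian gradient noise and diminishing step sizes; all function names and parameter values are assumptions for this example.

```python
import random

def sgd_quadratic(steps=200, step0=1.0, noise_sd=1.0, x0=5.0, rng=None):
    """Run SGD on f(x) = x^2 / 2 (gradient x) with additive Gaussian
    gradient noise and diminishing step sizes a_k = step0 / (k + 1)."""
    rng = rng or random.Random(0)
    x = x0
    for k in range(steps):
        g = x + rng.gauss(0.0, noise_sd)  # noisy gradient estimate
        x -= step0 / (k + 1) * g
    return x

def tail_prob(eps, trials=2000, **kw):
    """Monte Carlo estimate of P(|x_N| > eps), the event whose exponential
    decay rate a large-deviations bound would characterize."""
    rng = random.Random(42)
    hits = sum(abs(sgd_quadratic(rng=rng, **kw)) > eps for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    # Tail probabilities shrink as the deviation threshold eps grows.
    for eps in (0.1, 0.5, 1.0):
        print(f"P(|x_N| > {eps}) ~= {tail_prob(eps):.4f}")
```

Because all trials share one random stream, the estimated tail probability is non-increasing in `eps` by construction; repeating the experiment with heavier-tailed noise in place of `rng.gauss` would illustrate the higher-order-moment effects that mean-square-error analysis misses.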
32 pages, 2 figures
FOS: Computer and information sciences, Computer Science - Machine Learning, convex functions, Computer Science - Information Theory, Information Theory (cs.IT), Machine Learning (stat.ML), large deviations, Machine Learning (cs.LG), Statistics - Machine Learning, Optimization and Control (math.OC), stochastic gradient descent, FOS: Mathematics, Mathematics - Optimization and Control