
pmid: 34101583
Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with the data size and becomes expensive for huge datasets. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems with linear prediction structure, e.g., least squares and logistic/softmax regression. HSDMPG enjoys improved computational complexity that is data-size-independent for large-scale problems. It iteratively samples an evolving minibatch of individual losses to approximate the original problem and efficiently minimizes the resulting smaller-sized subproblems. For a strongly convex loss of n components, HSDMPG attains an ϵ-optimization error within [Formula: see text] stochastic gradient evaluations, where κ is the condition number, ζ = 1 for quadratic loss, and ζ = 2 for generic loss. For large-scale problems, our complexity outperforms those of SVRG-type algorithms with or without dependence on the data size. In particular, when ϵ = O(1/√n), which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively O(n^0.5 log^2(n)) and O(n^0.5 log^3(n)), which for the first time achieves optimal generalization in less than a single pass over the data. Moreover, we extend HSDMPG to online strongly convex problems and prove its higher efficiency over prior algorithms. Numerical results demonstrate the computational advantages of HSDMPG.
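To make the sample-then-solve idea described above concrete, the sketch below shows a minimal hybrid stochastic-deterministic minibatch proximal gradient loop on an ℓ2-regularized least-squares problem: at each outer iteration a growing minibatch of individual losses is sampled, and a proximally regularized subproblem over that minibatch is approximately minimized. This is an illustrative assumption rather than the paper's exact HSDMPG procedure; the geometric minibatch growth schedule, the proximal weight `gamma`, the step size, and the plain gradient-descent inner solver are placeholders for the choices analyzed in the paper.

```python
# Schematic sketch of a hybrid stochastic-deterministic minibatch proximal
# gradient loop for l2-regularized least squares. NOT the authors' exact
# HSDMPG algorithm: the batch growth schedule, proximal weight `gamma`, and
# inner solver are illustrative assumptions.
import numpy as np

def hybrid_minibatch_prox_grad(A, b, lam=1e-3, gamma=1.0,
                               batch0=32, growth=2.0,
                               outer_iters=10, inner_iters=50):
    n, d = A.shape
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    batch = float(batch0)
    for _ in range(outer_iters):
        # Sample an evolving (growing) minibatch of individual losses.
        m = min(n, int(batch))
        idx = rng.choice(n, size=m, replace=False)
        A_m, b_m = A[idx], b[idx]
        # Proximal subproblem over the minibatch:
        # F_m(w) = (1/2m)||A_m w - b_m||^2 + (lam/2)||w||^2 + (gamma/2)||w - theta||^2
        w = theta.copy()
        # Conservative step size from a crude smoothness estimate (assumption).
        L = np.linalg.norm(A_m, ord=2) ** 2 / m + lam + gamma
        step = 1.0 / L
        for _ in range(inner_iters):
            grad = A_m.T @ (A_m @ w - b_m) / m + lam * w + gamma * (w - theta)
            w -= step * grad
        theta = w              # next proximal center is the subproblem solution
        batch *= growth        # enlarge the minibatch for the next subproblem
    return theta

# Usage on a synthetic least-squares instance.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 50))
x_true = rng.standard_normal(50)
b = A @ x_true + 0.01 * rng.standard_normal(2000)
theta = hybrid_minibatch_prox_grad(A, b)
print(np.linalg.norm(theta - x_true))
```

The proximity term (gamma/2)||w - theta||^2 is what keeps each small sampled subproblem well conditioned and anchored to the current iterate, while the growing minibatch gradually shifts the estimate from a cheap stochastic approximation toward the full deterministic objective.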
Precondition, Artificial Intelligence and Robotics, Convex Optimization, Theory and Algorithms, Stochastic Variance-Reduced Algorithm, Online Convex Optimization
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the overall "Influence" indicator | 3 |
| Popularity | Current impact/attention of the article in the research community, based on the underlying citation network | Top 10% |
| Influence | Overall/total impact of the article in the research community, based on the underlying citation network (diachronically) | Average |
| Impulse | Initial momentum of the article directly after its publication, based on the underlying citation network | Average |
