Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms

Publication: Article, Preprint; 23 Apr 2024; Embargo end date: 01 Jan 2024; English; Publisher: MIT Press; Journal: Neural Computation, volume 36, pages 897-935 (ISSN: 0899-7667, eISSN: 1530-888X)

Authors: Gu, Bin; Wei, Xiyuan; Zhang, Hualin; Chang, Yi; Huang, Heng;

doi: 10.1162/neco_a_01636, 10.48550/arxiv.2410.02559

pmid: 38457756

arXiv: 2410.02559


Abstract

Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them chose the coordinated ZO estimator over the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging than the coordinated ZO estimator does, it requires only $\mathcal{O}(1)$ computation, which is significantly less than the $\mathcal{O}(d)$ computation of the coordinated ZO estimator, with $d$ being the dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate of the inner solver satisfies the ZOOD property. By applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{n+d}{\epsilon^{2}}\right)$ under $d > n^{1/2}$ for nonconvex problems, and from $\mathcal{O}\left(\frac{d}{\epsilon^{2}}\right)$ to $\tilde{\mathcal{O}}\left(n \log\frac{1}{\epsilon} + \frac{d}{\epsilon}\right)$ for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
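To make the query-cost contrast in the abstract concrete, the following is a minimal sketch of the two estimator families and a single proximal step. It is not the paper's ZOR-ProxSVRG or ZOR-ProxSAGA. The estimator formulas are common textbook forms (central differences per coordinate, and a forward difference along one direction drawn uniformly from the unit sphere), assumed here to correspond to the "coordinated" and "random" estimators. All names (`CountingOracle`, `coordinate_zo_grad`, `random_zo_grad`, `prox_l1`, `zo_prox_step`, `quadratic`), the smoothing parameter `mu`, the L1 regularizer, and the toy objective are hypothetical choices for illustration.

```python
import math
import random


class CountingOracle:
    """Wraps a function and counts how many times it is queried."""

    def __init__(self, f):
        self.f = f
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.f(x)


def coordinate_zo_grad(f, x, mu=1e-5):
    """Coordinate-wise estimator: central difference along each axis (2*d queries)."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += mu
        x_minus[j] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * mu))
    return grad


def random_zo_grad(f, x, mu=1e-5, rng=random):
    """Random-direction estimator: 2 queries, independent of the dimension d."""
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]  # direction uniform on the unit sphere
    x_shift = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_shift) - f(x)) / mu
    return [scale * ui for ui in u]


def prox_l1(v, thresh):
    """Soft-thresholding: proximal operator of thresh * ||.||_1 (assumed regularizer)."""
    return [math.copysign(max(abs(vi) - thresh, 0.0), vi) for vi in v]


def zo_prox_step(f, x, eta, lam, grad_estimator):
    """One proximal step using a ZO gradient estimate instead of the true gradient."""
    g = grad_estimator(f, x)
    return prox_l1([xi - eta * gi for xi, gi in zip(x, g)], eta * lam)


def quadratic(x):
    """Toy smooth objective (hypothetical) whose true gradient equals x."""
    return 0.5 * sum(xi * xi for xi in x)


if __name__ == "__main__":
    x0 = [1.0, -2.0, 0.5]
    for name, est in [("coordinate", coordinate_zo_grad), ("random", random_zo_grad)]:
        oracle = CountingOracle(quadratic)
        x1 = zo_prox_step(oracle, x0, eta=0.1, lam=0.01, grad_estimator=est)
        print(name, "queries:", oracle.queries, "next iterate:", x1)
```

In this sketch the coordinate-wise estimator spends two queries per dimension while the random one spends two in total, at the price of a much noisier estimate of the gradient. Closing that accuracy gap is what the abstract attributes to variance reduction and the ZOOD-based analysis.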

Related Organizations

The University of Texas System, United States
University of Maryland, College Park, United States
Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates
Jilin University, China (People's Republic of)

Keywords

FOS: Computer and information sciences, Numerical optimization and variational techniques, Computer Science - Machine Learning, Optimization and Control (math.OC), Methods of reduced gradient type, Learning and adaptive systems in artificial intelligence, FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)

Impact by BIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average
