Truncated kernel stochastic gradient descent on spheres

Name: Truncated kernel stochastic gradient descent on spheres
Keywords: Machine Learning, FOS: Computer and information sciences, 68T05, 68Q32, 33C55, 62L20, Machine Learning (cs.LG)

JinHui Bai; Lei Shi


arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

Mathematics of Computation

Article . 2025 . Peer-reviewed

License: https://www.ams.org/publications/copyright-and-permissions

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Article . 2024

Data sources: DBLP


Publication . Article, Preprint . 18 Jul 2025 . Embargo end date: 01 Jan 2024 . English

Publisher: American Mathematical Society (AMS)

Journal: Mathematics of Computation (issn: 0025-5718, eissn: 1088-6842)

Authors: JinHui Bai; Lei Shi

doi: 10.1090/mcom/4124, 10.48550/arxiv.2410.01570

arXiv: 2410.01570


Abstract

Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-squares loss function for spherical data fitting. T-kernel SGD introduces a novel regularization strategy by implementing SGD through a closed-form solution of the projection of the stochastic gradient in a low-dimensional subspace. In contrast to traditional kernel SGD, the regularization strategy implemented by T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the $d$-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decay rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.
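
The idea of projecting each stochastic gradient onto a growing low-dimensional subspace can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the simplest sphere (the unit circle, where the harmonics are sines and cosines), a hypothetical truncation schedule `level = ceil(t ** growth)`, and hypothetical polynomially decaying kernel weights `k ** (-decay)`. All function and parameter names are illustrative only.

```python
import numpy as np

def circle_features(theta, level):
    """Harmonics on the unit circle up to frequency `level`,
    orthonormal w.r.t. the uniform probability measure."""
    feats = [1.0]
    for k in range(1, level + 1):
        feats.append(np.sqrt(2.0) * np.cos(k * theta))
        feats.append(np.sqrt(2.0) * np.sin(k * theta))
    return np.array(feats)

def kernel_weights(level, decay):
    """Assumed kernel eigenvalues: 1 for the constant, k**(-decay) otherwise."""
    weights = [1.0]
    for k in range(1, level + 1):
        weights += [float(k) ** (-decay)] * 2
    return np.array(weights)

def truncated_kernel_sgd(thetas, ys, step=0.2, growth=0.5, decay=2.0):
    """One pass of SGD where each gradient is projected onto the
    harmonics of frequency <= level, with level growing in t."""
    n = len(thetas)
    max_level = int(np.ceil(n ** growth))
    coef = np.zeros(2 * max_level + 1)        # storage grows like n**growth
    weights = kernel_weights(max_level, decay)
    for t, (theta, y) in enumerate(zip(thetas, ys), start=1):
        level = min(max_level, int(np.ceil(t ** growth)))
        m = 2 * level + 1                     # size of the current subspace
        phi = circle_features(theta, level)
        # Coefficients beyond index m are still zero, so this is the
        # full prediction of the current iterate at theta.
        residual = coef[:m] @ phi - y
        # Least-squares gradient, projected onto the first m harmonics.
        coef[:m] -= step * residual * weights[:m] * phi
    return coef

def predict(coef, theta):
    """Evaluate the fitted function at angle `theta`."""
    level = (len(coef) - 1) // 2
    return float(coef @ circle_features(theta, level))
```

In this sketch the iterate is stored as a coefficient vector of length `2 * max_level + 1` rather than as a sum over all observed samples, which mirrors, in a toy setting, the storage saving the abstract attributes to the equivalent polynomial form. The step size is held constant across iterations; the value `0.2` is chosen only so that the toy recursion stays stable with the assumed weights.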

Related Organizations

Fudan University
China (People's Republic of)


Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Green

Related to Research communities

UArctic