Regularized Q-Learning With Linear Function Approximation

Publication: Article, Preprint, 01 Jan 2026
Embargo end date: 01 Jan 2024
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Transactions on Automatic Control, volume 71, pages 504-511 (issn: 0018-9286, eissn: 2334-3303)

Authors: Jiachen Xi; Alfredo Garcia; Petar Momčilović

doi: 10.1109/tac.2025.3592801, 10.48550/arxiv.2401.15196

arXiv: 2401.15196

Abstract

Regularized Markov decision processes (MDPs) model sequential decision making under uncertainty when the decision maker has limited information-processing capacity and/or an aversion to model ambiguity. With function approximation, the convergence properties of learning algorithms for regularized MDPs (e.g., soft Q-learning) are not well understood, because the composition of the regularized Bellman operator and a projection onto the span of basis vectors is not a contraction with respect to any norm. In this paper, we consider a bi-level optimization formulation of regularized Q-learning with linear function approximation. The lower-level optimization problem aims to identify a value function approximation that satisfies Bellman's recursive optimality condition, and the upper-level problem aims to find the projection onto the span of basis vectors. This formulation motivates a single-loop algorithm with finite-time convergence guarantees. The algorithm operates on two timescales: updates to the projection of state-action values are "slow" in that they use a smaller step size than the "faster" updates of approximate solutions to Bellman's recursive optimality equation. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
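
The two objects the abstract composes, a regularized Bellman operator and a projection onto the span of basis vectors, can be sketched on a toy problem. This is only an illustration and not the authors' algorithm: it assumes entropy regularization (as in soft Q-learning) with temperature `tau`, and the random MDP (`P`, `r`), the feature matrix `Phi`, and the uniform projection `weights` are all hypothetical.

```python
import numpy as np

def soft_value(q, tau):
    """Entropy-regularized state value: tau * log sum_a exp(q[s, a] / tau)."""
    m = q.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    lse = np.log(np.exp((q - m) / tau).sum(axis=1, keepdims=True))
    return (m + tau * lse).ravel()

def soft_bellman(q, P, r, gamma, tau):
    """Soft Bellman backup of a tabular Q of shape (S, A).

    P has shape (S, A, S) (transition probabilities), r has shape (S, A).
    """
    return r + gamma * P @ soft_value(q, tau)

def project(q, Phi, weights):
    """Weighted least-squares projection of q (flattened) onto span(Phi)."""
    W = np.diag(weights)
    theta = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ q.ravel())
    return (Phi @ theta).reshape(q.shape), theta

# Hypothetical toy problem (assumed for illustration only).
rng = np.random.default_rng(0)
S, A, d = 4, 2, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)       # normalize rows into distributions
r = rng.random((S, A))
Phi = rng.standard_normal((S * A, d))   # one feature row per (state, action)
weights = np.full(S * A, 1.0 / (S * A))
gamma, tau = 0.9, 0.5

q = np.zeros((S, A))
q_backup = soft_bellman(q, P, r, gamma, tau)    # regularized Bellman operator
q_proj, theta = project(q_backup, Phi, weights) # projection onto span(Phi)
```

On its own, the tabular backup `soft_bellman` is a `gamma`-contraction in the sup norm. The abstract's point is that this property is lost once the backup is followed by `project`, which is why a plain projected fixed-point iteration carries no convergence guarantee.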

Related Organizations

Texas A&M University
United States
The University of Texas System
United States

Keywords

FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average


Fields of Science

engineering and technology

mechanical engineering
