Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

Publication: Article, Preprint (17 Aug 2025)
Embargo end date: 01 Jan 2022
Publisher: AI Access Foundation
Journal: Journal of Artificial Intelligence Research, volume 83 (eISSN: 1076-9757)

Authors: Donghao Ying; Mengzi Amy Guo; Hyunin Lee; Yuhao Ding; Javad Lavaei; Zuo-Jun Max Shen;

doi: 10.1613/jair.1.18129 , 10.48550/arxiv.2205.10715

arXiv: 2205.10715


Abstract

We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an $O(\epsilon^{-4})$ sample complexity for $\epsilon$-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG can attain a zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate our methods through numerical experiments.
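
As a reading aid, the primal-dual scheme described in the abstract can be sketched in generic form. The notation here is an assumption and is not taken from this record: $\lambda^{\pi_\theta}$ denotes the state-action occupancy measure of a policy $\pi_\theta$, $f$ and $g$ are the concave objective and constraint functions, $\mu$ is the dual variable restricted to an interval $[0, C]$, and $\eta_1, \eta_2$ are step sizes. The sketch shows only the exact-gradient updates and leaves out the variance-reduced gradient estimator; the paper's precise formulation may differ.

```latex
% Assumed problem form: concave objective and concave constraint in the occupancy measure
\max_{\theta} \; f\bigl(\lambda^{\pi_\theta}\bigr)
\quad \text{s.t.} \quad g\bigl(\lambda^{\pi_\theta}\bigr) \ge 0

% Lagrangian with dual variable \mu \ge 0
L(\theta, \mu) = f\bigl(\lambda^{\pi_\theta}\bigr) + \mu \, g\bigl(\lambda^{\pi_\theta}\bigr)

% Primal step: policy gradient ascent on the Lagrangian
\theta_{t+1} = \theta_t + \eta_1 \, \nabla_\theta L(\theta_t, \mu_t)

% Dual step: projected (sub-)gradient descent, using \partial L / \partial \mu = g
\mu_{t+1} = \mathcal{P}_{[0, C]}\Bigl( \mu_t - \eta_2 \, g\bigl(\lambda^{\pi_{\theta_t}}\bigr) \Bigr)
```

Under this sign convention, a violated constraint ($g < 0$) increases $\mu$, which places more weight on the constraint in the next primal step.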

Related Organizations

University of California, Berkeley (UC Berkeley), United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)

Impact by BIP!

- Selected citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average


Fields of Science: natural sciences