Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes

Publication: Article, Preprint, Conference object
Date: 11 Apr 2025
Embargo end date: 01 Jan 2023
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Journal: Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 26,605-26,613 (issn: 2159-5399, eissn: 2374-3468)

Authors: Yifan Lin; Enlu Zhou

doi: 10.1609/aaai.v39i25.34862 , 10.48550/arxiv.2301.11415

arXiv: 2301.11415

Abstract

We consider infinite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data. The popular distributionally robust approach to addressing the parameter uncertainty can sometimes be overly conservative. In this paper, we utilize the recently proposed formulation, Bayesian risk Markov Decision Process (BR-MDP), to address parameter (or epistemic) uncertainty in MDPs. To solve the infinite-horizon BR-MDP with a class of convex risk measures, we propose a computationally efficient approach called approximate bilevel difference convex programming (ABDCP). The optimization is performed offline and produces the optimal policy that is represented as a finite state controller with desirable performance guarantees. We also demonstrate the empirical performance of the BR-MDP formulation and the proposed algorithm.
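
To make the "difference convex" idea in the abstract concrete, here is a minimal sketch of the convex-concave procedure on a one-dimensional toy problem. It is not the ABDCP algorithm and involves no MDP. The objective f(x) = x^4 - 2x^2, the function names (`ccp_step`, `ccp`), the starting point, and the tolerance are all assumptions chosen for illustration. The procedure writes the objective as g(x) - h(x) with g and h both convex, replaces h by its tangent line at the current iterate, and minimizes the resulting convex surrogate.

```python
import math


def g(x):
    """Convex part of the toy objective."""
    return x ** 4


def h(x):
    """Convex part that is subtracted, which makes the objective non-convex."""
    return 2.0 * x ** 2


def f(x):
    """Toy difference-of-convex objective f = g - h (illustrative only)."""
    return g(x) - h(x)


def ccp_step(x_k):
    """One convex-concave step for this toy problem.

    Linearizing h at x_k gives h(x_k) + 4*x_k*(x - x_k), so the convex
    surrogate is x**4 - 4*x_k*x + const. Setting its derivative to zero
    gives 4*x**3 = 4*x_k, i.e. x is the real cube root of x_k.
    """
    return math.copysign(abs(x_k) ** (1.0 / 3.0), x_k)


def ccp(x0, tol=1e-10, max_iter=200):
    """Iterate convex-concave steps until the iterate stops moving."""
    x = x0
    history = [f(x)]
    for _ in range(max_iter):
        x_new = ccp_step(x)
        history.append(f(x_new))
        moved = abs(x_new - x)
        x = x_new
        if moved < tol:
            break
    return x, history


x_star, values = ccp(0.3)
print(x_star, values[-1])
```

Each surrogate upper-bounds f and touches it at the current iterate, so the objective values in `values` do not increase from one step to the next. Starting from a positive point the iterates approach the stationary point x = 1; starting from a negative point they approach x = -1, which shows that the procedure finds a local solution that depends on initialization.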

Related Organizations

Georgia Institute of Technology
United States

Keywords

FOS: Electrical engineering, electronic engineering, information engineering, Systems and Control (eess.SY), Electrical Engineering and Systems Science - Systems and Control

Related research (1)

dccp software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average


Access route: Green

Fields of Science

engineering and technology

other engineering and technologies
