On the detection of Markov decision processes

Publication: Article, Preprint
Date: 01 May 2025
Embargo end date: 01 Jan 2021
Language: English
Publisher: Elsevier BV
Journal: Automatica, volume 175, page 112196 (ISSN: 0005-1098)

Authors: Xiaoming Duan; Yagiz Savas; Rui Yan; Zhe Xu; Ufuk Topcu

doi: 10.1016/j.automatica.2025.112196, 10.48550/arxiv.2112.12338

arXiv: 2112.12338


Abstract

We study the detection problem for a finite set of Markov decision processes (MDPs) where the MDPs have the same state and action spaces but possibly different probabilistic transition functions. Any one of these MDPs could be the model for some underlying controlled stochastic process, but it is unknown a priori which MDP is the ground truth. We investigate whether it is possible to asymptotically detect the ground truth MDP model perfectly based on a single observed history (state-action sequence). Since the generation of histories depends on the policy adopted to control the MDPs, we discuss the existence and synthesis of policies that allow for perfect detection. We start with the case of two MDPs and establish a necessary and sufficient condition for the existence of policies that lead to perfect detection. Based on this condition, we then develop an algorithm that efficiently (in time polynomial in the size of the MDPs) determines whether such policies exist and synthesizes one when they do. We further extend the results to the more general case where there are more than two MDPs in the candidate set, and we develop a policy synthesis algorithm based on breadth-first search and recursion. We demonstrate the effectiveness of our algorithms through numerical examples.
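To make the setting concrete, the sketch below scores one observed state-action history against each candidate MDP and picks the most likely one. It is an illustration of the detection setup only, not the paper's policy synthesis algorithm: the maximum-likelihood rule, the dictionary encoding of transition functions, and the names `log_likelihood`, `detect`, `mdp_0`, and `mdp_1` are all assumptions made here. Because the same policy generates the history under every candidate, the action-selection probabilities are identical across models and only the transition probabilities matter for the comparison.

```python
import math


def log_likelihood(transitions, history):
    """Sum of log P(s' | s, a) along a history under one candidate MDP.

    transitions: dict mapping (state, action) -> {next_state: probability}
    history: list [s0, a0, s1, a1, ..., sT] alternating states and actions
    (hypothetical encoding chosen for this sketch)
    """
    total = 0.0
    for i in range(0, len(history) - 2, 2):
        s, a, s_next = history[i], history[i + 1], history[i + 2]
        p = transitions[(s, a)].get(s_next, 0.0)
        if p == 0.0:
            return -math.inf  # the history is impossible under this model
        total += math.log(p)
    return total


def detect(models, history):
    """Return the index of the candidate with the highest log-likelihood."""
    scores = [log_likelihood(m, history) for m in models]
    return max(range(len(models)), key=lambda k: scores[k])


# Two toy candidates (made-up numbers): they agree under action "a"
# and differ under action "b".
same = {0: 0.5, 1: 0.5}
mdp_0 = {(0, "a"): same, (1, "a"): same,
         (0, "b"): {0: 0.9, 1: 0.1}, (1, "b"): {0: 0.9, 1: 0.1}}
mdp_1 = {(0, "a"): same, (1, "a"): same,
         (0, "b"): {0: 0.1, 1: 0.9}, (1, "b"): {0: 0.1, 1: 0.9}}

informative = [0, "b", 0, "b", 0, "b", 1, "b", 0]  # mostly lands in state 0
uninformative = [0, "a", 1, "a", 0]                # both models agree on "a"

best = detect([mdp_0, mdp_1], informative)
```

In this toy example a policy that only ever plays "a" yields histories with identical likelihood under both candidates, so no amount of data separates them, whereas playing "b" does. This is why the choice of policy matters for detection.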

Related Organizations

Arizona State University (United States)
Shanghai Jiao Tong University (China)
Beihang University (China)
The University of Texas at Austin (United States)

Keywords

Signal Processing (eess.SP), Markov and semi-Markov decision processes, policy synthesis, Decision theory, decision making, Markov decision processes, Optimization and Control (math.OC), FOS: Mathematics, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Signal Processing, asymptotic detection, Mathematics - Optimization and Control, algorithm design

Impact by BIP!

selected citations (derived from selected sources): 1
popularity (current attention, based on the underlying citation network): Average
influence (overall impact, based on the underlying citation network): Average
impulse (initial momentum directly after publication): Average


Green

Fields of Science

engineering and technology

other engineering and technologies
