Uniformization for semi-Markov decision processes under stationary policies

Publication: Article, 01 Sep 1987, English. Publisher: Cambridge University Press (CUP). Journal: Journal of Applied Probability, volume 24, pages 644-656 (ISSN: 0021-9002, eISSN: 1475-6072)

Authors: Beutler, Frederick J.; Ross, Keith W.

doi: 10.2307/3214096, 10.1017/s0021900200031375

Abstract

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
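
As a rough illustration of the first claim in the abstract, the sketch below applies classical uniformization to a continuous-time Markov chain under one fixed simple policy. It is not the generalized construction of the article. It assumes exponential sojourn times, a rate matrix that already reflects the chosen actions, and reward rates per unit time. The function names (`uniformize`, `stationary`, `average_reward_rate`) and the factor 1.25 used to pick the uniformization rate are hypothetical choices made only for this example.

```python
def uniformize(rates, big_lambda=None):
    """Build the one-step matrix of the uniformized chain.

    rates[i][j] is the transition rate i -> j (i != j) under a fixed
    simple policy (assumption: exponential sojourn times).
    """
    n = len(rates)
    exit_rates = [sum(rates[i][j] for j in range(n) if j != i) for i in range(n)]
    if big_lambda is None:
        # Strictly above the largest exit rate, so every state keeps a
        # self-loop (a "virtual jump") and the chain is aperiodic.
        big_lambda = 1.25 * max(exit_rates)
    if big_lambda < max(exit_rates):
        raise ValueError("uniformization rate must dominate all exit rates")
    P = [
        [
            rates[i][j] / big_lambda if j != i else 1.0 - exit_rates[i] / big_lambda
            for j in range(n)
        ]
        for i in range(n)
    ]
    return P, big_lambda


def stationary(P, iters=2000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_reward_rate(rates, reward_rates):
    """Average reward per unit time, computed through the uniformized chain."""
    P, big_lambda = uniformize(rates)
    pi = stationary(P)
    # Each step of the uniformized chain lasts 1/big_lambda on average,
    # so the reward earned per step in state i is reward_rates[i] / big_lambda.
    per_step = sum(p * r / big_lambda for p, r in zip(pi, reward_rates))
    return big_lambda * per_step


if __name__ == "__main__":
    # Two states: 0 -> 1 at rate 2, 1 -> 0 at rate 3; reward rates 1 and 4.
    print(average_reward_rate([[0.0, 2.0], [3.0, 0.0]], [1.0, 4.0]))
```

For this two-state chain the continuous-time stationary distribution is (3/5, 2/5), so the average reward rate is 0.6 * 1 + 0.4 * 4 = 2.2, and the uniformized chain reproduces that value. Under a randomized stationary policy the self-loops would let the action be redrawn at virtual jumps, which is the source of the discrepancy the abstract describes; the sketch does not attempt to model that case.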

Related Organizations

University of Michigan–Flint
United States
University of Pennsylvania
United States
University of Michigan–Ann Arbor
United States

Keywords

Markov renewal processes, semi-Markov processes, Markov and semi-Markov decision processes, stationary processes, Dynamic programming, semi-Markov decision process, optimal policy computations, optimal constrained policies

Impact by BIP!

- selected citations: 23. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.