Discrete-time equivalence for constrained semi-Markov decision processes

A continuous-time average-reward Markov decision process problem is most easily solved in terms of an equivalent discrete-time Markov decision process (DMDP); customary hypotheses include that the process is a Markov jump process with denumerable state space and bounded transition rates, that actions are chosen at the jump points of the process, and that the policies considered are deterministic. We derive an analogous uniformization result applicable to semi-Markov decision processes (SMDPs) under a (possibly) randomized stationary policy. For each stationary policy governing an SMDP meeting certain hypotheses, we specify a past-dependent policy on a suitably constructed DMDP; the new policy carries the same average reward on the DMDP as the original policy on the SMDP. Discrete-time reduction is applied to optimization on an SMDP subject to a hard constraint, for which the optimal policy has been shown to be stationary and possibly randomized at no more than a single state. Under some convexity conditions on the reward, cost, and action space, it is shown that a non-randomized policy is optimal for the constrained problem.
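
As an illustration of the classical case the abstract starts from (a Markov jump process with bounded transition rates under a fixed deterministic stationary policy), the following minimal Python sketch uniformizes a two-state chain and checks that the average reward is preserved. It does not reproduce the paper's SMDP construction. The names `uniformize`, `stationary`, `rates`, `reward_rates`, and `big_lambda`, and all numerical values, are assumptions chosen for the example.

```python
def uniformize(rates, big_lambda):
    """Build P = I + Q / big_lambda for a jump process with off-diagonal rates."""
    n = len(rates)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        exit_rate = sum(rates[i][j] for j in range(n) if j != i)
        if exit_rate > big_lambda:
            raise ValueError("big_lambda must bound every exit rate")
        for j in range(n):
            if j != i:
                p[i][j] = rates[i][j] / big_lambda
        # Leftover probability becomes a fictitious self-transition.
        p[i][i] = 1.0 - exit_rate / big_lambda
    return p


def stationary(p, iterations=1000):
    """Approximate the stationary distribution of P by power iteration."""
    n = len(p)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical two-state chain under a fixed policy: rate 1 from state 0 to 1,
# rate 3 from state 1 to 0, with reward rates 1 and 4 per unit time.
rates = [[0.0, 1.0], [3.0, 0.0]]
reward_rates = [1.0, 4.0]
big_lambda = 5.0  # any bound strictly above the largest exit rate keeps P aperiodic

p = uniformize(rates, big_lambda)
pi = stationary(p)

# Average reward per unit time, computed on the discrete-time chain.
discrete_gain = sum(pi[i] * reward_rates[i] for i in range(2))

# Closed form for the continuous-time two-state chain: (b*r0 + a*r1) / (a + b).
continuous_gain = (3.0 * 1.0 + 1.0 * 4.0) / (1.0 + 3.0)
```

In this sketch the two gains agree because the uniformized chain and the original jump process share the same stationary distribution; the reward is kept as a rate per unit time, so no rescaling by `big_lambda` is needed.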

Related Organizations

University of Michigan–Ann Arbor, United States
University of Pennsylvania, United States
University of Michigan–Flint, United States

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.