Powered by OpenAIRE graph
Found an issue? Give us feedback
addClaim

Discrete-time equivalence for constrained semi-Markov decision processes

Authors: Frederick Beutler; Keith Ross;

Discrete-time equivalence for constrained semi-Markov decision processes

Abstract

A continuous-time average reward Markov decision process problem is most easily solved in terms of an equivalent discrete-time Markov decision process (DMDP); customary hypotheses include that the process is a Markov jump process with denumerable state space and bounded transition rates, that actions are chosen at the jump points of the process, and that the policies considered are deterministic. We derive an analogous uniformization result applicable to semi-Markov decision processes (SMDP) under a (possibly) randomized stationary policy. For each stationary policy governing an SMDP meeting certain hypotheses, we specify a past-dependent policy on a suitably constructed DMDP; the new policy carries the same average reward on the DMDP as the original policy on the SMDP. Discrete time reduction is applied to optimization on a SMDP subject to a hard constraint, for which the optimal policy has been shown to be stationary and possibly randomized at no more than a single state. Under some convexity conditions on the reward, cost, and action space, it is shown that a non-randomized policy is optimal for the constrained problem.

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!