
Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies.We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
Markov renewal processes, semi-Markov processes, Markov and semi-Markov decision processes, stationary processes, Dynamic programming, semi-Markov decision process, optimal policy computations, optimal constrained policies
Markov renewal processes, semi-Markov processes, Markov and semi-Markov decision processes, stationary processes, Dynamic programming, semi-Markov decision process, optimal policy computations, optimal constrained policies
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 23 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
