A unified approach to adaptive control of average reward Markov decision processes

Publication: Article, 01 Sep 1988, English
Publisher: Springer Science and Business Media LLC
Journal: OR Spektrum, volume 10, pages 161-166 (issn: 0171-6468, eissn: 1436-6304)

Authors: Hübner, Gerhard

doi: 10.1007/bf01740510


Abstract

The paper presents a general optimization method for adaptive average reward Markov decision problems. After each observation of the state and each estimation of the unknown parameter, optimal decisions are determined by applying a policy improvement step to an auxiliary value function that converges over time to the true relative value. The method includes the classical procedure of estimation and control [cf. M. Kurano, J. Oper. Res. Soc. Japan 15, 67-76 (1972; Zbl 0238.90006), and P. Mandl, Adv. Appl. Probab. 6, 40-60 (1974; Zbl 0281.60070)], the nonstationary value iteration [cf. A. Federgruen and P. J. Schweitzer, J. Optimization Theory Appl. 34, 207-241 (1981; Zbl 0457.90083), R. S. Acosta-Abreu and O. Hernandez-Lerma, Control Cybern. 14, 313-322 (1985; Zbl 0606.90130), and M. Kurano, J. Appl. Probab. 24, 270-276 (1987)], as well as many new procedures.
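
The loop described in the abstract (observe the state, re-estimate the unknown parameter, apply a policy improvement step to an auxiliary value function) can be illustrated with a minimal sketch. It is not the paper's algorithm: the two-state model, the rewards, the transition law `p_next`, the smoothed frequency estimator, and all function names are assumptions chosen for illustration, and the auxiliary value is updated by one relative value iteration step per observation, in the spirit of the nonstationary value iteration the abstract mentions.

```python
import random

# Hypothetical toy model (an assumption, not taken from the paper):
# 2 states, 2 actions, one unknown parameter theta in (0, 1).
REWARD = [[0.0, 1.0], [2.0, 3.0]]  # REWARD[state][action]
STATES, ACTIONS = (0, 1), (0, 1)

def p_next(a, theta):
    """Assumed transition law [P(next=0), P(next=1)]; depends on action and theta."""
    p1 = theta if a == 0 else 1.0 - theta
    return [1.0 - p1, p1]

def q_value(s, a, v, theta):
    """One-step lookahead: reward plus expected auxiliary value."""
    p = p_next(a, theta)
    return REWARD[s][a] + sum(p[j] * v[j] for j in STATES)

def greedy(s, v, theta):
    """Policy improvement step at state s with respect to the auxiliary value v."""
    return max(ACTIONS, key=lambda a: q_value(s, a, v, theta))

def adaptive_control(theta_true, steps=5000, seed=0):
    rng = random.Random(seed)
    s, v = 0, [0.0, 0.0]
    hits, n = 0, 0
    for _ in range(steps):
        theta_hat = (hits + 1) / (n + 2)  # smoothed frequency estimate
        a = greedy(s, v, theta_hat)       # decision from the improvement step
        nxt = 1 if rng.random() < p_next(a, theta_true)[1] else 0
        # Under either action, this event has probability theta.
        hits += int(nxt == (1 if a == 0 else 0))
        n += 1
        # One relative value iteration step under the estimated model,
        # normalized so that the reference state 0 has value 0.
        w = [max(q_value(i, b, v, theta_hat) for b in ACTIONS) for i in STATES]
        v = [w[i] - w[0] for i in STATES]
        s = nxt
    theta_hat = (hits + 1) / (n + 2)
    policy = [greedy(i, v, theta_hat) for i in STATES]
    return theta_hat, policy, v
```

In this toy model the average-reward optimal action switches from 1 to 0 once theta exceeds 0.75, so the policy the loop settles on depends on how well the parameter has been estimated.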

Related Organizations

Universität Hamburg
Germany

Keywords

nonstationary value iteration, adaptive average reward Markov decision, Markov and semi-Markov decision processes, adaptive control, policy improvement

Impact by BIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	5
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average