
Abstract

Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API), based on modified policy iteration, is another RL technique, though not as widely used as Q-Learning. In this paper, we present and analyze an API algorithm for discounted reward based on (i) a classical temporal differences update for policy evaluation and (ii) simulation-based mean estimation for policy improvement. Further, we analyze the convergence of API algorithms based on Q-factors for (i) discounted reward and (ii) average reward MDPs. The average reward algorithm is based on relative value iteration; we also present results from numerical experiments with it.
Keywords: average reward, Q-P-Learning, approximate policy iteration, relative value iteration
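
For a concrete, simplified picture of the algorithmic structure the abstract describes, the sketch below runs approximate policy iteration on a toy two-state MDP: policy evaluation uses a classical TD(0) update along a simulated trajectory, and policy improvement estimates Q-factors by simulation-based mean estimation of one-step returns. The toy MDP, step sizes, and sample counts are invented for illustration; this is a minimal sketch of the general technique under those assumptions, not the paper's exact algorithm or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP, used purely for illustration.
# The learner only queries the simulator; it never reads P directly.
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.9, 0.1], [0.2, 0.8]]])   # P[s, a, s']
R = np.array([[6.0, 10.0], [-3.0, 12.0]])  # one-step reward r(s, a)
GAMMA = 0.8                                # discount factor

def step(s, a):
    """Simulate one transition; the learner sees only (reward, next state)."""
    s2 = rng.choice(2, p=P[s, a])
    return R[s, a], s2

def td0_evaluate(policy, n_steps=20000, alpha=0.01):
    """Policy evaluation: classical TD(0) update along a simulated trajectory."""
    V = np.zeros(2)
    s = 0
    for _ in range(n_steps):
        r, s2 = step(s, policy[s])
        V[s] += alpha * (r + GAMMA * V[s2] - V[s])  # temporal-difference update
        s = s2
    return V

def improve(V, n_samples=2000):
    """Policy improvement: estimate Q(s, a) by simulation-based mean estimation."""
    policy = np.zeros(2, dtype=int)
    for s in range(2):
        q = np.zeros(2)
        for a in range(2):
            samples = [step(s, a) for _ in range(n_samples)]
            q[a] = np.mean([r + GAMMA * V[s2] for r, s2 in samples])
        policy[s] = np.argmax(q)  # greedy with respect to the estimated Q-factors
    return policy

policy = np.zeros(2, dtype=int)  # start from an arbitrary policy
for it in range(5):              # outer API loop
    V = td0_evaluate(policy)
    policy = improve(V)
    print(f"iteration {it}: V = {V.round(2)}, policy = {policy}")
```

For the average-reward case mentioned in the abstract, the discounted TD target would be replaced by a relative value iteration form. One standard relative Q-factor update, stated here as general background rather than the paper's exact recursion, is Q(s, a) <- Q(s, a) + alpha * [r + Q(s', pi(s')) - Q(s*, a*) - Q(s, a)], where (s*, a*) is a fixed reference state-action pair whose Q-factor is subtracted to keep the iterates bounded.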
