
We propose a simulation-based algorithm, Empirical Policy Iteration (EPI), for finding the optimal policy of an MDP with the infinite-horizon discounted-cost criterion when the transition kernel is unknown. Unlike simulation-based algorithms that rely on stochastic approximation techniques, which yield only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity bounds: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is within ε of the optimal value function.
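To make the sampling and iteration structure concrete, the following is a minimal sketch of an EPI-style loop on a finite MDP, written in Python. It is illustrative only: the names `simulator`, `cost`, `gamma`, `n`, and `k` are hypothetical, and the exact empirical evaluation step in the paper may differ from the plain greedy update used here.

```python
import numpy as np

def empirical_policy_iteration(simulator, num_states, num_actions, cost, gamma, n, k):
    """Illustrative EPI-style loop (not the paper's exact algorithm).

    simulator(s, a) -> next state index, one draw from the unknown kernel
    cost            -- (S, A) array of one-step costs
    gamma           -- discount factor in (0, 1)
    n               -- simulation samples per (state, action) per iteration
    k               -- number of policy iterations
    """
    S, A = num_states, num_actions
    policy = np.zeros(S, dtype=int)  # arbitrary initial policy
    for _ in range(k):
        # Empirical transition kernel from n simulated next-state samples.
        P_hat = np.zeros((S, A, S))
        for s in range(S):
            for a in range(A):
                for _ in range(n):
                    s_next = simulator(s, a)
                    P_hat[s, a, s_next] += 1.0 / n
        # Empirical policy evaluation: solve (I - gamma * P_pi) V = c_pi.
        P_pi = P_hat[np.arange(S), policy]   # (S, S) kernel under the policy
        c_pi = cost[np.arange(S), policy]    # (S,) costs under the policy
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
        # Greedy improvement with respect to the empirical model.
        Q = cost + gamma * P_hat @ V         # (S, A) empirical Q-values
        policy = Q.argmin(axis=1)            # minimize: discounted-cost criterion
    return policy, V
```

The sample complexity result described above would then dictate how large n and k must be, as functions of ε and δ, for the returned value function to be within ε of optimal with probability at least 1 − δ.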
