
The curse of dimensionality in dynamic programming prevents, in most problems of practical interest, the exact computation of the value function. We study the fixed points of approximate value iteration, a simple algorithm that combats the curse of dimensionality by generating approximate iterates of the classical value iteration algorithm in the span of a set of prespecified basis functions. We show that, in general, the modified dynamic programming operator need not possess a fixed point, and therefore approximate value iteration should not be expected to converge. However, by using a class of randomized policies, approximate value iteration is guaranteed to possess at least one fixed point. Finally, we discuss the link between approximate value iteration and temporal-difference learning (TD), and show that the existence of fixed points for approximate value iteration implies the existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates "exploration".
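To make the algorithm concrete, the following is a minimal sketch of approximate value iteration with a linear architecture: each iterate applies the Bellman operator to the current approximation and then fits the result, by least squares, back onto the span of the basis functions. All names, array shapes, and the unweighted least-squares projection are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def approximate_value_iteration(P, r, gamma, Phi, n_iter=100):
    """Minimal sketch (illustrative, not the paper's exact scheme).

    P     : array (n_actions, n_states, n_states), transition matrices
    r     : array (n_actions, n_states), expected one-step rewards
    gamma : discount factor in (0, 1)
    Phi   : array (n_states, k), prespecified basis functions
    Returns weights w such that Phi @ w approximates the value function.
    """
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        v = Phi @ w                                   # current approximation
        q = r + gamma * np.einsum('asx,x->as', P, v)  # backed-up action values
        Tv = q.max(axis=0)                            # Bellman (max) operator
        # Least-squares fit = orthogonal projection of T(Phi w) onto span(Phi).
        w, *_ = np.linalg.lstsq(Phi, Tv, rcond=None)
        # As the abstract notes, the composed operator (projection after T)
        # need not have a fixed point, so these iterates may oscillate
        # rather than converge.
    return w
```

The loop illustrates why convergence is not automatic: the Bellman operator is a contraction, but composing it with a projection onto the basis span need not be, which is precisely the failure mode the paper analyzes.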
