A Novel Approach for Sampling in Approximate Dynamic Programming Based on $F$-Discrepancy

Publication: Article, 01 Oct 2017, Italy

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Journal: IEEE Transactions on Cybernetics, volume 47, pages 3355-3366 (ISSN: 2168-2267, eISSN: 2168-2275)

Authors: Cristiano Cervellera; Danilo Macciò

doi: 10.1109/tcyb.2017.2660533

pmid: 28186920

handle: 20.500.14243/326458

Abstract

Approximate dynamic programming (ADP) is the standard tool to solve Markovian decision problems under general hypotheses on the system and the cost equations. It is known that one of the key issues of the procedure is how to generate an efficient sampling of the state space, needed for the approximation of the value function, in order to cope with the well-known phenomenon of the curse of dimensionality. The most common approaches in the literature are either aimed at a uniform covering of the state space or driven by the actual evolution of the system trajectories. Concerning the latter approach, $F$-discrepancy, a quantity closely related to the Kolmogorov-Smirnov statistic that measures how closely a set of random points represents a probability distribution, has been recently proposed for an efficient ADP framework in the finite-horizon case. In this paper, we extend this framework to infinite-horizon discounted problems, providing a constructive algorithm to generate efficient sampling points driven by the system behavior. Then, the algorithm is refined with the aim of acquiring a more balanced covering of the state space, thus addressing possible drawbacks of a pure system-driven sampling approach to obtain, in fact, an efficient hybrid between the latter and the pure uniform design. A theoretical analysis is provided through the introduction of an original notion of the $F$-discrepancy and the proof of its properties. Simulation tests are provided to showcase the behavior of the proposed sampling method.
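
To make the link between $F$-discrepancy and the Kolmogorov-Smirnov statistic concrete, the following is a minimal one-dimensional sketch. It assumes the classical definition, the largest gap between the empirical distribution of the sample and the target distribution function $F$, and it is not the original notion introduced in the paper, which is not reproduced on this page. The names `f_discrepancy_1d` and `uniform_cdf`, and the two example point sets, are hypothetical and chosen only for illustration.

```python
def f_discrepancy_1d(points, cdf):
    """Kolmogorov-Smirnov-type distance sup_x |F_n(x) - F(x)| for a 1-D sample.

    Assumption: classical one-dimensional definition, used here only as an
    illustration; the paper's own notion of F-discrepancy may differ.
    """
    xs = sorted(points)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the supremum
        # is attained just before or exactly at one of the sample points.
        worst = max(worst, abs((i + 1) / n - fx), abs(fx - i / n))
    return worst


def uniform_cdf(x):
    """Distribution function of the uniform law on [0, 1] (illustrative target F)."""
    return min(1.0, max(0.0, x))


n = 8
# Hypothetical sample that covers [0, 1] evenly (cell midpoints).
midpoints = [(2 * i + 1) / (2 * n) for i in range(n)]
# Hypothetical sample clustered in the lower part of the interval.
clustered = [0.05 * (i + 1) for i in range(n)]

d_mid = f_discrepancy_1d(midpoints, uniform_cdf)
d_clu = f_discrepancy_1d(clustered, uniform_cdf)
print(d_mid, d_clu)
```

With a uniform target, the evenly spread sample has the smaller value, which is the sense in which a low discrepancy indicates that a point set represents the distribution well.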

Country

Italy

Related Organizations

National Academies of Sciences, Engineering, and Medicine (United States)
Institute of Marine Engineering (Italy)
National Research Council (Italy)
National Research Council (Sri Lanka)

Keywords

$F$-discrepancy, state sampling, approximate dynamic programming, Markovian decision problem

Impact by BIP!

selected citations: 6
popularity: Average
influence: Average
impulse: Average