Difference of Convex Functions Programming for Reinforcement Learning

Name: Difference of Convex Functions Programming for Reinforcement Learning
Keywords: [SPI] Engineering Sciences [physics], [INFO] Computer Science [cs]

Piot, Bilal; Geist, Matthieu; Pietquin, Olivier

Conference object . 2014

Data sources: INRIA2; SPIRE - Sciences Po Institutional REpository; HAL - Université de Lille; INRIA a CCSD electronic archive server

Publication . Conference object . 01 Jan 2014 . English . Funded by: EC | ILHAIRE

Abstract

Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of the Optimal Bellman Residual (OBR) $T^* Q - Q$, where $T^*$ is the so-called optimal Bellman operator. Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense. Finally, we frame this optimization problem as a DC program. This makes it possible to draw on the large literature on DC Programming to address the Reinforcement Learning problem.
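
To make the quantity in the abstract concrete, the following is a minimal sketch of the optimal Bellman residual $T^* Q - Q$ on a tiny tabular problem. It assumes the standard definition $(T^* Q)(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$; the two-state MDP, the uniform weighting over state-action pairs, and the function names are illustrative assumptions, not taken from the paper, and the sketch does not implement the DC decomposition itself.

```python
import numpy as np

def optimal_bellman_operator(Q, P, R, gamma):
    """Apply T* to a tabular Q.

    Q, R: arrays of shape (S, A); P: array of shape (S, A, S) with P[s, a, s'].
    """
    greedy_next = Q.max(axis=1)           # max_a' Q(s', a'), shape (S,)
    return R + gamma * (P @ greedy_next)  # shape (S, A)

def obr_norm(Q, P, R, gamma, p=1):
    """Uniformly weighted L_p norm of the residual T*Q - Q (an assumed weighting)."""
    residual = optimal_bellman_operator(Q, P, R, gamma) - Q
    return float(np.mean(np.abs(residual) ** p) ** (1.0 / p))

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
# Action 0 stays in the current state, action 1 switches to the other state.
P = np.zeros((2, 2, 2))
for s in range(2):
    P[s, 0, s] = 1.0
    P[s, 1, 1 - s] = 1.0
R = np.array([[0.0, 0.0],   # no reward in state 0
              [1.0, 1.0]])  # reward 1 for any action taken in state 1
gamma = 0.9

Q_zero = np.zeros((2, 2))
residual_at_zero = obr_norm(Q_zero, P, R, gamma)

# Value iteration, used here only to obtain a near-optimal Q for comparison.
Q_star = np.zeros((2, 2))
for _ in range(500):
    Q_star = optimal_bellman_operator(Q_star, P, R, gamma)
residual_at_optimum = obr_norm(Q_star, P, R, gamma)

print(residual_at_zero, residual_at_optimum)
```

The residual vanishes (up to numerical precision) at the fixed point of $T^*$ and is strictly positive at the all-zero table, which is the property that makes a norm of the OBR a sensible objective to minimize.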

Related Organizations

University of Lille, France
Institut Universitaire de France, France
French Institute for Research in Computer Science and Automation, France
Institut des Sciences Humaines et Sociales, France
Sciences Po, France

Impact by BIP!

	selected citations	0
	popularity	Average
	influence	Average
	impulse	Average

Green

Funded by

EC | ILHAIRE

Related to Research communities

INRIA

The European University of Social Sciences