Practical Kernel-Based Reinforcement Learning

Article, Preprint, 01 Jan 2014

Embargo end date: 01 Jan 2014

Publisher: arXiv

Authors: Barreto, André M. S.; Precup, Doina; Pineau, Joelle

doi: 10.48550/arxiv.1407.5358

arXiv: 1407.5358

Abstract

Kernel-based reinforcement learning (KBRL) stands out among reinforcement learning algorithms for its strong theoretical guarantees. By casting the learning problem as a local kernel approximation, KBRL provides a way of computing a decision policy which is statistically consistent and converges to a unique solution. Unfortunately, the model constructed by KBRL grows with the number of sample transitions, resulting in a computational cost that precludes its application to large-scale or on-line domains. In this paper we introduce an algorithm that turns KBRL into a practical reinforcement learning tool. Kernel-based stochastic factorization (KBSF) builds on a simple idea: when a transition matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix, potentially much smaller, which retains some fundamental properties of its precursor. KBSF exploits such an insight to compress the information contained in KBRL's model into an approximator of fixed size. This makes it possible to build an approximation that takes into account both the difficulty of the problem and the associated computational cost. KBSF's computational complexity is linear in the number of sample transitions, which is the best one can do without discarding data. Moreover, the algorithm's simple mechanics allow for a fully incremental implementation that makes the amount of memory used independent of the number of sample transitions. The result is a kernel-based reinforcement learning algorithm that can be applied to large-scale problems in both off-line and on-line regimes. We derive upper bounds for the distance between the value functions computed by KBRL and KBSF using the same data. We also illustrate the potential of our algorithm in an extensive empirical study in which KBSF is applied to difficult tasks based on real-world data.
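The factor-swapping idea described in the abstract can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the authors' implementation. It assumes a single action, scalar states in [0, 1], a Gaussian kernel with a hypothetical width of 0.2, synthetic transitions, and 10 evenly spaced representative states. All names (`factorized_models`, `StreamingSmallModel`, and so on) are hypothetical. The code builds a row-stochastic matrix D (n x m) that maps sampled next states to representative states, and a row-stochastic matrix K (m x n) that maps representative states to sampled start states. D K is then an n x n transition matrix and K D an m x m one, and both are stochastic. The last part shows one way K D could be accumulated one transition at a time, with memory that does not depend on n.

```python
import math
import random


def gaussian_kernel(x, y, width):
    """Unnormalized Gaussian similarity between two scalar states."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))


def normalize_rows(mat):
    """Divide each row by its sum so that every row is a probability distribution."""
    result = []
    for row in mat:
        total = sum(row)
        result.append([v / total for v in row])
    return result


def matmul(a, b):
    """Plain-Python product of a (p x q) matrix and a (q x r) matrix."""
    q, r = len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(q)) for j in range(r)]
            for i in range(len(a))]


def is_row_stochastic(mat, tol=1e-9):
    """True if every entry is nonnegative and every row sums to 1."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) < tol for row in mat)


def factorized_models(starts, next_states, reps, width):
    """Build D (n x m) and K (m x n), then the products D K (n x n) and K D (m x m)."""
    # D: each sampled next state is spread over the m representative states.
    d = normalize_rows([[gaussian_kernel(sn, r, width) for r in reps]
                        for sn in next_states])
    # K: each representative state is spread over the n sampled start states.
    k = normalize_rows([[gaussian_kernel(r, s, width) for s in starts]
                        for r in reps])
    big = matmul(d, k)    # n x n, O(n^2 m) work: grows with the data
    small = matmul(k, d)  # m x m, O(n m^2) work: swapped factors, fixed size
    return d, k, big, small


class StreamingSmallModel:
    """Accumulate K D one transition at a time with O(m^2) memory (illustrative)."""

    def __init__(self, reps, width):
        self.reps, self.width = reps, width
        m = len(reps)
        self.acc = [[0.0] * m for _ in range(m)]  # sum_i k(r_j, s_i) * D[i][col]
        self.norm = [0.0] * m                     # sum_i k(r_j, s_i)

    def add(self, s, s_next):
        weights = [gaussian_kernel(s_next, r, self.width) for r in self.reps]
        total = sum(weights)
        d_row = [w / total for w in weights]      # one row of D
        for j, r in enumerate(self.reps):
            kj = gaussian_kernel(r, s, self.width)
            self.norm[j] += kj
            for col, dcol in enumerate(d_row):
                self.acc[j][col] += kj * dcol

    def model(self):
        # Normalizing at the end gives the same m x m matrix as K D.
        return [[v / self.norm[j] for v in row] for j, row in enumerate(self.acc)]


random.seed(0)
n, m, width = 200, 10, 0.2
starts = [random.random() for _ in range(n)]
next_states = [min(1.0, max(0.0, s + random.gauss(0.0, 0.05))) for s in starts]
reps = [j / (m - 1) for j in range(m)]  # evenly spaced representative states

d, k, big, small = factorized_models(starts, next_states, reps, width)
print(len(big), len(small))                              # 200 10
print(is_row_stochastic(big), is_row_stochastic(small))  # True True

stream = StreamingSmallModel(reps, width)
for s, sn in zip(starts, next_states):
    stream.add(s, sn)
gap = max(abs(a - b) for ra, rb in zip(stream.model(), small) for a, b in zip(ra, rb))
print(gap < 1e-9)                                        # True
```

Forming the swapped product touches each transition once per pair of representative states, O(n m²) work, which is linear in n; forming D K would need O(n² m). Rewards, multiple actions, kernel-width selection, and solving the small model (for example by value iteration) are omitted from this sketch.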

Keywords

FOS: Computer and information sciences; MSC: 68T05 (Primary), 93E35, 90C40, 93E20, 49L20 (Secondary); ACM: I.2.8, I.2.6, G.3; arXiv: Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Machine Learning (stat.ML)

Impact (BIP!)

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average

Fields of Science (3)

engineering and technology

electrical engineering, electronic engineering, information engineering

Funded by

NSERC | unidentified, NIH | Methodology for Adaptive Treatment Strategies (RMI)