Inverse reinforcement learning with Gaussian process

Publication: Article, Preprint, Conference object
Date: 01 Jun 2011
Embargo end date: 01 Jan 2012
Publisher: IEEE
Journal: Proceedings of the 2011 American Control Conference
Funded by: NSF | Technology Based Evaluati...

Authors: Qifeng Qiao; Peter A. Beling

doi: 10.1109/acc.2011.5990948, 10.48550/arxiv.1208.2112

arXiv: 1208.2112

Abstract

We present new algorithms for inverse reinforcement learning (IRL, or inverse optimal control) in convex optimization settings. We argue that finite-space IRL can be posed as a convex quadratic program under a Bayesian inference framework with the objective of maximum a posteriori estimation. To deal with problems in large or even infinite state spaces, we propose a Gaussian process model and use preference graphs to represent observations of decision trajectories. Our method is distinguished from other approaches to IRL in that it makes no assumptions about the form of the reward function, yet it retains the promise of computationally manageable implementations for potential real-world applications. In comparison with an established algorithm on small-scale numerical problems, our method demonstrated better accuracy in apprenticeship learning and a more robust dependence on the number of observations.
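To make the "convex quadratic program" claim concrete, here is a minimal sketch that is not taken from the paper. It assumes a Gaussian prior N(mu, sigma^2 I) on a state-only reward vector and treats optimality of the observed policy as hard linear constraints of the classic form (P_pi - P_a)(I - gamma P_pi)^{-1} r >= 0. The toy 3-state chain, the prior mean `mu`, and all variable names are hypothetical, and SciPy's general-purpose SLSQP solver stands in for a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize

gamma = 0.9  # discount factor (assumed for illustration)

# Hypothetical 3-state chain with two deterministic actions.
# P[a][s, s2] is the probability of moving from s to s2 under action a.
P = np.array([
    [[1, 0, 0], [1, 0, 0], [0, 1, 0]],  # action 0: step left
    [[0, 1, 0], [0, 0, 1], [0, 0, 1]],  # action 1: step right
], dtype=float)
policy = np.array([1, 1, 1])  # observed expert policy: always step right

n_actions, n_states, _ = P.shape
P_pi = P[policy, np.arange(n_states)]  # row s is P[policy[s]][s]
inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

# One linear constraint per (state, non-policy action) pair:
# the expert's action must be at least as good as the alternative.
rows = []
for a in range(n_actions):
    diff = (P_pi - P[a]) @ inv
    for s in range(n_states):
        if policy[s] != a:
            rows.append(diff[s])
A = np.array(rows)

# Assumed Gaussian prior on the reward; this mean deliberately
# conflicts with the observed behaviour so the constraints matter.
mu = np.array([1.0, 0.0, 0.0])
sigma = 1.0

def neg_log_prior(r):
    d = r - mu
    return 0.5 * (d @ d) / sigma**2

def neg_log_prior_grad(r):
    return (r - mu) / sigma**2

res = minimize(
    neg_log_prior,
    x0=np.zeros(n_states),  # r = 0 is always feasible
    jac=neg_log_prior_grad,
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda r: A @ r, "jac": lambda r: A}],
)
r_map = res.x
print("MAP reward estimate:", r_map)
```

Under these assumptions the objective is a convex quadratic and the constraints are linear, so the estimate is the prior mean projected onto the set of rewards that make the observed policy optimal. A softer likelihood, or a non-identity prior covariance, would change the objective but not its convexity.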

Conference: American Control Conference 2011

Related Organizations

University of Virginia
United States
University of Virginia Main Campus
United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)

Impact by BIP!

- selected citations: 7. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.