Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations

Publication type: Article, Preprint

Date: 01 Jan 2020

Embargo end date: 01 Jan 2020

Publisher: arXiv

Journal: CoRR, volume abs/2004.00530

Authors: Zhuangdi Zhu; Kaixiang Lin; Bo Dai; Jiayu Zhou

doi: 10.48550/arxiv.2004.00530

arXiv: 2004.00530

Abstract

Model-free deep reinforcement learning (RL) has demonstrated its superiority on many complex sequential decision-making problems. However, its heavy dependence on dense rewards and its high sample complexity impede the wide adoption of these methods in real-world scenarios. On the other hand, imitation learning (IL) learns effectively in sparse-reward tasks by leveraging existing expert demonstrations. In practice, collecting a sufficient number of expert demonstrations can be prohibitively expensive, and the quality of the demonstrations typically limits the performance of the learned policy. In this work, we propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations for highly challenging sparse-reward tasks. SAIL bridges the advantages of IL and RL to reduce the sample complexity substantially, by effectively exploiting sub-optimal demonstrations and efficiently exploring the environment to surpass the demonstrated performance. Extensive empirical results show that SAIL not only significantly improves sample efficiency but also leads to much better final performance across different continuous control tasks, compared to the state of the art.

Related Organizations

Michigan State University
United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Robotics, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Robotics (cs.RO), Machine Learning (cs.LG)

Related research (1)

mujoco-py software on GitHub (relation: IsRelatedTo)

Impact by BIP!

- selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science

natural sciences
