Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

Publication (Article, Preprint, Conference object), 26 Jun 2023
Embargo end date: 01 Jan 2022
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Journal: Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11069-11077 (ISSN: 2159-5399, eISSN: 2374-3468)

Authors: Yang Yue; Bingyi Kang; Zhongwen Xu; Gao Huang; Shuicheng Yan

doi: 10.1609/aaai.v37i9.26311, 10.48550/arxiv.2206.12542

arXiv: 2206.12542

Abstract

Deep reinforcement learning (RL) algorithms suffer severe performance degradation when interaction data is scarce, which limits their real-world application. Recently, visual representation learning has been shown to be effective and promising for boosting sample efficiency in RL. These methods usually rely on contrastive learning and data augmentation to train a transition model, which differs from how the model is used in RL: performing value-based planning. Accordingly, the representation learned by these visual methods may be good for recognition but not optimal for estimating state value and solving the decision problem. To address this issue, we propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making. More specifically, VCR trains a model to predict the future state (also referred to as the "imagined state") based on the current one and a sequence of actions. Instead of aligning this imagined state with a real state returned by the environment, VCR applies a Q-value head to both states and obtains two distributions of action values. A distance between the two distributions is then computed and minimized, forcing the imagined state to produce an action value prediction similar to that of the real state. We develop two implementations of this idea, one for discrete and one for continuous action spaces. We conduct experiments on the Atari 100k and DeepMind Control Suite benchmarks to validate their effectiveness in improving sample efficiency. Our methods achieve new state-of-the-art performance among search-free RL algorithms.
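The value-consistency idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The linear transition model, the linear Q-value head, the softmax over action values, and the KL divergence used as the distance are all assumptions made for illustration; the abstract does not specify the distance or the form of the distributions. All function and parameter names (`transition`, `value_consistency_loss`, `W_z`, `W_a`, `W_q`) are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(xs):
    """Numerically stable softmax; turns action values into a distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); the choice of KL as the distance is an assumption."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def transition(W_z, W_a, z, action, num_actions):
    """Hypothetical latent transition model: next latent from latent and action."""
    one_hot = [1.0 if i == action else 0.0 for i in range(num_actions)]
    pre = [a + b for a, b in zip(matvec(W_z, z), matvec(W_a, one_hot))]
    return [math.tanh(x) for x in pre]


def value_consistency_loss(W_z, W_a, W_q, z0, actions, real_latents, num_actions):
    """Average distance between action-value distributions of imagined and real states.

    z0           -- latent of the current state
    actions      -- sequence of actions taken from the current state
    real_latents -- latents of the real states the environment returned after each action
    """
    z_hat = z0
    loss = 0.0
    for action, z_real in zip(actions, real_latents):
        # Roll the model forward to obtain the imagined state.
        z_hat = transition(W_z, W_a, z_hat, action, num_actions)
        # Apply the same Q-value head to both states.
        p_real = softmax(matvec(W_q, z_real))
        p_imag = softmax(matvec(W_q, z_hat))
        # Compare the value predictions, not the states themselves.
        loss += kl_divergence(p_real, p_imag)
    return loss / len(actions)


def demo(seed=0, latent_dim=4, num_actions=3):
    """Evaluate the loss once on random parameters and random 'real' latents."""
    rng = random.Random(seed)
    rand = lambda rows, cols: [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    W_z = rand(latent_dim, latent_dim)
    W_a = rand(latent_dim, num_actions)
    W_q = rand(num_actions, latent_dim)
    z0 = [rng.uniform(-1, 1) for _ in range(latent_dim)]
    actions = [0, 2, 1]
    real_latents = rand(len(actions), latent_dim)
    return value_consistency_loss(W_z, W_a, W_q, z0, actions, real_latents, num_actions)


if __name__ == "__main__":
    print(demo())
```

In this sketch the loss is zero whenever the imagined and real states yield identical action-value distributions, even if the latent vectors themselves differ, which is the sense in which the alignment targets values rather than states. How the loss is optimized, and how gradients flow through the real-state branch, is not stated in the abstract and is left out here.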

Related Organizations

Tsinghua University
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)

Related research

Playvirtual software on GitHub (IsRelatedTo)
spr software on GitHub (IsRelatedTo)

Impact by BIP!

- selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.