Periodic Q-Learning

Publication: Article, Preprint, Conference object
Date: 01 Jan 2020
Embargo end date: 01 Jan 2020
Publisher: arXiv
Journal: CoRR, volume abs/2002.09795

Authors: Donghwan Lee; Niao He

doi: 10.48550/arxiv.2002.09795

arXiv: 2002.09795

Abstract

The use of target networks is a common practice in deep reinforcement learning for stabilizing training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning, for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of using target estimates or networks in Q-learning algorithms.
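The two-estimate scheme described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' reference implementation. The function name `periodic_q_learning`, the uniform sampling of state-action pairs, the constant step size, and the toy two-state MDP are all assumptions made for illustration. In particular, bootstrapping the online update from the frozen target table is assumed here by analogy with target networks in deep Q-learning.

```python
import random


def periodic_q_learning(P, R, gamma, num_periods, period_len, step_size, seed=0):
    """Hypothetical tabular sketch of a periodic target update.

    P[s][a] is a list of next-state probabilities; R[s][a] is a scalar reward.
    Returns the target table after `num_periods` synchronizations.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    online = [[0.0] * n_actions for _ in range(n_states)]
    target = [row[:] for row in online]

    for _ in range(num_periods):
        for _ in range(period_len):
            # Assumption: (s, a) is drawn uniformly, as if from a generative model.
            s = rng.randrange(n_states)
            a = rng.randrange(n_actions)
            s_next = rng.choices(range(n_states), weights=P[s][a])[0]
            # Assumption: the bootstrap term uses the frozen target table.
            td_target = R[s][a] + gamma * max(target[s_next])
            online[s][a] += step_size * (td_target - online[s][a])
        # Periodic step: copy the online table into the target table.
        target = [row[:] for row in online]
    return target


# Toy MDP (illustrative only): action 0 stays, action 1 switches state;
# the reward is 1 whenever the next state is state 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
R = [
    [0.0, 1.0],
    [1.0, 0.0],
]
Q = periodic_q_learning(P, R, gamma=0.9, num_periods=100, period_len=200, step_size=0.5)
greedy_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

For this toy problem the optimal action values can be worked out by hand from the Bellman equation (10 for moving to or staying in state 1, 9 otherwise), so the returned table can be checked against them. The period length and step size here were picked for convenience and are not the settings analyzed in the paper.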

Related Organizations

University of Illinois at Urbana–Champaign
United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Machine Learning (stat.ML), Mathematics - Optimization and Control, Machine Learning (cs.LG)

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average

Fields of Science

natural sciences