Incremental multi-step Q-learning

Type: Article, Part of book or chapter of book, Conference object
Date: 01 Jan 1994
Language: English
Publisher: Springer Science and Business Media LLC
Journal: Machine Learning, volume 22, pages 283-290 (issn: 0885-6125, eissn: 1573-0565)

Authors: Jing Peng; Ronald J. Williams

doi: 10.1007/bf00114731 , 10.1023/a:1018076709321 , 10.1016/b978-1-55860-335-6.50035-0

Abstract

This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic programming-based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm is demonstrated through computer simulations of the standard benchmark control problem of learning to balance a pole on a cart.
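To make the role of λ concrete, the following is a minimal sketch of how a λ-weighted return with greedy (max over actions) bootstrapping could be computed for a recorded episode. It is an offline, backward-recursion illustration under stated assumptions, not the paper's incremental algorithm. The function name `lambda_returns`, its argument layout, and the tabular Q-value lists are hypothetical choices made for this example.

```python
from typing import List, Sequence


def lambda_returns(
    rewards: Sequence[float],
    next_state_q: Sequence[Sequence[float]],
    gamma: float,
    lam: float,
) -> List[float]:
    """Compute lambda-returns for one recorded episode (illustrative only).

    rewards[t]      -- reward received after step t
    next_state_q[t] -- current Q estimates for every action in the state
                       reached after step t; an empty sequence marks a
                       terminal state (assumed convention for this sketch)
    gamma           -- discount factor
    lam             -- lambda in [0, 1], controlling how far credit spreads
    """
    if len(rewards) != len(next_state_q):
        raise ValueError("rewards and next_state_q must have equal length")

    def greedy_value(q_values: Sequence[float]) -> float:
        # Value of a state under the greedy policy; 0 for terminal states.
        return max(q_values) if len(q_values) > 0 else 0.0

    returns = [0.0] * len(rewards)
    if not rewards:
        return returns

    # Beyond the last recorded step there is no further return,
    # so the recursion starts from the bootstrapped estimate.
    g_next = greedy_value(next_state_q[-1])
    for t in reversed(range(len(rewards))):
        v_next = greedy_value(next_state_q[t])
        # Mix the one-step bootstrap (weight 1 - lam) with the longer
        # return that continues along the trajectory (weight lam).
        g = rewards[t] + gamma * ((1.0 - lam) * v_next + lam * g_next)
        returns[t] = g
        g_next = g
    return returns


# Hypothetical three-step episode: reward arrives only at the final step,
# and all Q estimates are still zero.
episode_rewards = [0.0, 0.0, 1.0]
episode_next_q = [[0.0, 0.0], [0.0, 0.0], []]

one_step_targets = lambda_returns(episode_rewards, episode_next_q, 0.9, 0.0)
full_targets = lambda_returns(episode_rewards, episode_next_q, 0.9, 1.0)
```

With `lam = 0` each target reduces to the one-step Q-learning target, so only the final step sees the reward in this example. With `lam = 1` the reward is discounted back through every earlier step, which is the sense in which λ distributes credit throughout a sequence of actions.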

Related Organizations

University of California, Riverside (United States)
Northeastern University (United States)
Northwestern University (United States)

Impact by BIP!

- selected citations: 165. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

bronze

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
