Publication · Preprint · 2016

Asynchronous Methods for Deep Reinforcement Learning

Mnih, Volodymyr; Badia, Adrià Puigdomènech; Mirza, Mehdi; Graves, Alex; Lillicrap, Timothy P.; Harley, Tim; Silver, David; Kavukcuoglu, Koray
Open Access · English
  • Published: 04 Feb 2016
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training, allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state of the art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor...
Free-text keywords: Computer Science – Learning
28 references, page 1 of 2

Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2012.

Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, and Rémi Munos. Increasing the action gap: New operators for reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.

Dimitri P. Bertsekas. Distributed dynamic programming. IEEE Transactions on Automatic Control, 27(3):610–616, 1982.

Kevin Chavez, Hao Yi Ong, and Augustus Hong. Distributed deep q-learning. Technical report, Stanford University, June 2015.

Thomas Degris, Patrick M. Pilarski, and Richard S. Sutton. Model-free reinforcement learning with continuous action in practice. In American Control Conference (ACC), 2012, pages 2177–2182. IEEE, 2012.

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121–2159, 2011.

Matthew Grounds and Daniel Kudenko. Parallel reinforcement learning with linear function approximation. In Proceedings of the 5th, 6th and 7th European Conference on Adaptive and Learning Agents and Multi-agent Systems: Adaptation and Multi-agent Learning, pages 60–74. Springer-Verlag, 2008.

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

Tommi Jaakkola, Michael I. Jordan, and Satinder P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185–1201, 1994.

Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Jan Koutník, Jürgen Schmidhuber, and Faustino Gomez. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, pages 541–548. ACM, 2014.

Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702, 2015.

Yuxi Li and Dale Schuurmans. Mapreduce for parallel reinforcement learning. In Recent Advances in Reinforcement Learning - 9th European Workshop, EWRL 2011, Athens, Greece, September 9-11, 2011, Revised Selected Papers, pages 309–320, 2011.

Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287–308, 2000.

R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
