Publication · Preprint · 2016

Unifying Count-Based Exploration and Intrinsic Motivation

Bellemare, Marc G.; Srinivasan, Sriram; Ostrovski, Georg; Schaul, Tom; Saxton, David; Munos, Rémi
Open Access English
  • Published: 06 Jun 2016
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and ob...
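The core idea sketched in the abstract — deriving a pseudo-count from a density model's change in probability after one more observation — can be illustrated with a minimal, hedged sketch. The class and function names below are illustrative assumptions (the paper works with learned density models over raw pixels, not a tabular model); a tabular density model is used here only because, in that special case, the pseudo-count provably recovers the true empirical count, which makes the construction easy to check.

```python
class TabularDensityModel:
    """Empirical density over discrete observations.

    A stand-in for an arbitrary density model (e.g. a pixel-level
    model, as in the paper); names here are illustrative.
    """

    def __init__(self):
        self.counts = {}
        self.total = 0

    def prob(self, x):
        if self.total == 0:
            return 0.0
        return self.counts.get(x, 0) / self.total

    def update(self, x):
        self.counts[x] = self.counts.get(x, 0) + 1
        self.total += 1


def pseudo_count(model, x):
    """Pseudo-count from the density before (rho) and after (rho')
    a hypothetical extra observation of x:
        N_hat(x) = rho * (1 - rho') / (rho' - rho)
    """
    rho = model.prob(x)
    # "Recoding" probability: density of x after one more observation of x.
    rho_prime = (model.counts.get(x, 0) + 1) / (model.total + 1)
    if rho_prime <= rho:  # guard against a non-learning-positive model
        return float("inf")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def intrinsic_reward(n_hat, beta=0.05):
    """Count-based exploration bonus of the form beta / sqrt(N + 0.01);
    the constants are illustrative assumptions."""
    return beta / (n_hat + 0.01) ** 0.5


# In the tabular case the pseudo-count equals the empirical count:
model = TabularDensityModel()
for _ in range(3):
    model.update("a")
model.update("b")
print(pseudo_count(model, "a"))  # 3.0 (seen 3 times)
print(pseudo_count(model, "c"))  # 0.0 (never seen)
```

A novel observation thus gets a pseudo-count near zero and a large intrinsic reward, while a familiar one gets a small bonus — which is how the construction generalizes tabular count-based exploration to non-tabular settings.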
free text keywords: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning