Araya-López, M., Thomas, V., and Buffet, O. (2012). Near-optimal BRL using optimistic local transitions. In Proceedings of the 29th International Conference on Machine Learning.
Azar, M. G., Munos, R., Ghavamzadeh, M., and Kappen, H. J. (2011). Speedy Q-learning. In Advances in Neural Information Processing Systems 24.
Barto, A. G. (2013). Intrinsic motivation and reinforcement learning. In Intrinsically Motivated Learning in Natural and Artificial Systems, pages 17-47. Springer.
Bellemare, M., Veness, J., and Talvitie, E. (2014). Skip context tree switching. In Proceedings of the 31st International Conference on Machine Learning, pages 1458-1466.
Bellemare, M. G. (2015). Count-based frequency estimation using bounded memory. In Proceedings of the 24th International Joint Conference on Artificial Intelligence.
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. (2013). The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253-279.
Bellemare, M. G., Ostrovski, G., Guez, A., Thomas, P. S., and Munos, R. (2016). Increasing the action gap: New operators for reinforcement learning. In Proceedings of the 30th AAAI Conference on Artificial Intelligence.
Bellemare, M. G., Veness, J., and Bowling, M. (2012). Investigating contingency awareness using Atari 2600 games. In Proceedings of the 26th AAAI Conference on Artificial Intelligence.
Bellman, R. E. (1957). Dynamic programming. Princeton University Press, Princeton, NJ.
Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
Brafman, R. and Tennenholtz, M. (2002). R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231.
Bubeck, S. and Cesa-Bianchi, N. (2012). Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1-122.
Cover, T. M. and Thomas, J. A. (1991). Elements of information theory. John Wiley & Sons.
Dearden, R., Friedman, N., and Russell, S. (1998). Bayesian Q-learning. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 761-768.
Diuk, C., Cohen, A., and Littman, M. L. (2008). An object-oriented representation for efficient reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning, pages 240-247. ACM.