Azar, Mohammad Gheshlaghi, Munos, Rémi, and Kappen, Hilbert. On the sample complexity of reinforcement learning with a generative model. In Proceedings of the International Conference on Machine Learning, 2012.
Bellemare, Marc G., Naddaf, Yavar, Veness, Joel, and Bowling, Michael. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253-279, 2013.
Bellemare, Marc G., Danihelka, Ivo, Dabney, Will, Mohamed, Shakir, Lakshminarayanan, Balaji, Hoyer, Stephan, and Munos, Rémi. The Cramér distance as a solution to biased Wasserstein gradients. arXiv, 2017.
Bellman, Richard E. Dynamic programming. Princeton University Press, Princeton, NJ, 1957.
Bertsekas, Dimitri P. and Tsitsiklis, John N. Neuro-Dynamic Programming. Athena Scientific, 1996.
Bickel, Peter J. and Freedman, David A. Some asymptotic theory for the bootstrap. The Annals of Statistics, pp. 1196-1217, 1981.
Billingsley, Patrick. Probability and measure. John Wiley & Sons, 1995.
Caruana, Rich. Multitask learning. Machine Learning, 28(1): 41-75, 1997.
Chung, Kun-Jen and Sobel, Matthew J. Discounted MDPs: Distribution functions and exponential utility maximization. SIAM Journal on Control and Optimization, 25(1):49-62, 1987.
Dearden, Richard, Friedman, Nir, and Russell, Stuart. Bayesian Q-learning. In Proceedings of the National Conference on Artificial Intelligence, 1998.
Engel, Yaakov, Mannor, Shie, and Meir, Ron. Reinforcement learning with Gaussian processes. In Proceedings of the International Conference on Machine Learning, 2005.
Geist, Matthieu and Pietquin, Olivier. Kalman temporal differences. Journal of Artificial Intelligence Research, 39:483-532, 2010.
Gordon, Geoffrey. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
Harutyunyan, Anna, Bellemare, Marc G., Stepleton, Tom, and Munos, Rémi. Q(λ) with off-policy corrections. In Proceedings of the Conference on Algorithmic Learning Theory, 2016.
Hoffman, Matthew D., de Freitas, Nando, Doucet, Arnaud, and Peters, Jan. An expectation maximization algorithm for continuous Markov decision processes with arbitrary reward. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2009.