S´ebastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721, 2012.
 Dan Russo and Benjamin Van Roy. Learning to optimize via information-directed sampling. In Advances in Neural Information Processing Systems, pages 1583-1591, 2014.
 Olivier Chapelle and Lihong Li. An empirical evaluation of thompson sampling. In Advances in neural information processing systems, pages 2249-2257, 2011.
 Lihong Li. Generalized thompson sampling for contextual bandits. arXiv:1310.7163, 2013. [OpenAIRE]