Preprint, 2015

A Note on Information-Directed Sampling and Thompson Sampling

Li Zhou
Open Access, English • Published: 23 Mar 2015
This note introduces three Bayesian-style multi-armed bandit algorithms: Information-Directed Sampling, Thompson Sampling, and Generalized Thompson Sampling. The goal is to give an intuitive explanation of these three algorithms and their regret bounds, and to provide some derivations that are omitted in the original papers.
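As a concrete illustration of the simplest of the three, here is a minimal sketch of Thompson Sampling for a Bernoulli bandit with independent Beta(1, 1) priors. The arm means, horizon, and function name are illustrative choices, not taken from the note itself:

```python
import numpy as np

def thompson_sampling(true_means, horizon, seed=0):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors on each arm.

    Each round: sample a mean from every arm's Beta posterior, pull the
    arm with the largest sample, then update that arm's posterior with
    the observed 0/1 reward.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # posterior successes + 1
    beta = np.ones(k)   # posterior failures + 1
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)          # one posterior sample per arm
        arm = int(np.argmax(theta))            # act greedily on the samples
        reward = rng.random() < true_means[arm]  # Bernoulli reward
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Example run: the algorithm should concentrate its pulls on the best arm.
pulls = thompson_sampling([0.3, 0.5, 0.7], horizon=2000)
```

The posterior sampling step is what distinguishes Thompson Sampling from Information-Directed Sampling, which instead chooses the action minimizing the ratio of squared expected regret to information gain.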
Keywords: Computer Science - Learning; Computer Science - Artificial Intelligence

[1] SĀ“ebastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721, 2012.

[2] Dan Russo and Benjamin Van Roy. Learning to optimize via information-directed sampling. In Advances in Neural Information Processing Systems, pages 1583-1591, 2014.

[3] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pages 2249-2257, 2011.

[4] Lihong Li. Generalized Thompson sampling for contextual bandits. arXiv preprint arXiv:1310.7163, 2013.
