
arXiv: 1505.00369
handle: 1721.1/98879
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
Published at http://dx.doi.org/10.1214/15-AOS1381 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
multi-phase allocation, 62C20, switching cost, Minimax procedures in statistical decision theory, Mathematics - Statistics Theory, Statistics Theory (math.ST), 510, Applications of statistics to biology and medical sciences; meta analysis, grouped clinical trials, Multi-armed bandit problems, Sequential statistical design, 62L05, batches, sample size determination, multi-armed bandit problems, FOS: Mathematics, regret bounds
| selected citations | 56 |
| popularity | Top 1% |
| influence | Top 10% |
| impulse | Top 10% |
