
Abstract In this paper, we investigate Thompson sampling-based sequential block elimination approaches for dynamic assignment problems in a pure-exploration Multi-Armed Bandit (MAB) setting with limited budget constraints. The problem can be considered as a bandit game-play between the environment and a decision-maker in a metric space. Many instances of problems in fields such as e-commerce, logistics, mobility management, data management and operations research can be framed as dynamic assignment problems with budget constraints. Given an l-dimensional action space representing l variants of an entity and a budget for exploring the action space, the optimal dynamic assignment problem refers to the task of identifying the values to be assigned to different variants of the entity that maximizes the total reward by utilizing at most the given budget of rounds of play. We contribute a class of block elimination-based MAB algorithms specifically designed for the dynamic assignment problem with limited budget. Our algorithms begin by discretizing the continuous action space into a finite set of discrete actions, then proceed with a recursive block elimination procedure to remove sub-optimal actions. The elimination is carried out by calculating confidence bounds over blocks of actions. We explore two different confidence bound estimation techniques. We perform comprehensive experiments on two problem instances from distributed data management and logistics. Our results showcase that our approach yields a lower misidentification probability (i.e., the probability of recommending a non-optimal action) compared to state-of-the-art elimination-based pure-exploration MAB algorithms.
learning with feedback, dynamic assignment, Sampling theory, sample surveys, multi-armed bandits, dynamic pricing, pure-exploration, sequential learning, Thompson sampling
learning with feedback, dynamic assignment, Sampling theory, sample surveys, multi-armed bandits, dynamic pricing, pure-exploration, sequential learning, Thompson sampling
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 1 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
