Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration

descriptionPublicationkeyboard_double_arrow_right Article 18 Dec 2024 English Publisher:Springer Science and Business Media LLCJournal:Data Mining and Knowledge Discovery, volume 39 (issn: 1384-5810, eissn: 1573-756X,

Copyright policy )Funded by:EC | TRACE

Authors: Shameem Puthiya Parambath; Christos Anagnostopoulos 0001; Saleh Abdullah M. Alfahad;

doi: 10.1007/s10618-024-01083-2

Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration

- Summary
- Subjects
- Related research
  (1)
- Metrics

Abstract

Abstract In this paper, we investigate Thompson sampling-based sequential block elimination approaches for dynamic assignment problems in a pure-exploration Multi-Armed Bandit (MAB) setting with limited budget constraints. The problem can be considered as a bandit game-play between the environment and a decision-maker in a metric space. Many instances of problems in fields such as e-commerce, logistics, mobility management, data management and operations research can be framed as dynamic assignment problems with budget constraints. Given an l-dimensional action space representing l variants of an entity and a budget for exploring the action space, the optimal dynamic assignment problem refers to the task of identifying the values to be assigned to different variants of the entity that maximizes the total reward by utilizing at most the given budget of rounds of play. We contribute a class of block elimination-based MAB algorithms specifically designed for the dynamic assignment problem with limited budget. Our algorithms begin by discretizing the continuous action space into a finite set of discrete actions, then proceed with a recursive block elimination procedure to remove sub-optimal actions. The elimination is carried out by calculating confidence bounds over blocks of actions. We explore two different confidence bound estimation techniques. We perform comprehensive experiments on two problem instances from distributed data management and logistics. Our results showcase that our approach yields a lower misidentification probability (i.e., the probability of recommending a non-optimal action) compared to state-of-the-art elimination-based pure-exploration MAB algorithms.

Related Organizations

University of Glasgow
United Kingdom

Keywords

learning with feedback, dynamic assignment, Sampling theory, sample surveys, multi-armed bandits, dynamic pricing, pure-exploration, sequential learning, Thompson sampling

1 Research products, page 1 of 1

tsrbe software on GitHub
IsRelatedTo

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	1
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average