
doi: 10.2139/ssrn.3724377
Exponential bandits are widely adopted in economics and marketing due to their tractability. This paper analyzes the one-agent, multi-armed version of exponential bandits, in which the agent dynamically selects arms to maximize total payoff. We motivate our base model with examples in which all arms are of the same type, and we generalize the results to cases where arms are either independent or dependent. The contribution is fourfold. First, we characterize the agent's optimal policy for choosing arms: under this policy, the agent selects one arm at a time, and each arm is used at most once. Second, we show that the agent may not regard information acquisition as a last-ditch effort before quitting, which contradicts the existing literature. Third, with a discount factor, an arm may be used more than once. Fourth, in the case of negatively correlated bandits, the agent may use more than one arm simultaneously. The paper is of both theoretical and practical significance, since the model fits a range of situations, including project selection, product promotion, and drug development. Implications for these applications are discussed.
