On a restless multi-armed bandit problem with non-identical arms

descriptionPublicationkeyboard_double_arrow_right Article , Conference object 01 Sep 2011Publisher:IEEEJournal:2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

Authors: Naumaan Nayyar; Yi Gai; Bhaskar Krishnamachari;

doi: 10.1109/allerton.2011.6120191

On a restless multi-armed bandit problem with non-identical arms

- Summary
- Metrics

Abstract

We consider the following learning problem motivated by opportunistic spectrum access in cognitive radio networks. There are N independent Gilbert-Elliott channels with possibly non-identical transition matrices. It is desired to have an online policy to maximize the long-term expected discounted reward from accessing one channel at each time dynamically. While there is a stream of recent results on this problem when the channels are identical, much less is known for the harder case of non-identical channels. We provide the first characterization of the structure of the optimal policy for this problem when the channels can be non-identical, in the Bayesian case (when the transition matrices are known). We also provide the first provably efficient learning algorithm for a non-Bayesian version of this problem (when the transition matrices are unknown). Specifically, for the special case of two positively correlated channels, we use the structure we identify to develop a novel mapping to a different multi-armed bandit with countably-infinite arms, in which each arm corresponds to a threshold-based policy. Using this mapping, we propose a policy that achieves near-logarithmic regret for this problem with respect to an ∊-optimal solution.

Related Organizations

University of Southern California
United States
University of California System
United States

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	5
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average