publication . Part of book or chapter of book . Preprint . 2014

Concurrent Bandits and Cognitive Radio Networks

Avner, Orly; Mannor, Shie;
Open Access
  • Published: 22 Apr 2014
  • Publisher: Springer Berlin Heidelberg
Abstract
We consider the problem of multiple users targeting the arms of a single multi-armed stochastic bandit. The motivation for this problem comes from cognitive radio networks, where selfish users need to coexist without any side communication between them, implicit cooperation or common control. Even the number of users may be unknown and can vary as users join or leave the network. We propose an algorithm that combines an $\epsilon$-greedy learning rule with a collision avoidance mechanism. We analyze its regret with respect to the system-wide optimum and show that sub-linear regret can be obtained in this setting. Experiments show dramatic improvement compared to...
Subjects
free text keywords: Computer Science - Learning, Computer Science - Multiagent Systems
Download fromView all 2 versions
http://arxiv.org/pdf/1404.5421...
Part of book or chapter of book
Provider: UnpayWall
http://link.springer.com/conte...
Part of book or chapter of book . 2014
Provider: Crossref
22 references, page 1 of 2

[1] A. Anandkumar, N. Michael, A.K. Tang, and A. Swami. Distributed algorithms for learning and cognitive medium access with logarithmic regret. Selected Areas in Communications, IEEE Journal on, 29(4):731{745, 2011. [OpenAIRE]

[2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2):235{256, 2002.

[3] P. Auer, N. Cesa-Bianchi, Y. Freund, and R.E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48{77, 2002.

[4] P. Auer and R. Ortner. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1):55{65, 2010.

[5] O. Avner and S. Mannor. Stochastic bandits with pathwise constraints. In 50th IEEE Conference on Decision and Control, December 2011.

[6] O. Avner, S. Mannor, and O. Shamir. Decoupling exploration and exploitation in multi-armed bandits. In 29th International Conference on Machine Learning, December 2012. [OpenAIRE]

[7] D.A. Berry and B. Fristedt. Bandit problems: sequential allocation of experiments. Chapman and Hall London, 1985. [OpenAIRE]

[8] S. Choe. Performance analysis of slotted aloha based multi-channel cognitive packet radio network. In Proceedings of the 6th IEEE Conference on Consumer Communications and Networking Conference, CCNC'09, pages 672{676, 2009.

[9] T.M. Cover and J.A. Thomas. Elements of information theory. John Wiley & Sons, 1991.

[10] E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and markov decision processes. In Computational Learning Theory, pages 193{209. Springer, 2002.

[11] A. Garivier and O. Capp. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Conference On Learning Theory, pages 359{376, Jul. 2011. [OpenAIRE]

[12] G.H. Hardy, J.E. Littlewood, and G. Polya. Inequalities. Cambridge University Press, 1988.

[13] W. Jouini, D. Ernst, C. Moy, and J. Palicot. Multi-armed bandit based policies for cognitive radio's decision making issues. In Signals, Circuits and Systems (SCS), 2009 3rd International Conference on, pages 1{6. IEEE, 2010.

[14] D. Kalathil, N. Nayyar, and R. Jain. Decentralized learning for multi-player multi-armed bandits. In 51st IEEE Conference on Decision and Control, pages 3960{3965, 2012. [OpenAIRE]

[15] D.J. Leith, P. Cli ord, V. Badarla, and D. Malone. WLAN channel selection without communication. Computer Networks, 2012. [OpenAIRE]

22 references, page 1 of 2
Powered by OpenAIRE Research Graph
Any information missing or wrong?Report an Issue