Publication · Preprint · 2017

Scalable Generalized Linear Bandits: Online Computation and Hashing

Jun, Kwang-Sung; Bhargava, Aniruddha; Nowak, Robert; Willett, Rebecca;
Open Access · English
  • Published: 31 May 2017
Abstract
Generalized Linear Bandits (GLBs), a natural extension of stochastic linear bandits, have been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their practical utility. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time $t$, we propose a new algorithm that performs online computations and enjoys constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-Confidence-set Conversion (the GLOC method)...
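To make the constant-per-step claim concrete, here is a minimal, hypothetical sketch of an online GLB loop in the spirit the abstract describes: an online learner (here, plain online gradient descent on the logistic loss) supplies per-round predictions, and the algorithm maintains only a fixed-size $d \times d$ matrix and a $d$-vector, so each step costs $O(d^2)$ regardless of $t$. All class and variable names, the confidence width `beta`, and the step size `eta` are illustrative assumptions, not the paper's actual algorithm or constants.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineGLBandit:
    """Sketch of a generalized linear bandit with constant per-step cost."""

    def __init__(self, d, lam=1.0, eta=0.1, beta=1.0):
        self.theta = np.zeros(d)         # online learner's iterate
        self.A = lam * np.eye(d)         # regularized Gram matrix (d x d, fixed size)
        self.b = np.zeros(d)             # running vector defining the set's center
        self.eta, self.beta = eta, beta  # step size, confidence width

    def select(self, arms):
        # Optimistic (UCB) choice over an ellipsoidal confidence set
        # centered at A^{-1} b.
        A_inv = np.linalg.inv(self.A)
        center = A_inv @ self.b
        scores = [x @ center + self.beta * np.sqrt(x @ A_inv @ x) for x in arms]
        return int(np.argmax(scores))

    def update(self, x, reward):
        z = x @ self.theta               # online learner's prediction for this round
        # Rank-one updates: O(d^2) work, independent of the round index t.
        self.A += np.outer(x, x)
        self.b += z * x
        # One online-gradient step on the logistic loss.
        self.theta -= self.eta * (sigmoid(z) - reward) * x

# Usage on synthetic Bernoulli rewards.
rng = np.random.default_rng(0)
bandit = OnlineGLBandit(d=3)
arms = [rng.standard_normal(3) for _ in range(5)]
for _ in range(50):
    i = bandit.select(arms)
    r = float(rng.random() < sigmoid(arms[i] @ np.ones(3)))
    bandit.update(arms[i], r)
```

The key design point the abstract highlights: because the state is just `(theta, A, b)`, memory never grows with the history, in contrast to GLB methods that re-solve a maximum-likelihood problem over all past observations each round.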
Subjects
free text keywords: Statistics - Machine Learning, Computer Science - Learning