publication . Preprint . 2017

Robust Contextual Bandit via the Capped-$\ell_{2}$ norm

Zhu, Feiyun; Zhu, Xinliang; Wang, Sheng; Yao, Jiawen; Huang, Junzhou
Open Access English
  • Published: 17 Aug 2017
This paper considers the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. Those methods use least-squares-based algorithms to estimate the expected reward, which makes them sensitive to outliers. To deal with outliers, we propose a novel robust actor-critic contextual bandit method for the mHealth intervention. In the critic update, the capped-$\ell_{2}$ norm is used to measure the approximation error, which prevents outliers from dominating our objective. A set of weigh...
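To illustrate why capping the error limits the influence of outliers, here is a minimal Python sketch. It is not the authors' algorithm: the abstract is truncated in this record, so the exact objective and weighting scheme are not shown here. The sketch assumes the cap is applied to the squared residual, $\min(r_i^2, \epsilon)$, and uses a simple 0/1 re-weighted ridge regression as a stand-in for a robust critic fit. The names `capped_l2_loss`, `fit_capped_ridge`, `cap`, and `ridge` are hypothetical.

```python
import numpy as np


def squared_loss(residuals):
    """Ordinary least-squares loss: every residual counts in full."""
    return float(np.sum(residuals ** 2))


def capped_l2_loss(residuals, cap):
    """Capped loss (assumed form): each squared residual contributes at most `cap`."""
    return float(np.sum(np.minimum(residuals ** 2, cap)))


def fit_capped_ridge(X, y, cap, ridge=1e-3, n_iter=20):
    """Hypothetical robust fit: ridge regression re-fitted on the samples
    whose squared residual is below `cap` (0/1 weights)."""
    d = X.shape[1]
    # Start from the plain ridge solution on all samples.
    w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
    for _ in range(n_iter):
        # Samples at or above the cap get weight 0 and are dropped from the fit.
        keep = (y - X @ w) ** 2 < cap
        Xk, yk = X[keep], y[keep]
        w_new = np.linalg.solve(Xk.T @ Xk + ridge * np.eye(d), Xk.T @ yk)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w


if __name__ == "__main__":
    r = np.array([1.0, -2.0, 10.0])   # the last residual is an outlier
    print(squared_loss(r))            # 105.0
    print(capped_l2_loss(r, cap=9.0)) # 14.0

    # Rewards generated as y = 2x, with one corrupted observation.
    X = np.arange(1.0, 11.0).reshape(-1, 1)
    y = 2.0 * X[:, 0]
    y[-1] += 100.0
    print(fit_capped_ridge(X, y, cap=25.0))
```

In the first example the outlier contributes 100 of the 105 units of squared loss, but only 9 of the 14 units of capped loss. In the second, the plain ridge fit is pulled well away from the true slope of 2 by the corrupted sample, while the re-weighted fit recovers a slope close to 2 once that sample is excluded.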
free text keywords: Computer Science - Learning, Statistics - Machine Learning