Robust Contextual Bandit via the Capped-$\ell_{2}$ norm

Preprint English OPEN
Zhu, Feiyun; Zhu, Xinliang; Wang, Sheng; Yao, Jiawen; Huang, Junzhou
  • Subject: Statistics - Machine Learning | Computer Science - Learning

This paper considers the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. Those methods use the l...