Mean-Variance Optimization in Markov Decision Processes

Article, Preprint English OPEN
Mannor, Shie; Tsitsiklis, John N.;
(2011)
  • Publisher: International Machine Learning Society
  • Subject: Computer Science - Artificial Intelligence | Computer Science - Learning

We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of comput...
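As a rough illustration of why randomization can help under a mean-variance criterion, the sketch below uses a hypothetical one-step decision problem that is not taken from the paper. A safe action pays 0, and a risky action pays 0 or 2 with equal probability. The objective assumed here, maximizing the mean reward subject to a cap of 0.5 on the variance, is only one example of a measure involving both moments. The names `mean_and_variance`, `best_mean`, and `VARIANCE_CAP` are invented for this example.

```python
import math

# Hypothetical variance cap; this value is an assumption made for illustration.
VARIANCE_CAP = 0.5


def mean_and_variance(q):
    """Mean and variance of the reward when the risky action is chosen with probability q.

    The safe action pays 0. The risky action pays 2 or 0 with probability 1/2 each,
    so the overall reward is 2 with probability q/2 and 0 otherwise.
    """
    outcomes = [(2.0, 0.5 * q), (0.0, 1.0 - 0.5 * q)]
    mean = sum(r * p for r, p in outcomes)
    second_moment = sum(r * r * p for r, p in outcomes)
    return mean, second_moment - mean ** 2


def best_mean(candidates):
    """Largest mean among the mixing probabilities whose variance respects the cap."""
    feasible_means = []
    for q in candidates:
        mean, variance = mean_and_variance(q)
        if variance <= VARIANCE_CAP:
            feasible_means.append(mean)
    return max(feasible_means)


# Deterministic policies: always safe (q = 0) or always risky (q = 1).
deterministic_best = best_mean([0.0, 1.0])

# Randomized policies: a grid search over the mixing probability q.
randomized_best = best_mean([i / 1000 for i in range(1001)])

print(deterministic_best, randomized_best)
```

In this toy problem the always-risky policy has variance 1 and violates the cap, so the best deterministic policy earns a mean of 0. Mixing with probability q gives mean q and variance 2q - q^2, which stays within the cap up to q = 1 - sqrt(0.5), roughly 0.293, so a randomized policy achieves a strictly positive mean.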