Publication · Preprint · 2017

Safe Exploration for Identifying Linear Systems via Robust Optimization

Lu, Tyler; Zinkevich, Martin; Boutilier, Craig; Roy, Binz; Schuurmans, Dale
Open Access · English
Published: 29 Nov 2017
Abstract
Safely exploring an unknown dynamical system is critical to the deployment of reinforcement learning (RL) in physical systems where failures may have catastrophic consequences. In scenarios where one knows little about the dynamics, diverse transition data covering relevant regions of state-action space is needed to apply either model-based or model-free RL. Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level. In particular, we focus on learning an unknown linear system with Gaussian noise assuming only that, initially, a nominal safe action is known...
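To make "identifying the parameters of a linear system with Gaussian noise" concrete, here is a minimal sketch under simplifying assumptions that are not from the paper. It uses a hypothetical scalar system x_{t+1} = a*x_t + b*u_t + w_t with w_t ~ N(0, sigma^2) and estimates (a, b) by ordinary least squares. Exploration is modeled as small uniform perturbations around a known nominal action. The helper names `simulate` and `least_squares` and all parameter values are illustrative. This is not the authors' robust-optimization procedure, and it enforces no safety constraint.

```python
import random


def simulate(a, b, u_nominal, perturb, steps, noise_std, seed=0):
    """Roll out the hypothetical scalar system x_{t+1} = a*x_t + b*u_t + w_t.

    w_t ~ N(0, noise_std**2). Each action is the nominal action plus a
    uniform perturbation in [-perturb, perturb]. This is an illustrative
    stand-in for cautious exploration and checks no safety constraint.
    Returns a list of (x_t, u_t, x_{t+1}) transitions.
    """
    rng = random.Random(seed)
    x = 0.0
    transitions = []
    for _ in range(steps):
        u = u_nominal + rng.uniform(-perturb, perturb)
        x_next = a * x + b * u + rng.gauss(0.0, noise_std)
        transitions.append((x, u, x_next))
        x = x_next
    return transitions


def least_squares(transitions):
    """Estimate (a, b) by ordinary least squares.

    Solves the 2x2 normal equations
        [sxx sxu] [a]   [sxy]
        [sxu suu] [b] = [suy]
    in closed form, where y denotes x_{t+1}.
    """
    sxx = sum(x * x for x, _, _ in transitions)
    sxu = sum(x * u for x, u, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    sxy = sum(x * y for x, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sxx * suu - sxu * sxu
    if det == 0.0:
        # Singular normal equations: the data cannot separate a from b.
        raise ValueError("normal equations are singular; (a, b) is not identifiable from this data")
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


if __name__ == "__main__":
    data = simulate(a=0.9, b=0.5, u_nominal=1.0, perturb=0.2, steps=5000, noise_std=0.05)
    print(least_squares(data))
```

In this sketch, shrinking `perturb` (more cautious actions) or `steps` (less data) tends to widen the spread of the estimates. This is a small-scale view of the tension between safety and identification accuracy that the abstract describes.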
Subjects
Free-text keywords: Computer Science - Learning, Computer Science - Systems and Control
References (29 in total; 1–15 shown)

[1] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv:1606.06565, 2016.

[2] Karl-Johan Astrom and Torsten Bohlin. Numerical identification of linear dynamic systems from normal operating records. In Theory of Self-Adaptive Control Systems, pages 96–111, Teddington, UK, 1967.

[3] Karl-Johan Astrom and Pieter Eykhoff. System identification: A survey. Automatica, 7(2):123–162, 1971.

[4] Felix Berkenkamp and Angela P. Schoellig. Safe and robust learning control with Gaussian processes. In European Control Conference (ECC), pages 2496–2501, 2015.

[5] George E. P. Box, Gwilym Jenkins, Gregory C. Reinsel, and Greta M. Ljung. Time Series Analysis: Forecasting and Control (5th Edition). Wiley, Hoboken, NJ, 2015.

[6] S. Chen, S. A. Billings, and P. M. Grant. Non-linear system identification using neural networks. International Journal of Control, 51:1191–1214, 1990.

[7] James Demmel. The componentwise distance to the nearest singular matrix. SIAM Journal on Matrix Analysis and Applications, 13:10–19, 1992.

[8] Jim Gao. Machine learning applications for data center optimization. Google white paper, 2014.

[9] Javier García and Fernando Fernandez. Safe exploration of state and action spaces in reinforcement learning. Journal of Artificial Intelligence Research, 45(1):515–564, 2012.

[10] Javier García and Fernando Fernandez. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437–1480, 2015.

[11] Mohammad Ghavamzadeh, Marek Petrik, and Yinlam Chow. Safe policy improvement by minimizing robust baseline regret. In Advances in Neural Information Processing Systems 29 (NIPS-16), pages 2298–2306, Barcelona, Spain, 2016.

[12] William H. Greene, editor. Econometric Analysis (8th Edition). Pearson, New York, NY, 2018.

[13] G. Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30(2):1–21, 2005.

[14] Anayo K. Akametalu, Shahab Kaynama, Jaime Fisac, Melanie N. Zeilinger, Jeremy H. Gillula, and Claire J. Tomlin. Reachability-based safe learning with Gaussian processes. In Proceedings of the IEEE Conference on Decision and Control, pages 1424–1431, Los Angeles, CA, 2014.

[15] Rogier Koppejan and Shimon Whiteson. Neuroevolutionary reinforcement learning for generalized control of simulated helicopters. Evolutionary Intelligence, 4(4):219–241, 2011.
