On-Policy Robot Imitation Learning from a Converging Supervisor

Preprint (2019), Open Access
Ashwin Balakrishna, Brijen Thananjeyan, Jonathan Lee, Felix Li, Arsh Zahed, Joseph E. Gonzalez, Ken Goldberg
  • Subjects: Computer Science - Machine Learning | Computer Science - Artificial Intelligence | Computer Science - Robotics

Existing on-policy imitation learning algorithms, such as DAgger, assume access to a fixed supervisor. However, there are many settings where the supervisor may evolve during policy learning, such as a human performing a novel task or an improving algorithmic controller...
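The abstract describes an on-policy, DAgger-style training loop in which the supervisor itself improves between iterations. As a minimal sketch only, and not the authors' implementation, the Python fragment below illustrates one way such a loop could look; the `env` interface (reset/step), the `supervisor` interface (act/update), and the k-nearest-neighbor learner are all illustrative assumptions.

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def rollout(policy, env, horizon):
        """Roll out `policy` in `env`, returning the visited states."""
        states, s = [], env.reset()
        for _ in range(horizon):
            states.append(s)
            s, done = env.step(policy(s))  # assumed env API: step -> (state, done)
            if done:
                break
        return states

    def dagger_converging_supervisor(env, supervisor, n_iters=10, horizon=100):
        """DAgger-style loop where the supervisor evolves between iterations.

        Hypothetical sketch: `supervisor.act(s)` returns an action label, and
        `supervisor.update()` models the supervisor improving between rounds
        (e.g., a human practicing a novel task, or a controller refining itself).
        """
        data_s, data_a = [], []
        learner = KNeighborsRegressor(n_neighbors=5)
        policy = supervisor.act  # bootstrap the first rollout with the supervisor

        for _ in range(n_iters):
            # 1. Visit states under the current learner policy (on-policy).
            states = rollout(policy, env, horizon)
            # 2. Label those states with the *current*, evolving supervisor.
            data_s.extend(states)
            data_a.extend(supervisor.act(s) for s in states)
            # 3. Refit the learner on the aggregated dataset.
            learner.fit(np.array(data_s), np.array(data_a))
            policy = lambda s: learner.predict(np.array(s).reshape(1, -1))[0]
            # 4. The supervisor converges toward better labels over time.
            supervisor.update()
        return learner

The difference from fixed-supervisor DAgger is step 2: labels are drawn from whatever the supervisor currently is, so the aggregated dataset mixes labels of increasing quality as the supervisor converges.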