Unicorn: Continual Learning with a Universal, Off-policy Agent

Preprint English OPEN
Mankowitz, Daniel J.; Žídek, Augustin; Barreto, André; Horgan, Dan; Hessel, Matteo; Quan, John; Oh, Junhyuk; van Hasselt, Hado; Silver, David; Schaul, Tom (2018)
  • Subject: Computer Science - Machine Learning

Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there a...