Publication · Preprint · 2016

Model-Free Episodic Control

Blundell, Charles; Uria, Benigno; Pritzel, Alexander; Li, Yazhe; Ruderman, Avraham; Leibo, Joel Z; Rae, Jack; Wierstra, Daan; Hassabis, Demis
Open Access · English
Published: 14 Jun 2016
Abstract
State-of-the-art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher...
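
The core mechanism behind the episodic controller described in the abstract is simple enough to sketch: for every action, keep a memory of visited states paired with the highest discounted return ever obtained after taking that action there, act greedily on those stored values, and fall back on a k-nearest-neighbour average for states with no exact match. The Python sketch below is a minimal illustration of that scheme, not the authors' implementation; the class name `EpisodicController`, the Euclidean distance on raw state vectors, and the optimistic value for untried actions are choices made here for brevity (the paper performs the lookup in a lower-dimensional embedding of the observations, e.g. a random projection or a learned latent space).

```python
import numpy as np


class EpisodicController:
    """Per-action episodic memory: states paired with the best return seen from them."""

    def __init__(self, num_actions, k=3):
        self.num_actions = num_actions
        self.k = k  # neighbours used when a state has no exact match in memory
        self.states = [[] for _ in range(num_actions)]   # stored state vectors, per action
        self.returns = [[] for _ in range(num_actions)]  # best observed return per stored state

    def _estimate(self, state, action):
        """Exact stored value on a hit; otherwise the mean over the k nearest stored states."""
        mem_s, mem_r = self.states[action], self.returns[action]
        if not mem_s:
            return float("inf")  # untried action: optimistic, so it gets explored
        dists = np.linalg.norm(np.stack(mem_s) - state, axis=1)
        hit = np.where(dists == 0.0)[0]
        if hit.size:
            return mem_r[hit[0]]
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean([mem_r[i] for i in nearest]))

    def act(self, state):
        """Greedy action with respect to the episodic value estimates."""
        state = np.asarray(state, dtype=float)
        values = [self._estimate(state, a) for a in range(self.num_actions)]
        return int(np.argmax(values))

    def update(self, episode, discount=1.0):
        """Walk a finished episode backwards, accumulating the discounted return,
        and keep only the highest return ever observed for each (state, action)."""
        g = 0.0
        for state, action, reward in reversed(episode):
            g = reward + discount * g
            state = np.asarray(state, dtype=float)
            mem_s, mem_r = self.states[action], self.returns[action]
            match = next((i for i, s in enumerate(mem_s) if np.array_equal(s, state)), None)
            if match is not None:
                mem_r[match] = max(mem_r[match], g)
            else:
                mem_s.append(state)
                mem_r.append(g)
```

In use, an agent would call `act` at each step of an episode and make a single `update` call on the collected `(state, action, reward)` triples once the episode ends. Exploration noise (e.g. ε-greedy) and a bounded memory with eviction of the least recently updated entries, both part of the full method, are omitted from this sketch.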
Subjects
free text keywords: Statistics - Machine Learning, Computer Science - Learning, Quantitative Biology - Neurons and Cognition