We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning ... View more
M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. J. of Artificial Intelligence Research, 2013.
Y. Bengio, A. Courville, and P. Vincent. Representation Learning: A Review and New Perspectives. PAMI, 2013.
D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
C. Blundell, B. Uria, A. Pritzel, Y. Li, A. Ruderman, J. Z. Leibo, J. Rae, D. Wierstra, and D. Hassabis. ModelFree Episodic Control. In http://arxiv.org/pdf/1606.04460v1.pdf, 2016.
G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 2012.
L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. JMLR, 1996.
A. Krizhevsky, I. Sutskever, , and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Proc. NIPS, 2012.
S. Lange and M. Riedmiller. Deep auto-encoder neural networks in reinforcement learning. In Proc. Int. Jt. Conf. Neural. Netw., 2010.
Y. LeCun, Y. Bengio, and G. E. Hinton. Deep learning. Nature, 2015.
L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 1992.