Publication: Doctoral thesis, 2013

Parameter-Exploring Policy Gradients and Their Implications

Sehnke, Frank
Open Access, English
  • Published: 11 Apr 2013
  • Publisher: Technical University of Munich
  • Country: Germany
Reinforcement Learning is the most commonly used class of learning algorithms that lets robots and other systems learn their behaviour autonomously, solely through interaction with the environment. Today's learning systems are often confronted with high-dimensional, continuous problems; to solve these, so-called Policy Gradient methods are used increasingly often. The PGPE algorithm developed in this thesis, a new type of Policy Gradient algorithm, allows model-free learning in complex, continuous, partially observable and high-dimensional environments. We show that tasks like grasping glasses and plates with a human-like arm can be l...
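The abstract's core idea — drawing whole parameter vectors from a Gaussian and estimating the gradient from episode returns — can be illustrated with a minimal sketch of PGPE with symmetric sampling. This is an assumption-laden toy, not the thesis code: the `episode_return` function, the learning rates, and the quadratic toy task are all hypothetical stand-ins for a real episodic rollout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an episodic rollout: the reward is higher the
# closer the policy parameters are to an unknown optimum. In a real task
# this would be the return of one episode run with a fixed parameter vector.
OPTIMUM = np.array([1.0, -2.0, 0.5])

def episode_return(theta):
    return -np.sum((theta - OPTIMUM) ** 2)

def pgpe(n_iters=1000, alpha_mu=0.05, alpha_sigma=0.01):
    """Basic PGPE with symmetric sampling (a sketch, not the thesis code).

    Parameters theta are drawn around a mean mu with per-parameter
    exploration magnitudes sigma; the gradient is estimated from the
    reward difference of mirrored samples mu + eps and mu - eps.
    """
    mu = np.zeros(3)
    sigma = np.ones(3)
    baseline = episode_return(mu)
    for _ in range(n_iters):
        eps = rng.normal(0.0, sigma)       # perturbation in parameter space
        r_plus = episode_return(mu + eps)  # mirrored rollouts
        r_minus = episode_return(mu - eps)
        # Move the mean along the perturbation, weighted by the reward
        # difference (a finite-difference-style gradient estimate).
        mu += alpha_mu * eps * (r_plus - r_minus) / 2.0
        # Adapt the exploration magnitude from the average reward
        # relative to a running baseline.
        r_avg = (r_plus + r_minus) / 2.0
        sigma += alpha_sigma * ((eps**2 - sigma**2) / sigma) * (r_avg - baseline)
        sigma = np.clip(sigma, 0.05, 2.0)  # keep exploration positive, bounded
        baseline = 0.9 * baseline + 0.1 * r_avg
    return mu
```

Because exploration happens in parameter space, each rollout uses one fixed, deterministic parameter vector; this is what makes the estimator applicable to the partially observable, high-dimensional settings the abstract mentions.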
free text keywords: Reinforcement Learning, Policy Gradients, Parameter Exploration, Robotics, Computer Science, Knowledge, Systems, ddc:000
Provider: MediaTUM
43 references, page 1 of 3

1 Introduction
  1.1 Motivation
    1.1.1 Reinforcement Learning for Robotics
    1.1.2 Policy Gradients
    1.1.3 Exploration in Parameter Space
    1.1.4 Our Approach
  1.2 Thesis Contribution
  1.3 Notation

2 Problem Definition
  2.1 Markov Decision Processes
  2.2 Partially Observable Markov Decision Processes
  2.3 Long Term Reward and Episodic Tasks

3 State of the Art
  3.1 Reinforcement Learning
    3.1.1 Classical
    3.1.2 Evolution
    3.1.3 Policy Gradients
  3.2 Exploration
    3.2.1 Exploration in Reinforcement Learning
    3.2.2 Exploration in Policy Gradients
    3.2.3 Exploring in Evolution
    3.2.4 Exploring in Parameter Space

4 Part Summary and Conclusion

B. Gassend, D. Lim, D. Clarke, M. Van Dijk, and S. Devadas. Identification and authentication of integrated circuits. Concurrency and Computation: Practice & Experience, 16(11):1077-1098, 2004. (Cited on pages 82 and 83.)

S. Gelly and D. Silver. Combining online and offline knowledge in UCT. In ICML, Vol. 227, 2007. (Cited on page 102.)

T. Glasmachers, T. Schaul, S. Yi, D. Wierstra, and J. Schmidhuber. Exponential natural evolution strategies. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 393-400. ACM, 2010. (Cited on page 119.)

F. Gomez, J. Schmidhuber, and R. Miikkulainen. Efficient non-linear control through neuroevolution. 2006. (Cited on page 16.)

F. Gomez, J. Schmidhuber, and R. Miikkulainen. Accelerated neural evolution through cooperatively coevolved synapses. The Journal of Machine Learning Research, 9:937-965, 2008. ISSN 1532-4435. (Cited on page 16.)

F. J. Gomez and J. Schmidhuber. Co-evolving recurrent neurons learn deep memory POMDPs. In Proc. of the 2005 Conference on Genetic and Evolutionary Computation (GECCO), Washington, D.C. ACM Press, New York, NY, USA, 2005. (Cited on page 16.)

F.J. Gomez and R. Miikkulainen. Solving non-Markovian control tasks with neuroevolution. In International Joint Conference on Artificial Intelligence, volume 16, pages 1356-1361. Citeseer, 1999. (Cited on page 16.)

A. Graves. Supervised Sequence Labelling with Recurrent Neural Networks. PhD thesis, Technische Universität München, 2007. (Cited on pages 102, 103, 104, and 107.)

A. Graves, S. Fernández, and J. Schmidhuber. Multi-Dimensional Recurrent Neural Networks, 2007. (Cited on pages 103 and 104.)

M. Grüttner. Evolving Multidimensional Recurrent Neural Networks for the Capture Game in Go, 2008. (Cited on pages 101, 102, 103, 104, 107, and 120.)

M. Grüttner, F. Sehnke, T. Schaul, and J. Schmidhuber. Multidimensional deep memory Go-player for parameter exploring policy gradients. In K. Diamantaras, W. Duch, and L. Iliadis, editors, Proceedings of the International Conference on Artificial Neural Networks (ICANN 2010). Springer-Verlag Berlin Heidelberg, 2010. (Cited on pages 102, 103, 104, 105, 106, and 107.)
