https://doi.org/10.1007/978-1-...
Part of book or chapter of book. 2020. Peer-reviewed
License: Springer TDM
Data sources: Crossref

Deep Reinforcement Learning

Author: Abhilash Majumder

Abstract

In the last chapter, we studied the various aspects of the brain-academy architecture of the ML Agents Toolkit and examined several scripts that are essential for an agent to make decisions according to a policy. In this chapter, we look into the core concepts of deep reinforcement learning (RL) in Python and how Python interacts with the C# scripts of the brain-academy architecture. We have already had a glimpse of deep RL when we briefly discussed the deep Q-learning algorithm on the OpenAI Gym CartPole environment, and again when we discussed OpenAI's Baselines library. While training ML Agents in TensorFlow through the external brain, we also used the proximal policy optimization (PPO) algorithm with the default hyperparameters in the trainer_config.yaml file. We will discuss these algorithms in depth, along with several other algorithms from the actor-critic paradigm. To fully understand this chapter, however, we need to know how to build deep learning networks with TensorFlow and its Keras module. We also need the basic concepts of deep learning and an understanding of why it is required in this context. In this chapter we will also create neural network models for computer vision, which will be essential when we study the GridWorld environment. The agent's observation space is provided primarily by ray and camera sensors, so most of our models will have two policy variants: multilayer perceptron (MLP) networks and convolutional neural networks (CNN-2D networks). We will also look at other simulations and games created with the ML Agents Toolkit and try to train our models using OpenAI's Baselines implementations. First, however, let us understand the fundamentals of generic neural network models in deep learning.

  • BIP! impact indicators: citations 1; popularity Average; influence Average; impulse Average