In the last chapter, we studied the brain-academy architecture of the ML Agents Toolkit and examined the scripts that let an agent make decisions according to a policy. In this chapter, we look at the core concepts of deep reinforcement learning (RL) in Python and at how they interact with the C# scripts of the brain-academy architecture. We have already had a glimpse of deep RL: we briefly discussed the deep Q-learning algorithm using the OpenAI Gym CartPole environment, and we touched on OpenAI's Baselines library. While training ML Agents in TensorFlow through the external brain, we also used the proximal policy optimization (PPO) algorithm with the default hyperparameters in the trainer_config.yaml file. We will discuss these algorithms in depth, along with several other algorithms from the actor-critic paradigm. To follow this chapter fully, however, we first need to understand how to build deep learning networks using TensorFlow and the Keras module, as well as the basic concepts of deep learning and why it is required in this context. We will also create neural network models for computer vision, which will be important when we study the GridWorld environment. Since ray and camera sensors are the primary sources of the agent's observation space, most of the models come in two policy variants: multilayer perceptron (MLP) networks and two-dimensional convolutional neural networks (CNN). We will also look at other simulations and games created with the ML Agents Toolkit and train our models using the Baselines implementations by OpenAI. First, however, let us understand the fundamentals of generic neural network models in deep learning.
