
doi: 10.62791/19726
Deep Reinforcement Learning (RL) shows promising results for control problems with continuous action spaces. One drawback of Deep RL is that it can be very computationally intensive; this is a particular concern when fielding Deep RL applications on the computation- and power-constrained edge hardware typically deployed onboard autonomous vehicle platforms. Another drawback is that Deep RL agents can learn control strategies that exhibit high-frequency, high-amplitude oscillations, which degrade performance and can damage real-world systems. The first part of this thesis focuses on improving the computational efficiency of the Deep Deterministic Policy Gradient (DDPG) algorithm using mixed numerical precision methods. Mixed-precision methods are an active research area for improving the computational efficiency of Deep Learning; while they are well understood for supervised learning, they remain relatively unexplored for Deep RL. We aim to fill this gap by presenting a method that improves the computational efficiency of the DDPG algorithm using mixed numerical precision and loss scaling. Numerical cases compare the performance and computational improvements of DDPG agents trained with mixed precision against those trained with single precision for continuous control of a complex Autonomous Undersea Vehicle model, across various levels of control system and Deep RL model complexity. The second part of this thesis examines the impact of neural network architectures commonly used in the Deep RL literature on oscillations in the control signals output by DDPG agents.
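The motivation for loss scaling can be sketched in a few lines: small gradient values underflow to zero in half precision, and multiplying the loss (and hence the gradients) by a scale factor shifts them into float16's representable range before unscaling in float32 for the optimizer step. The numerical values below are illustrative only, not taken from the thesis.

```python
import numpy as np

# A small gradient value, below float16's smallest subnormal (~5.96e-8),
# underflows to zero when cast to half precision.
grad_fp32 = np.float32(1e-8)
grad_fp16 = np.float16(grad_fp32)            # rounds to 0.0 in fp16

# Loss scaling: multiply by a scale factor S before the half-precision
# backward pass, then divide by S in float32 before the optimizer step.
scale = np.float32(65536.0)                  # illustrative scale factor S
scaled_fp16 = np.float16(grad_fp32 * scale)  # now representable in fp16
recovered = np.float32(scaled_fp16) / scale  # unscaled in fp32, ~1e-8
```

The same idea underlies dynamic loss-scaling utilities in common deep learning frameworks: the scale factor is chosen so scaled gradients neither underflow nor overflow in half precision.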
Then, a numerical study examines the effects of different DDPG actor and critic neural network architectures on action selection, with the goal of minimizing undesirable oscillations in the control signals output by DDPG agents.
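As a hypothetical illustration of the oscillation problem (the thesis does not specify this metric), one simple way to quantify oscillation in a control signal is its total variation, the sum of absolute step-to-step changes; a chattering signal accumulates far more variation than a smooth one.

```python
import numpy as np

def total_variation(actions):
    """Sum of absolute differences between consecutive control outputs."""
    actions = np.asarray(actions, dtype=np.float64)
    return float(np.abs(np.diff(actions)).sum())

# Illustrative signals: a slowly varying command vs. the same command
# with a high-frequency ripple, as a chattering policy might produce.
t = np.linspace(0.0, 1.0, 201)
smooth = np.sin(2 * np.pi * t)
chattering = smooth + 0.2 * np.sin(2 * np.pi * 50 * t)
```

A metric of this kind makes it possible to compare architectures quantitatively rather than by visual inspection of control traces.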
