
pmid: 37676803
This article concentrates on proposing a scalable deep reinforcement learning (DRL) method for a multiple unmanned surface vehicle (multi-USV) system to operate cooperative target invasion. The multi-USV system, which is made up of multiple invaders, needs to invade target areas in a specified time. A novel scalable reinforcement learning (RL) method called Scalable-MADDPG is proposed for the first time. In this method, the scale of the multi-USV system can be changed at any time without interrupting the training process. Then, to mitigate the policy oscillation after applying Scalable-MADDPG, a bi-directional long-short-term memory (Bi-LSTM) network is constructed. Moreover, an improved -greedy strategy is proposed to help balance the exploration and exploitation in RL. Furthermore, to enhance the robustness of the optimal policy, Ornstein-Uhlenbeck (OU) noise is added in this improved -greedy strategy during the training process. Finally, the scalable RL method is used to help the multi-USV system perform cooperative target invasion under complex marine environments. The effectiveness of Scalable-MADDPG is demonstrated through three experiments.
reinforcement learning, 4007 Control engineering, mechatronics and robotics, multiple unmanned surface vehicle, Institute for Sustainable Industries and Liveable Cities, long–short-term memory, ϵ-greedy strategy, robustness, scalability
reinforcement learning, 4007 Control engineering, mechatronics and robotics, multiple unmanned surface vehicle, Institute for Sustainable Industries and Liveable Cities, long–short-term memory, ϵ-greedy strategy, robustness, scalability
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 8 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
