Name: Consolidated Adaptive T-soft Update for Deep Reinforcement Learning
Creator: Kobayashi, Taisuke
Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Robotics, 0209 industrial biotechnology, 02 engineering and technology, Robotics (cs.RO), Machine Learning (cs.LG)

Publication type: Article, Preprint
Date: 30 Jun 2024
Embargo end date: 01 Jan 2022
Publisher: IEEE
Journal: 2024 International Joint Conference on Neural Networks (IJCNN)

Authors: Kobayashi, Taisuke

doi: 10.1109/ijcnn60899.2024.10650439, 10.48550/arxiv.2202.12504

arXiv: 2202.12504

Abstract

Demand for deep reinforcement learning (DRL) is gradually increasing as a means of enabling robots to perform complex tasks, yet DRL is known to be unstable. To stabilize learning, a target network that slowly and asymptotically matches the main network is widely employed to generate stable pseudo-supervised signals. Recently, the T-soft update was proposed as a noise-robust update rule for the target network and has improved DRL performance. However, its noise robustness is governed by a hyperparameter that must be tuned for each task, and it is degraded by a simplified implementation. This study develops the adaptive T-soft (AT-soft) update by utilizing the update rule of the recently developed AdaTerm. In addition, the concern that the target network does not asymptotically match the main network is mitigated by a new consolidation that brings the main network back to the target network. The resulting consolidated AT-soft (CAT-soft) update is verified through numerical simulations.
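For orientation, the conventional soft update that the abstract refers to (a target network that slowly and asymptotically matches the main network) can be sketched as an exponential moving average of parameters. This is only the well-known baseline, not the T-soft, AT-soft, or CAT-soft rules, which the abstract does not specify. The function name `soft_update`, the flat parameter lists, and the step size `tau=0.005` are illustrative assumptions.

```python
def soft_update(target, main, tau=0.005):
    """Conventional soft update: target <- (1 - tau) * target + tau * main.

    `target` and `main` are flat lists of parameters (an assumption made
    to keep the sketch dependency-free); `tau` is an assumed step size.
    """
    return [(1.0 - tau) * t + tau * m for t, m in zip(target, main)]


# Hypothetical parameters: with the main network held fixed, the target
# approaches it geometrically at rate (1 - tau) per step.
main_params = [1.0, -2.0, 0.5]
target_params = [0.0, 0.0, 0.0]
for _ in range(2000):
    target_params = soft_update(target_params, main_params, tau=0.005)

# Largest remaining gap between target and main parameters.
gap = max(abs(t - m) for t, m in zip(target_params, main_params))
```

A smaller `tau` makes the target change more slowly, which steadies the pseudo-supervised signals but delays how quickly the target tracks the main network.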

6 pages, 3 figures

Related research: 3 similar research products

- Applications of Double Framed T-Soft Fuzzy Sets in BCK/BCI-Algebras (2018)
- T-SOFT - AN ADDITIONAL OPTIMAL TOUR PLAN FORMATION METHOD TO TRAVELING SALESMAN PROBLEMS (2022)
- Analýza spokojenosti [Satisfaction Analysis] (2009)

Impact by BIP!

- selected citations: 1. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.