
Abstract

This paper proposes a gradient-based multi-agent actor-critic algorithm for off-policy reinforcement learning using importance sampling. Our algorithm is incremental, uses full gradients, and its per-iteration complexity scales linearly with the number of approximation features. Previous multi-agent actor-critic algorithms are limited to the on-policy setting or to off-policy emphatic temporal difference (TD) learning, and do not take advantage of advances in off-policy gradient temporal difference (GTD) learning. As a theoretical contribution, we establish that the critic step of the proposed algorithm converges to the TD solution of the projected Bellman equation and that the actor step converges to the set of asymptotically stable fixed points. Numerical experiments on a multi-agent generalization of Boyan's chain problem show that the proposed approach improves stability and convergence rate compared with the state-of-the-art baseline algorithm.
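The claimed per-iteration cost, linear in the number of features, matches the structure of standard GTD/TDC-style updates. As an illustrative single-agent sketch only (the paper's multi-agent algorithm additionally involves coordination between agents, which is not shown here), an off-policy TDC critic update with importance-sampling ratio rho might look like the following; the function name, step sizes, and variable names are assumptions, not the authors' code.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, rho, alpha, beta):
    """One off-policy TDC/GTD-style critic update with importance sampling (sketch).

    theta:    value-function weights, shape (d,)
    w:        auxiliary weights tracking the expected TD update, shape (d,)
    phi:      feature vector of the current state, shape (d,)
    phi_next: feature vector of the next state, shape (d,)
    rho:      importance-sampling ratio pi(a|s) / mu(a|s)
    alpha:    step size for the main (theta) iterate
    beta:     step size for the auxiliary (w) iterate
    """
    # TD error under linear function approximation
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # TDC correction term removes the gradient bias of off-policy TD
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary iterate estimates the projected TD error
    w = w + beta * rho * (delta - phi @ w) * phi
    return theta, w

# Smoke test with random features (illustrative only)
d = 8
rng = np.random.default_rng(0)
theta, w = np.zeros(d), np.zeros(d)
theta, w = tdc_update(theta, w, rng.random(d), rng.random(d),
                      reward=1.0, gamma=0.95, rho=1.2, alpha=0.01, beta=0.005)
```

Every operation above is a vector addition or inner product over d-dimensional features, so each update costs O(d), consistent with the linear per-iteration complexity stated in the abstract.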
Keywords: Multi-agent actor-critic algorithm, Distributed reinforcement learning, Gradient temporal difference, Off-policy
