
This paper presents a new adaptive dynamic programming (ADP) scheme that solves the optimal control problem of multi-player systems with unknown dynamics from the perspective of nonzero-sum (NZS) games. The scheme is built on a new iterative equation, from which the control policy and corresponding value function of each player can be learned directly from state and input data, without identifying the system dynamics. To handle the unknown dynamics, neural network (NN)-based function approximation techniques are employed in the implementation, and together with the iterative equation they yield a new non-model-based ADP algorithm. The convergence of this algorithm is rigorously analyzed and proved. Finally, two numerical simulation examples demonstrate its performance.
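For orientation, the block below sketches the standard continuous-time N-player NZS game formulation that the coupled Hamilton-Jacobi (HJ) equations named in the keywords usually refer to. It is a textbook form given under assumptions, not the paper's own notation or its new iterative equation: the symbols f, g_j, Q_i, R_{ij}, and V_i^* are all assumed here for illustration.

```latex
% Assumed input-affine dynamics with N players (f and g_j are the unknown dynamics)
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\, u_j

% Assumed cost functional of player i
J_i(x_0, u_1, \dots, u_N)
  = \int_0^{\infty} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\, dt

% Nash equilibrium: no player can lower its cost by deviating unilaterally
J_i(u_1^*, \dots, u_i^*, \dots, u_N^*) \le J_i(u_1^*, \dots, u_i, \dots, u_N^*),
  \qquad i = 1, \dots, N

% Coupled HJ equations for the optimal value functions V_i^*, with V_i^*(0) = 0
0 = Q_i(x) + \sum_{j=1}^{N} u_j^{*\top} R_{ij}\, u_j^{*}
    + \big(\nabla V_i^*\big)^{\top} \Big( f(x) + \sum_{j=1}^{N} g_j(x)\, u_j^{*} \Big)

% Stationarity of the i-th Hamiltonian in u_i gives the equilibrium policy
u_i^{*}(x) = -\tfrac{1}{2}\, R_{ii}^{-1}\, g_i(x)^{\top}\, \nabla V_i^*(x)
```

In this form the N equations are coupled because each player's policy enters every other player's equation, and both f and g_j appear explicitly, which is why a scheme that learns from state and input data alone is of interest.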
nonzero-sum (NZS) games, neural network (NN), Electrical engineering. Electronics. Nuclear engineering, multi-player systems, coupled Hamilton-Jacobi (HJ) equations, Adaptive dynamic programming, TK1-9971
