
doi: 10.3390/act11040099
In a multi-agent system, the complex interactions among agents make it difficult to reach an optimal decision. This paper proposes a new action-value function and a learning mechanism based on the optimal equivalent action of the neighborhood (OEAN) of a multi-agent system, so that the agents can obtain the optimal decision. In the new Q-value function, the OEAN represents the equivalent interaction between the current agent and the other agents. To deal with the non-stationary environment that arises when agents act, the OEAN of the current agent is inferred simultaneously by maximum a posteriori estimation based on a hidden Markov random field model. The convergence analysis of the proposed method proves that the Q-value function approaches the global Nash equilibrium value under the iteration mechanism. The effectiveness of the method is verified in a case study of top-coal caving. The experimental results show that the OEAN reduces the complexity of describing the agents’ interactions while significantly improving top-coal caving performance.
TK1001-1841, Production of electric energy or power. Powerplants. Central stations, 330, multi-agent reinforcement learning; optimal decision; hidden Markov random field; top-coal caving, TA401-492, Materials of engineering and construction. Mechanics of materials, 620
