Shaping Multi-Agent Systems with Gradient Reinforcement Learning

Article (English, open access)
Buffet, Olivier; Dutech, Alain; Charpillet, François
(2007)
  • Publisher: Springer Verlag
  • Related identifiers: doi: 10.1007/s10458-006-9010-5
  • Subject: Multi-Agent Systems | ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.6: Learning | [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] | ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.11: Distributed Artificial Intelligence/I.2.11.3: Multiagent systems | ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE | Shaping | Reinforcement Learning | Partially Observable Markov Decision Processes | Policy-Gradient

The original publication is available at www.springerlink.com; International audience; An original Reinforcement Learning (RL) methodology is proposed for the design of multi-agent systems. In the realistic setting of situated agents with local perception, the task of a...