Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL

Keywords: Machine Learning, FOS: Computer and information sciences, Machine Learning (cs.LG)

Formanek, Claude; Mahjoub, Omayma; Nessir, Louay Ben; Abramowitz, Sasha; de Kock, Ruan; Khlifi, Wiem; Rajaonarivonivelomanantsoa, Daniel; Toit, Simon Du; Fokam, Arnol; Singh, Siddarth; Sob, Ulrich Mbou; Chalumeau, Felix; Pretorius, Arnu

License: CC BY



Publication type: Article, Preprint
Date: 01 Jan 2025
Embargo end date: 01 Jan 2025
Publisher: arXiv


doi: 10.48550/arxiv.2505.22151

arXiv: 2505.22151

Abstract

A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL to directly address this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-learning (ICQ), to develop a novel offline autoregressive policy update scheme. This allows Oryx to solve complex coordination challenges while maintaining temporal coherence over long trajectories. We evaluate Oryx across a diverse set of benchmarks from prior works -- SMAC, RWARE, and Multi-Agent MuJoCo -- covering tasks of both discrete and continuous control, varying in scale and difficulty. Oryx achieves state-of-the-art performance on more than 80% of the 65 tested datasets, outperforming prior offline MARL methods and demonstrating robust generalisation across domains with many agents and long horizons. Finally, we introduce new datasets to push the limits of many-agent coordination in offline MARL, and demonstrate Oryx's superior ability to scale effectively in such settings.

Published at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
