First Documented Proof of Cross-Vendor AI Collaboration on a Benchmark: Multi-AI Consensus Achieves Best-Ever 50% on IMO 2025

Creator: Kawa, Steven

ZENODO

Other literature type . 2025

License: CC BY

Data sources: Datacite

Publication type: Other literature type. Date: 11 Dec 2025. Language: English. Publisher: Zenodo

Authors: Kawa, Steven

DOI: 10.5281/zenodo.17903603, 10.5281/zenodo.17903905, 10.5281/zenodo.17903602

Abstract

We report the first documented instance of multiple frontier AI systems from different vendors (Claude, GPT-4, Grok, Gemini, DeepSeek, Kimi) collaborating in real time to solve mathematical olympiad problems. The collaboration achieved 50% accuracy (3 of 6 problems correct) on IMO 2025, an 18.4 percentage point improvement over the Gemini baseline. Notably, Gemini alone solved 0 of 6 problems in our trials; all three correct answers emerged from cross-AI collaboration and consensus voting. This work demonstrates that multi-vendor AI collaboration can exceed individual model performance on the hardest mathematical reasoning benchmarks, and it introduces a novel "Family Game Night" protocol for fallback reasoning when primary models fail.
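The abstract does not specify how consensus voting was carried out, so the following is only a minimal sketch of one common form, plurality voting with a quorum threshold. The function names, the `quorum` parameter, the answer normalization, and the model labels are all assumptions made for illustration. They are not taken from the publication, and the sketch does not attempt to represent the "Family Game Night" protocol.

```python
from collections import Counter
from typing import Optional


def consensus_answer(answers: dict[str, Optional[str]], quorum: float = 0.5) -> Optional[str]:
    """Return the answer given by more than `quorum` of the responding models.

    `answers` maps a (hypothetical) model label to its final answer, or to
    None when that model produced no answer. Returns None when no answer
    clears the quorum, which a caller could treat as a signal to fall back
    to some other procedure.
    """
    # Ignore models that did not answer; normalize case and whitespace so
    # trivially different spellings of the same answer count as one vote.
    votes = [a.strip().lower() for a in answers.values() if a is not None and a.strip()]
    if not votes:
        return None
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count / len(votes) > quorum else None


def accuracy(correct: int, total: int) -> float:
    """Percentage of problems answered correctly."""
    return 100.0 * correct / total


if __name__ == "__main__":
    # Illustrative inputs only, not data from the publication.
    sample = {"model_a": "42", "model_b": " 42 ", "model_c": "41", "model_d": None}
    print(consensus_answer(sample))  # two of three responding models agree
    print(accuracy(3, 6))            # the 3-of-6 score reported in the abstract
```

With a strict-majority quorum, an evenly split vote yields no consensus, which is one plausible point at which a fallback procedure could take over.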

Related research (1)

The Fatigue Horizon: Why Living Superintelligent AI Needs to Rest (2026). Relation: IsSupplementTo

Impact by BIP!

Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Related to Research communities

Knowmad Institut
