The Singularity Gate: A Benchmark for AI-Driven Paradigm-Shifting Scientific Discovery

Erkan, Emirhan Sami

Preprint

Data sources: ZENODO

Publication: Preprint. Under curation. Language: English. Publisher: Zenodo

Authors: Erkan, Emirhan Sami

doi: 10.5281/zenodo.20358378

Abstract

We introduce The Singularity Gate, a benchmark that measures whether frontier AI can predict paradigm-shifting scientific findings published after its training cutoff. At a smaller scale, it operationalises the question of whether an AI with only the knowledge Einstein had at the time would have arrived at general relativity. This capability is necessary, though not sufficient, for autonomous AI-driven discovery. Items enter the corpus only after a per-item contamination audit covering corporate press releases, specialty conference talks, preprint servers (Zenodo, OSF, ResearchGate, ResearchSquare), early-online publication ahead of print, bioRxiv author-search gaps, and direct online-date verification. A candidate is admitted only when its first public trace in any of these categories falls strictly after the latest empirically located training cutoff in the panel, the finding is paradigm-breaking, and the published abstract states a specific mechanism, magnitude, and direction. A separate parallel-true audit then confirms that the open-ended prompt admits only this finding, with no adjacent-but-true alternatives. The panel cutoff is re-derived as new respondents arrive, with overtaken items retired and replaced. Respondents are evaluated in each lab's native agentic harness with tool use enabled and web search disabled, on items spanning five broad scientific fields. The top model, at maximum reasoning effort, reaches a score of 17.75%; the fully-correct-outcome rate is 0% across all respondents, so reported scores are entirely partial credit. Current frontier AI does not yet produce genuine discoveries, but it is closing the gap.
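
The admission rule and the cutoff re-derivation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's actual implementation: the `Candidate` record, its field names, and the helper functions are all hypothetical, and the parallel-true audit is not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate item (field names are illustrative)."""
    name: str
    first_public_traces: Dict[str, date]  # audit category -> earliest trace found
    paradigm_breaking: bool
    states_mechanism: bool
    states_magnitude: bool
    states_direction: bool


def panel_cutoff(respondent_cutoffs: Dict[str, date]) -> date:
    """Return the latest training cutoff across all respondents in the panel."""
    return max(respondent_cutoffs.values())


def is_admissible(item: Candidate, cutoff: date) -> bool:
    """Check the three admission conditions stated in the abstract."""
    if not item.first_public_traces:
        # Assumption: an item with no audit evidence is never admitted.
        return False
    earliest_trace = min(item.first_public_traces.values())
    return (
        earliest_trace > cutoff  # strictly after the panel cutoff
        and item.paradigm_breaking
        and item.states_mechanism
        and item.states_magnitude
        and item.states_direction
    )


def retire_overtaken(
    items: List[Candidate], respondent_cutoffs: Dict[str, date]
) -> Tuple[List[Candidate], List[Candidate]]:
    """Re-derive the panel cutoff and split items into (kept, retired)."""
    cutoff = panel_cutoff(respondent_cutoffs)
    kept = [item for item in items if is_admissible(item, cutoff)]
    retired = [item for item in items if not is_admissible(item, cutoff)]
    return kept, retired
```

When a new respondent with a later training cutoff joins the panel, calling `retire_overtaken` again with the updated cutoff mapping moves any item whose earliest public trace no longer falls strictly after the cutoff into the retired list, mirroring the "overtaken items retired and replaced" step.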
