
This paper examines a difficult boundary case: the construction of a five-paper theoretical-physics programme by a non-physicist human researcher working with frontier commercial large language models. Its central claim is intentionally narrow: human–LLM collaboration may scale for programme construction before it scales for field-shifting discovery. The evidence base is public but incomplete. Five Zenodo papers from the GDE emergent-gravity project are treated as the observable corpus; session-level logs, model-specific task ledgers, and experimental counterfactual arms are not available. The study is therefore analytic rather than forensic: it asks whether the collaboration generated a non-trivial, falsifiable, internally coherent research programme whose human and machine roles can be functionally decomposed. The argument advanced here is that the human contribution lay chiefly in programme architecture — question selection, scope boundaries, hard-null design, benchmark choice, and preservation of the core/derived split — whereas the LLM contribution lay in formal translation, derivational expansion, textual synthesis, and local coherence maintenance. The resulting conclusion is modest and testable: the case supports AI-assisted programme construction in a deep technical domain, but it does not yet support a general claim of autonomous scientific discovery.
