
Reverse specification generation, the task of inferring documentation from an existing codebase, remains a persistent challenge in many practical software engineering settings, including legacy system modernization, onboarding of new contributors, preparation of contractual deliverables, and regulatory compliance. The widespread adoption of large language models (LLMs) makes this task appear automatable. However, purely AI-generated artifacts cannot reach tacit knowledge that is absent from the code itself. Instead, they introduce fluent but ungrounded filler that erodes the practical reliability of the generated documentation. Conversely, purely human authorship lacks the exhaustiveness and consistency needed to finish within realistic time bounds. Generic Human-in-the-Loop (HITL) patterns decide when, why, and how the human contributes in an ad hoc manner, and therefore fail to support the epistemic structure specific to reverse specification. This paper proposes the Phased Co-Construction Methodology for generating reverse specifications from existing codebases. The methodology rests on three foundational structures:

(i) a formalization that partitions codebase knowledge into three layers (explicit, implicit, and tacit);
(ii) a Responsibility Allocation Triad that explicitly assigns each phase to one of three actors (Human, AI, or Mechanical);
(iii) a six-phase state machine comprising reconnaissance, planning, parallel investigation, mechanical verification, dialogue-based refinement, and delivery.

We further introduce two additional elements:

(iv) a statement-level confidence convention that builds hallucination control into the output protocol itself rather than relegating it to downstream filters;
(v) an Abandoned-as-First-Class principle that delivers permanently unresolvable uncertainties as a first-class chapter of the final artifact.
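The structures in (i) through (v) can be made concrete with a small data model. The following is an illustrative sketch only and is not taken from cc-rsg. It assumes strictly linear phase transitions with no backtracking. It also assumes that an allocation is simply a mapping from each phase to one actor. The names `next_phase`, `is_complete_allocation`, `Statement`, and `assemble` are hypothetical, as are the confidence labels `confirmed`, `inferred`, and `unresolved`. They stand in for whatever convention the methodology actually prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Layer(Enum):
    """The three knowledge layers named in (i)."""
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"
    TACIT = "tacit"


class Actor(Enum):
    """The three actors of the Responsibility Allocation Triad in (ii)."""
    HUMAN = "human"
    AI = "ai"
    MECHANICAL = "mechanical"


class Phase(Enum):
    """The six phases of (iii), in the order the abstract lists them."""
    RECONNAISSANCE = 1
    PLANNING = 2
    PARALLEL_INVESTIGATION = 3
    MECHANICAL_VERIFICATION = 4
    DIALOGUE_REFINEMENT = 5
    DELIVERY = 6


def next_phase(phase: Phase) -> Optional[Phase]:
    """Hypothetical linear transition; returns None once delivery is reached."""
    if phase is Phase.DELIVERY:
        return None
    return Phase(phase.value + 1)


def is_complete_allocation(allocation: Dict[Phase, Actor]) -> bool:
    """A dict already gives each phase at most one actor; this checks that
    every phase has one. Which actor owns which phase is not assumed here."""
    return set(allocation) == set(Phase)


class Confidence(Enum):
    """Hypothetical statement-level confidence labels for (iv)."""
    CONFIRMED = "confirmed"
    INFERRED = "inferred"
    UNRESOLVED = "unresolved"


@dataclass
class Statement:
    text: str
    layer: Layer
    confidence: Confidence


def assemble(statements: List[Statement]) -> Dict[str, List[str]]:
    """Group tagged statements into chapters. Unresolved statements go to an
    'Abandoned' chapter instead of being dropped (a sketch of (v))."""
    chapters: Dict[str, List[str]] = {"Specification": [], "Abandoned": []}
    for s in statements:
        target = "Abandoned" if s.confidence is Confidence.UNRESOLVED else "Specification"
        # Each statement carries its confidence label into the output text.
        chapters[target].append(f"[{s.confidence.value}] {s.text}")
    return chapters
```

For example, suppose `assemble` receives a confirmed explicit statement and an unresolved tacit one. The first lands in `Specification` as `[confirmed] ...`. The second lands in `Abandoned` as `[unresolved] ...`. In this sketch, the confidence tag is part of the emitted text rather than a post-hoc filter. Unresolvable items stay visible in the delivered artifact.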
We survey related work spanning seven areas: pure-AI specification synthesis, forward requirements elicitation, generic HITL LLMOps, abstention and calibrated uncertainty, tacit knowledge elicitation, AI-native SDLC, and industrial hybrid approaches. We find that no prior work integrates all five of these elements. We demonstrate implementability through cc-rsg (Claude Code Reverse Spec Generator), an open-source reference implementation packaged as a Claude Code skill. We also present a case study on a Tetris codebase in which no subject-matter expert (SME) was accessible. It shows 100% inventory coverage and zero cross-chapter inconsistencies. In addition, abandoned questions emerged structurally, accounting for 5 of 6 questions. We position this work as a methodology paper. cc-rsg is presented as one instantiation of the methodology, not its sole implementation.
