
Large Language Models (LLMs) are impactful yet opaque artifacts. At their core, they are subsymbolic constructs defined by billions of numeric weights that interact in a largely inscrutable manner. Current analysis paradigms are either black-box benchmarks that test model performance on predefined tasks, or mechanistic interpretability approaches that trace outputs back to specific weights. Both are limited by the experimenter's hypothesis space: one must know what to look for in order to find it. In this perspective, we argue for a third, radically different analysis paradigm: Epistemic Twins. We propose constructing large-scale symbolic approximations of LLMs in human-readable formats. This enables the comprehensive materialization of the factual knowledge (or beliefs) inherent in a model without predefining hypotheses, facilitating large-scale analysis and auditing towards better understanding and explainability.
