Powered by OpenAIRE graph
Preprint
Data sources: ZENODO

Epistemic Twins: Enabling a Symbolic Science of Language Model Knowledge

Authors: Razniewski, Simon; Ghosh, Shrestha; Giordano, Luca; Hu, Yujia; Kowalzik, Josua; Nguyen, Tuan-Phong;

Abstract

Large Language Models (LLMs) are impactful yet opaque artifacts. At their core, they are subsymbolic constructs defined by billions of numeric weights that interact in a largely inscrutable manner. Current analysis paradigms are either black-box benchmarks that test model performance on predefined tasks, or mechanistic interpretability approaches that trace outputs back to specific weights. Both analysis methods are limited by the experimenter's hypothesis space: one must know what to look for in order to find it. In this perspective, we argue for a third, radically different analysis paradigm: Epistemic Twins. We propose constructing large-scale symbolic approximations of LLMs in human-readable formats. This enables the comprehensive materialization of the factual knowledge (or beliefs) inherent in the model without predefined hypotheses, facilitating large-scale analysis and auditing toward better understanding and explainability.
