Powered by OpenAIRE graph
ZENODO
Preprint
Data sources: ZENODO

Beyond Scaling: A Stage 3 Geometric Framework for LLM Transparency through Language Manifold Dynamics

Authors: Zhang, Lijia

Abstract

This paper proposes a Stage 3 theoretical framework for understanding LLMs through geometry and mathematical physics. Starting from a vocabulary embedding matrix $E \in \mathbb{R}^{N \times d}$, the paper identifies an intrinsic token semantic space $\mathbb{R}^{r}$, where $r$ represents the effective semantic rank of the embedding representation. By adding the token sequence dimension as a temporal coordinate, this space is extended to a temporal-semantic ambient space $\mathbb{R}^{r+1}$. Observed language is then treated as discrete token samples or trajectories, while the underlying structure of language is modelled as a continuous manifold $M \subset \mathbb{R}^{r+1}$.

A scalar semantic potential $\Phi$ is introduced on the language manifold, and the token semantic vector is modelled as $v = \nabla \Phi$. This connects token representation with semantic dynamics. The diffusion equation provides a natural first candidate for fitting a continuous manifold to discrete linguistic samples, while wave and transport equations capture semantic propagation, structure preservation, and directional movement under contextual constraints. Together, these equations form a PDE-based framework for modelling language dynamics on the language manifold.

Training is interpreted as an inverse problem: estimating the manifold, the scalar potential structure, and the coefficient fields of the governing PDE from human-generated language. Inference is interpreted as the forward problem: a prompt imposes boundary or initial conditions and selects a continuation trajectory on the learned manifold. The framework offers a path from statistical pattern recognition toward a predictive theory of language dynamics grounded in manifold geometry and PDEs.
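
The first step described in the abstract, going from the embedding matrix E to a rank-r semantic space and then to temporal-semantic points in R^(r+1), can be illustrated with a small sketch. The abstract does not say how the effective semantic rank is computed, so everything here is an assumption made for illustration: the rank is taken as the smallest number of singular values holding 99% of the squared spectrum, the semantic coordinates are projections onto the top-r right singular vectors, and the helper names `effective_rank` and `temporal_semantic_trajectory` are hypothetical. The embedding matrix is synthetic.

```python
import numpy as np


def effective_rank(E, energy=0.99):
    """Smallest r whose top-r singular values hold `energy` of the squared spectrum.

    Assumption: the paper's 'effective semantic rank' is approximated by a
    spectral-energy threshold; the 0.99 cutoff is illustrative only.
    """
    s = np.linalg.svd(E, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)


def temporal_semantic_trajectory(E, token_ids, r):
    """Map a token sequence to points in R^(r+1).

    Column 0 is the sequence position t (the temporal coordinate);
    columns 1..r are the token's coordinates along the top-r right
    singular directions of E (an assumed choice of semantic basis).
    """
    _, _, Vt = np.linalg.svd(E, full_matrices=False)
    coords = E[token_ids] @ Vt[:r].T                      # shape (T, r)
    t = np.arange(len(token_ids), dtype=float)[:, None]   # shape (T, 1)
    return np.hstack([t, coords])                         # shape (T, r + 1)


# Synthetic embedding matrix with N = 50 tokens, d = 16, and true rank 4.
rng = np.random.default_rng(0)
N, d, true_rank = 50, 16, 4
E = rng.normal(size=(N, true_rank)) @ rng.normal(size=(true_rank, d))

r = effective_rank(E)
traj = temporal_semantic_trajectory(E, [3, 17, 8, 42], r)
print("effective rank:", r)
print("trajectory shape:", traj.shape)
```

Each row of `traj` is one discrete sample in the ambient space; in the framework's terms, these are the points to which a continuous manifold M would be fitted. The fitting itself (for example through a diffusion equation) is not attempted here.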
