Geometric Phase Extraction from Transformer Hidden States: Architecture-Dependent Manifold Structure and Adaptive Observation Protocols

What this is

Code and data for a paper that asks: do Transformer hidden states have coherent angular (phase-like) structure, and if so, how do you extract it? Short answer: yes, but only if you match the extraction method to the architecture.

The problem

The standard signal-processing approach to phase extraction (PCA, bandpass filter, Hilbert transform) assumes oscillatory dynamics. Transformers are feedforward, not recurrent. We tested this pipeline on GPT-2 and got R-bar ≈ 0.12 (R-bar being the mean resultant length of the extracted phases), which is indistinguishable from noise. The conventional approach does not work here.

What we found

A geometric method works. Project hidden states onto their first two principal components and compute the angle with atan2. On Pre-LayerNorm models (GPT-2, Qwen2, Pythia, most OPT variants), this gives R-bar = 0.93–0.98, roughly 8x the Hilbert result.

LayerNorm placement is the key variable. GPT-1 (Post-LN) and GPT-2 (Pre-LN) have nearly identical architectures (768-dim, 12 layers, ~120M params). The only real difference is where LayerNorm goes. PCA variance concentration at k=2 is 16% for GPT-1 vs 96% for GPT-2. That 6x gap is reproducible and shows up again in the OPT family (OPT-350m, Post-LN, vs OPT-125m, Pre-LN).

For low-concentration models, a wide-bandpass Hilbert variant works as a fallback. It uses a passband of [0.01, 0.45] instead of the standard [0.05, 0.25]. This gets R-bar = 0.60–0.94 across all nine models we tested, including OPT-1.3B, where the geometric method underperforms.

You can pick the method automatically. Compute the PCA variance explained at k=2 (we call it ρ₂). If ρ₂ > 0.80, use geometric extraction. Otherwise, use wide-bandpass Hilbert. That's the whole protocol.
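The geometric extraction and the ρ₂ selection rule described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the repository's code: the function names (pca_top2, geometric_phase, mean_resultant_length, choose_method) are hypothetical, the PCA is assumed to be mean-centered, and R-bar is computed as the standard mean resultant length over whatever set of angles is passed in (how the paper aggregates phases is not specified here). The wide-bandpass Hilbert fallback is only named, not implemented, and the demo data is synthetic rather than real hidden states.

```python
import numpy as np

def pca_top2(hidden):
    """Project a (tokens, dim) array onto its first two principal components.

    Returns (projection of shape (tokens, 2), rho2), where rho2 is the
    fraction of total variance explained by the first two components.
    Assumption: PCA is computed on mean-centered data.
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # SVD of the centered data: rows of vt are the principal directions,
    # squared singular values are proportional to explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    rho2 = float(var[:2].sum() / var.sum())
    return centered @ vt[:2].T, rho2

def geometric_phase(proj2d):
    """Angle of each 2D point, in radians, via atan2(PC2, PC1)."""
    return np.arctan2(proj2d[:, 1], proj2d[:, 0])

def mean_resultant_length(angles):
    """Standard circular-statistics R-bar: |mean of exp(i * angle)|."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(angles)))))

def choose_method(rho2, threshold=0.80):
    """Selection rule from the text: geometric if rho2 > 0.80, else Hilbert."""
    return "geometric" if rho2 > threshold else "wide_bandpass_hilbert"

# Synthetic stand-in for hidden states (NOT real model data):
# points on a 2D plane embedded in 16 dimensions, plus small noise.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(16, 2)))[0]  # orthonormal 16x2 basis
planar = rng.normal(size=(200, 2)) @ basis.T + 0.01 * rng.normal(size=(200, 16))

proj, rho2 = pca_top2(planar)
method = choose_method(rho2)        # nearly planar data selects "geometric"
phases = geometric_phase(proj)
print(f"rho2={rho2:.3f} method={method} n_phases={phases.shape[0]}")
```

With real models, the input to pca_top2 would be the hidden states of one layer, and only the branch returned by choose_method would be run.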
What's in this repository

Paper: LaTeX source and compiled PDF (23 pages, arXiv-formatted)
13 experiments (7 core + 6 supplementary), all as standalone Python scripts
All generated figures (PNG) and raw data (JSON) for full reproducibility
run_all.py: single command to reproduce everything

Models tested

Nine models, 110M–2.8B parameters: GPT-1, GPT-2, OPT-125m/350m/1.3B/2.7B, Qwen2-0.5B/1.5B, Pythia-2.8B. All are downloaded automatically from the HuggingFace Hub.

Reproducibility

    python3 -m venv .venv && source .venv/bin/activate
    pip install -r experiments/requirements.txt
    python experiments/run_all.py

Runs on consumer hardware. Tested on Apple M1, 16GB. Total runtime ~45 minutes. No GPU required.

Why it matters

For interpretability researchers: Pre-LN hidden states live on a ~2D manifold at middle layers. Angular position on that manifold is a new, unsupervised observable; no labeled data or probes are needed.

For practitioners: The three-tier architecture classification (Post-LN / Pre-LN OPT / Pre-LN non-OPT) has practical implications for compression and low-rank approximation strategies.

For the multi-agent crowd: Phase coherence gives you a scalar, architecture-comparable quantity for monitoring alignment across LLM instances.

Related papers

This paper provides the theoretical foundation for the Recync framework (runtime coherence control for LLMs):

From Monitoring to Intervention (detection + token-level control limits): doi.org/10.5281/zenodo.19148449
Beyond Micro-Control (response-level checkpoint restart breakthrough): doi.org/10.5281/zenodo.19148721

Code

This repository: github.com/metaSATOKEN/geometric_phase_extraction
Recync framework (Paper 2 & 3): github.com/metaSATOKEN/Recync_framework

License

Paper content: CC BY 4.0
Code: Apache License 2.0

Copyright 2026 Kentaro Sato.

Keywords

LLM, Transformer, mechanistic interpretability, PCA, Pre-LayerNorm, LayerNorm, Post-LayerNorm, reproducible research, phase extraction, manifold geometry, hidden states, coherence
