
Geometric Phase Extraction from Transformer Hidden States

What this is

Code and data for a paper that asks: do Transformer hidden states have coherent angular (phase-like) structure, and if so, how do you extract it? Short answer: yes, but only if you pick the right method for the right architecture.

The problem

The standard signal-processing approach to phase extraction (PCA, then a bandpass filter, then a Hilbert transform) assumes oscillatory dynamics. Transformers are feedforward, not recurrent. We tested this pipeline on GPT-2 and got R-bar ≈ 0.12, which is indistinguishable from noise. The conventional approach simply doesn't work here.

What we found

A geometric method works. Project hidden states onto their first two principal components and compute the angle via atan2. On Pre-LayerNorm models (GPT-2, Qwen2, Pythia, most OPT variants), this gives R-bar = 0.93–0.98, roughly 8x the 0.12 of the standard Hilbert pipeline.

LayerNorm placement is the key variable. GPT-1 (Post-LN) and GPT-2 (Pre-LN) have nearly identical architectures (768-dim, 12 layers, ~120M params). The only real difference is where LayerNorm goes. Yet their PCA concentration at k=2 is 16% vs 96%. That 6x gap is reproducible and shows up again in the OPT family (OPT-350m, Post-LN, vs OPT-125m, Pre-LN).

For low-concentration models, a wide-bandpass Hilbert variant works as a fallback. It uses a passband of [0.01, 0.45] instead of the standard [0.05, 0.25]. This gets R-bar = 0.60–0.94 across all nine models we tested, including OPT-1.3B, where the geometric method underperforms.

You can pick the method automatically. Compute the PCA variance explained at k=2 (we call it ρ₂). If ρ₂ > 0.80, use geometric extraction. Otherwise, use wide-bandpass Hilbert. That's the whole protocol.
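The selection protocol can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the repository's code, and several details are assumed:

- The function names (`pca_scores`, `rho2`, `geometric_phase`, `wide_band_hilbert_phase`, `extract_phase`) are hypothetical.
- Hidden states for one layer are taken to be an (n_tokens × d_model) array.
- The Hilbert branch filters the PC1 score series across token positions.
- The passband is read as cycles per token (Nyquist = 0.5).
- An ideal FFT band-pass stands in for whatever filter the paper actually uses.

R-bar is computed as the standard mean resultant length |mean(exp(iθ))|. Which set of angles the paper averages over is not specified here.

```python
import numpy as np

# Selection threshold stated in the protocol: rho_2 > 0.80 -> geometric method.
RHO2_THRESHOLD = 0.80


def pca_scores(hidden: np.ndarray, k: int):
    """Center one layer's hidden states (n_tokens x d_model) and run PCA.

    Returns the top-k principal-component scores per token and the
    fraction of total variance explained by every component.
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Thin SVD: rows of vt are principal directions; s**2 is proportional to variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:k].T, explained


def rho2(hidden: np.ndarray) -> float:
    """PCA variance explained at k=2 (rho_2)."""
    _, explained = pca_scores(hidden, k=2)
    return float(explained[:2].sum())


def geometric_phase(hidden: np.ndarray) -> np.ndarray:
    """Angle of each token in the PC1-PC2 plane via atan2, in radians [-pi, pi]."""
    scores, _ = pca_scores(hidden, k=2)
    return np.arctan2(scores[:, 1], scores[:, 0])


def wide_band_hilbert_phase(hidden: np.ndarray, band: tuple = (0.01, 0.45)) -> np.ndarray:
    """Instantaneous phase of the band-passed PC1 score series.

    ASSUMPTIONS: the band is in cycles per token (Nyquist = 0.5), and an ideal
    FFT band-pass is used in place of whatever filter the paper applies.
    """
    scores, _ = pca_scores(hidden, k=1)
    x = scores[:, 0]
    freqs = np.fft.fftfreq(x.size)  # cycles per sample, in [-0.5, 0.5)
    spectrum = np.fft.fft(x)
    # Analytic signal of the band-passed series: keep only positive in-band
    # frequencies and double them (negative frequencies and DC are zeroed).
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(np.where(in_band, 2.0 * spectrum, 0.0))
    return np.angle(analytic)


def mean_resultant_length(theta: np.ndarray) -> float:
    """R-bar: |mean(exp(i*theta))|; 1 = all angles equal, ~0 = spread uniformly."""
    return float(np.abs(np.mean(np.exp(1j * theta))))


def extract_phase(hidden: np.ndarray):
    """Selection rule: geometric if rho_2 > 0.80, else wide-bandpass Hilbert."""
    if rho2(hidden) > RHO2_THRESHOLD:
        return "geometric", geometric_phase(hidden)
    return "hilbert", wide_band_hilbert_phase(hidden)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for one layer: 256 tokens x 768 dims lying close to
    # a 2D plane, plus small isotropic noise (not real model activations).
    t = np.linspace(0, 8 * np.pi, 256)
    plane = np.stack([np.cos(t), np.sin(t)], axis=1) @ rng.standard_normal((2, 768))
    hidden = plane + 0.01 * rng.standard_normal((256, 768))
    method, phase = extract_phase(hidden)
    print(method, f"rho2={rho2(hidden):.3f}")
```

Using an SVD of the centered matrix yields the same principal components as an eigendecomposition of the covariance. It also avoids forming a d_model × d_model matrix. Because ρ₂ is computed once per layer, the method choice is cheap relative to running the model itself.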
What's in this repository

- Paper: LaTeX source and compiled PDF (23 pages, arXiv-formatted)
- 13 experiments (7 core + 6 supplementary), all as standalone Python scripts
- All generated figures (PNG) and raw data (JSON) for full reproducibility
- run_all.py: a single command to reproduce everything

Models tested

Nine models, 110M–2.8B parameters: GPT-1, GPT-2, OPT-125m/350m/1.3B/2.7B, Qwen2-0.5B/1.5B, and Pythia-2.8B. All are downloaded automatically from the HuggingFace Hub.

Reproducibility

    python3 -m venv .venv && source .venv/bin/activate
    pip install -r experiments/requirements.txt
    python experiments/run_all.py

Runs on consumer hardware. It was tested on an Apple M1 with 16 GB, total runtime is ~45 minutes, and no GPU is required.

Why it matters

For interpretability researchers: Pre-LN hidden states live on a ~2D manifold at middle layers. Angular position on that manifold is a new, unsupervised observable. No labeled data or probes are needed.

For practitioners: the three-tier architecture classification (Post-LN / Pre-LN OPT / Pre-LN non-OPT) has practical implications for compression and low-rank approximation strategies.

For the multi-agent crowd: phase coherence gives you a scalar, architecture-comparable quantity for monitoring alignment across LLM instances.

Related papers

This paper provides the theoretical foundation for the Recync framework, runtime coherence control for LLMs:

- From Monitoring to Intervention (detection + token-level control limits): doi.org/10.5281/zenodo.19148449
- Beyond Micro-Control (response-level checkpoint restart breakthrough): doi.org/10.5281/zenodo.19148721

Code

- This repository: github.com/metaSATOKEN/geometric_phase_extraction
- Recync framework (Papers 2 & 3): github.com/metaSATOKEN/Recync_framework

License

Paper content: CC BY 4.0
Code: Apache License 2.0

Copyright 2026 Kentaro Sato.
Keywords: LLM, Transformer, mechanistic interpretability, PCA, Pre-LayerNorm, LayerNorm, Post-LayerNorm, reproducible research, phase extraction, manifold geometry, hidden states, coherence
