
We present a consolidated empirical study of constructive geometric modifications to Transformer sequence models, spanning exact symplectic residualization, exact linear symplectic token mixers, geometry-driven attention kernels, and mixed probabilistic–geometric hybrids. Across this program we ask three linked questions: (i) can exact cross-token symplectic operators be constructed and verified numerically; (ii) can geometry alone support competitive sequence modeling; and (iii) when geometry helps, does it act as a replacement for attention or as a controlled augmentation of it? Our results support four main conclusions. First, exact symplectic cross-token operators can be constructed: a Hamiltonian-matrix-exponential token mixer achieves machine-precision symplectic diagnostics on the token-mixing component. Second, exact or geometry-pure architectures remain strongly under-expressive on the tested prefix phase-tracking task; they typically stay near random-guess accuracy despite clean structural diagnostics. Third, hybrid models that mix a standard causal-attention branch with a geometry-biased branch substantially outperform a single standard Transformer: a mixed standard-plus-geometry-informed attention model reaches 62.7% validation accuracy versus 50.2% for its matched standard baseline. Fourth, a parameter-matched double-standard mixed control reaches 66.1%, exceeding the geometry-mixed model, which indicates that the largest gain is better explained by dual-branch capacity and mixture structure than by a geometry-specific inductive bias alone. The resulting picture is sharper than a simple "geometry helps" slogan. Exact geometry is constructively achievable but under-expressive; pure geometry-informed probability also fails; the strongest empirical regime observed here is a mixed one in which geometry acts as a structured bias inside a stronger probabilistic communication backbone. We interpret this as evidence that the present value of geometric structure in sequence models lies in augmentation rather than replacement, while a deeper Kähler-style unification of probability and geometry remains an open long-horizon target rather than a result established here.
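As a concrete illustration of the first conclusion, the sketch below (an assumption-laden illustration, not the paper's code) checks that the matrix exponential of a Hamiltonian matrix H = JS, with S symmetric and J the standard symplectic form, yields a mixer M = exp(H) whose symplectic residual ||M^T J M - J|| sits at machine precision. The dimensions, function names, and random construction are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): verify that the
# matrix exponential of a Hamiltonian matrix is symplectic to machine
# precision, mirroring the paper's symplectic diagnostic on the mixer.
import numpy as np
from scipy.linalg import expm

def symplectic_form(n):
    """Standard symplectic form J = [[0, I], [-I, 0]] on R^(2n)."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def random_hamiltonian(n, rng, scale=0.1):
    """H = J S with S symmetric is Hamiltonian (H^T J + J H = 0)."""
    A = rng.standard_normal((2 * n, 2 * n))
    S = scale * 0.5 * (A + A.T)  # symmetrize; scale keeps exp(H) well-conditioned
    return symplectic_form(n) @ S

rng = np.random.default_rng(0)
n = 8                            # illustrative: token-mixing dimension 2n = 16
J = symplectic_form(n)
M = expm(random_hamiltonian(n, rng))  # exact symplectic mixer, up to FP error

# Symplectic diagnostic: ||M^T J M - J|| should sit at machine precision.
print(f"symplectic residual: {np.linalg.norm(M.T @ J @ M - J):.2e}")
```

The key identity is that H = JS with symmetric S satisfies H^T J + J H = 0, and any such H exponentiates to a symplectic map; parameterizing a mixer through S therefore yields exactness by construction rather than by a soft penalty.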
CDIP, long-context extrapolation, Transformer dynamics, symplectic residuals, geometry-informed attention, Kähler-inspired attention, structure–accuracy tradeoff, exact symplectic token mixer
