Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers

This paper investigates the phenomenon of grokking in transformers across a broader class of algebraic structures beyond modular addition. Prior mechanistic interpretability work has shown that transformers trained on modular addition learn Fourier-based clock circuits and exhibit delayed generalisation (grokking). We extend this analysis to eight algebraic operations spanning abelian groups, a composite ring, and non-abelian groups (S3, D5, A4, S4), using 1-layer transformers at d_model = 64.

Our key findings are:

1. A clear abelian vs non-abelian grokking boundary: all abelian operations achieve 100% test accuracy, while non-abelian groups fail to generalise despite perfect training accuracy.
2. Discrete-log re-indexing improves Fourier concentration for modular multiplication (2.14×), supporting the discrete logarithm representation hypothesis.
3. Non-abelian models exhibit partial circuit formation via Peter–Weyl decomposition even without grokking.
4. Cross-operation embedding similarity (CKA ≥ 0.80 across all pairs) suggests a shared representational substrate.
5. A capacity-dependent interpretation: abelian tasks rely on 1D irreducible representations, while non-abelian tasks require higher-dimensional irreps exceeding model capacity at d_model = 64.

All experiments are reproducible via provided code and checkpoint-resume pipelines, runnable on a free Colab T4 GPU (~3 hours). This work contributes new empirical evidence toward understanding the role of algebraic structure and representation theory in neural network generalisation.

Code repository: https://github.com/justbytecode/grokking-beyond-addition
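The discrete-log re-indexing mentioned in the findings rests on a standard fact: for a prime p, multiplication of the nonzero residues mod p becomes addition mod p - 1 once each residue is replaced by its discrete logarithm with respect to a primitive root. The sketch below only illustrates that fact and is not taken from the paper's repository. The function names (`primitive_root`, `discrete_log_table`, `reindexing_is_additive`) and the moduli 7 and 11 are assumptions chosen for illustration, and the brute-force search is only suitable for small primes.

```python
def primitive_root(p):
    """Return the smallest primitive root modulo an odd prime p (brute force)."""
    for g in range(2, p):
        # g is a primitive root iff its powers visit all p - 1 nonzero residues.
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = (x * g) % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def discrete_log_table(p):
    """Return (g, table) where table[g**k % p] == k for k in 0..p-2."""
    g = primitive_root(p)
    table = {}
    x = 1
    for k in range(p - 1):
        table[x] = k
        x = (x * g) % p
    return g, table


def reindexing_is_additive(p):
    """Check that dlog(a * b mod p) == (dlog(a) + dlog(b)) mod (p - 1) for all nonzero a, b."""
    _, dlog = discrete_log_table(p)
    return all(
        dlog[(a * b) % p] == (dlog[a] + dlog[b]) % (p - 1)
        for a in range(1, p)
        for b in range(1, p)
    )


if __name__ == "__main__":
    for p in (7, 11):  # small illustrative primes, not the paper's settings
        g, _ = discrete_log_table(p)
        print(p, g, reindexing_is_additive(p))
```

Under this re-indexing a modular multiplication table has the same structure as a modular addition table, which is one plausible reading of why Fourier concentration would improve after re-indexing.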

Keywords

mechanistic interpretability, representation learning, grokking, group theory, non-abelian groups, transformers, deep learning theory, Fourier analysis

