ZENODO
Preprint . 2026
License: CC BY
Data sources: Datacite

Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers

Authors: Pal, Mani;

Abstract

This paper investigates the phenomenon of grokking in transformers across a broader class of algebraic structures beyond modular addition. Prior mechanistic interpretability work has shown that transformers trained on modular addition learn Fourier-based clock circuits and exhibit delayed generalisation (grokking). We extend this analysis to eight algebraic operations spanning abelian groups, a composite ring, and non-abelian groups (S3, D5, A4, S4), using 1-layer transformers at d_model = 64. Our key findings are:

1. A clear abelian vs. non-abelian grokking boundary: all abelian operations achieve 100% test accuracy, while non-abelian groups fail to generalise despite perfect training accuracy.
2. Discrete-log re-indexing improves Fourier concentration for modular multiplication (2.14×), supporting the discrete logarithm representation hypothesis.
3. Non-abelian models exhibit partial circuit formation via Peter–Weyl decomposition even without grokking.
4. Cross-operation embedding similarity (CKA ≥ 0.80 across all pairs) suggests a shared representational substrate.
5. A capacity-dependent interpretation: abelian tasks rely on 1D irreducible representations, while non-abelian tasks require higher-dimensional irreps exceeding model capacity at d_model = 64.

All experiments are reproducible via provided code and checkpoint-resume pipelines, runnable on a free Colab T4 GPU (~3 hours). This work contributes new empirical evidence toward understanding the role of algebraic structure and representation theory in neural network generalisation.

Code repository: https://github.com/justbytecode/grokking-beyond-addition
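The discrete-log re-indexing mentioned in finding 2 can be illustrated with a minimal sketch. It is not taken from the paper's repository. It assumes a prime modulus (p = 13 is chosen only for illustration; the abstract does not state the modulus used), and the helper names `primitive_root`, `discrete_log_table` and `multiplication_becomes_addition` are hypothetical. The idea is that, for prime p, the nonzero residues under multiplication form a cyclic group, so relabelling each residue by its discrete logarithm turns multiplication mod p into addition mod p - 1, the setting in which Fourier-based clock circuits are known to arise.

```python
def primitive_root(p: int) -> int:
    """Return the smallest generator of the multiplicative group mod prime p.

    Brute force: g is a generator if its powers hit all p - 1 nonzero residues.
    """
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def discrete_log_table(p: int) -> dict[int, int]:
    """Map each nonzero residue a to the exponent k with g**k == a (mod p)."""
    g = primitive_root(p)
    return {pow(g, k, p): k for k in range(p - 1)}


def multiplication_becomes_addition(p: int) -> bool:
    """Check log(a * b mod p) == (log a + log b) mod (p - 1) for all nonzero a, b."""
    log = discrete_log_table(p)
    return all(
        log[(a * b) % p] == (log[a] + log[b]) % (p - 1)
        for a in range(1, p)
        for b in range(1, p)
    )


if __name__ == "__main__":
    p = 13  # illustrative modulus, an assumption rather than the paper's setting
    print("generator:", primitive_root(p))
    print("re-indexing is a homomorphism:", multiplication_becomes_addition(p))
```

Under this relabelling, an embedding matrix whose rows are reordered by `discrete_log_table` would be expected to look more like that of an addition task, which is one plausible reading of the reported gain in Fourier concentration.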

Keywords

mechanistic interpretability, representation learning, grokking, group theory, non-abelian groups, transformers, deep learning theory, Fourier analysis
