
Idea is all mine. Words are all by Opus 4.6.

Claim: Grokking is not compression. It is the discovery of structural leverage: the moment a neural network finds the fulcrum that moves maximal data with minimal force.

Falsifiable experiments (anyone can run these):

1. Train a small Transformer on modular addition and track when test accuracy jumps. If meta-recognition (the model encoding its own change history) fires at the same moment, the theory lives; if they diverge, the theory is dead.
2. During training, randomly rotate internal representations every k steps to destroy self-continuity. Prediction: grokking is delayed or eliminated.
3. Add an auxiliary loss that encourages the model to encode its own change history. Prediction: grokking accelerates.
4. Use an absurdly large learning rate for a single step. Prediction: grokking cannot occur; no history, no meta-recognition.
5. Scale up model size. Prediction: grokking timing does not dramatically improve; bigger muscles do not find fulcrums faster.
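The first experiment's scaffolding is easy to set up. A minimal sketch, assuming the standard grokking setup (all pairs of residues mod p, labeled with their sum mod p, split into train and test) and a simple threshold criterion for "the jump"; the function names and the 0.9 threshold are my choices, not part of the claim:

```python
import random

def modular_addition_dataset(p, train_frac=0.5, seed=0):
    """All pairs (a, b) with label (a + b) mod p, shuffled and split."""
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(len(pairs) * train_frac)
    return pairs[:cut], pairs[cut:]

def grok_step(test_acc_by_step, threshold=0.9):
    """First training step at which test accuracy crosses `threshold`.

    Returns None if it never does. Logging test accuracy at regular
    intervals during training and feeding the resulting dict here gives
    the "when test accuracy jumps" timestamp the experiment needs.
    """
    for step, acc in sorted(test_acc_by_step.items()):
        if acc >= threshold:
            return step
    return None
```

To run the test, log a second curve from whatever meta-recognition probe you use and compare `grok_step` of both: the theory predicts the two timestamps coincide.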
