
This repository accompanies the paper "The Redundancy Cliff: Discovering and Exploiting 50% Dispensable Compute in Transformers with a Single O(1) Spectral Gate" (Dec 2025). We introduce a simple, fast $O(1)$ spectral gate $\kappa_x$ derived from a modular-residue fingerprint of the input, then apply $\kappa_x$ to dynamically prune attention KV cache length and slice MLP intermediate dimensions. Experiments on TinyLlama-1.1B (WikiText-2) show substantial compute reduction with negligible change in perplexity in our proof-of-concept (PoC) runs. This release contains the final, tested code (scripts, patching logic, $\kappa_x$ computation, and required dependencies) used to run the experiments, along with the original run artifacts for maximum transparency. This is a proof-of-concept and is not intended for production deployment. See README.md for instructions, environment notes, and full reproducibility details.
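As a rough illustration of how a scalar gate could drive both pruning steps, here is a minimal sketch. It is not the released code. It assumes $\kappa_x$ is already computed and lies in $[0, 1]$, that KV pruning keeps the most recent entries, and that MLP slicing keeps the leading intermediate units. The function names `keep_count`, `prune_kv`, and `slice_mlp` are hypothetical; the repository's own scripts define the actual $\kappa_x$ computation and patching logic.

```python
import math


def keep_count(kappa: float, total: int, minimum: int = 1) -> int:
    """Number of items to keep for a gate value kappa (assumed to lie in [0, 1])."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must be in [0, 1]")
    return max(minimum, min(total, math.ceil(kappa * total)))


def prune_kv(keys, values, kappa):
    """Keep only the most recent cache entries (recency policy is an assumption)."""
    n = keep_count(kappa, len(keys))
    return keys[-n:], values[-n:]


def slice_mlp(w_up, w_down, kappa):
    """Keep the first n intermediate units of an MLP (leading-slice policy is an assumption).

    w_up:   d_ff rows, each of length d_model (up projection)
    w_down: d_model rows, each of length d_ff (down projection)
    """
    n = keep_count(kappa, len(w_up))
    return w_up[:n], [row[:n] for row in w_down]


if __name__ == "__main__":
    keys = list(range(8))
    values = list(range(8))
    print(prune_kv(keys, values, 0.5))
```

Under these assumptions the gate is evaluated once per input and the slicing itself is plain indexing, so the extra cost of applying it is independent of model size.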
redundancy-cliff, spectral-gate, lambda2, transformers, kv-cache-pruning, dynamic-sparsity, mlp-slicing
