
We introduce K-Operators, a kernel-decomposed sequence modeling architecture that replaces attention entirely with structured causal kernel operators. On Tiny Shakespeare character-level modeling, a 1.14M-parameter K-Operators model achieves 4.43 ± 0.05 validation perplexity (PPL) across 7 seeds. This approaches the 4.35 PPL of a 10.65M-parameter Transformer baseline (nanoGPT) while using 9.3× fewer parameters and requiring no positional encodings. The architecture decomposes sequence mixing into a hierarchy of operators: K1 layers for position-wise feature mixing, K2 layers for causal sequence interaction via a learned base kernel combined with a low-rank gamma-decayed recurrence, and a K0 layer for final rescaling. These are composed into a K-Stack backbone (K1 → K2(×N) → K1 → K0) and refined through a learned iterative equilibrium loop governed by a scalar step size η. Two interchangeable gamma-decay backends, mask and block, offer different memory/speed trade-offs. Diagnostic analysis reveals interpretable learned dynamics: the model progressively transfers sequence mixing from the initialized base kernel to the adaptive recurrent path, develops per-layer functional specialization, and learns to self-regulate the refinement loop. This self-regulation includes robustness to 10× learning-rate misspecification via automatic η suppression.
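The abstract describes a K2 layer as a learned base kernel plus a low-rank gamma-decayed recurrence, with interchangeable "mask" and "block" backends. The snippet below is a minimal, hypothetical single-channel sketch of that idea under our own assumptions, not the authors' implementation. We assume the decayed path is a diagonal rank-r recurrence h_t = γ ⊙ h_{t-1} + b·x_t with readout y_t = c · h_t, and that the base kernel is a short causal convolution added to it. We also assume that "mask" means materialising the full T × T lower-triangular decay kernel and that "block" means masking within fixed-size chunks while carrying the recurrent state across chunk boundaries. All function and parameter names (`k2_layer`, `k_base`, `gamma`, `b`, `c`, `block`) are illustrative.

```python
# Hypothetical single-channel K2-style layer (illustrative assumptions only):
#   base path:  causal convolution with a short learned kernel k_base
#   decay path: diagonal rank-r recurrence h_t = gamma * h_{t-1} + b * x_t,
#               y_t = c . h_t  (equivalently a causal kernel sum_i c_i gamma_i^(t-s) b_i)

def base_kernel_conv(x, k):
    """Causal convolution: y_t = sum_j k[j] * x[t - j], using only t - j >= 0."""
    return [sum(k[j] * x[t - j] for j in range(len(k)) if t - j >= 0)
            for t in range(len(x))]


def decay_mask_backend(x, gamma, b, c):
    """Assumed 'mask' backend: build the full T x T lower-triangular kernel
    K[t][s] = sum_i c_i * gamma_i**(t - s) * b_i for s <= t (O(T^2) memory)."""
    T = len(x)
    K = [[sum(ci * gi ** (t - s) * bi for gi, bi, ci in zip(gamma, b, c)) if s <= t else 0.0
          for s in range(T)]
         for t in range(T)]
    return [sum(K[t][s] * x[s] for s in range(T)) for t in range(T)]


def decay_block_backend(x, gamma, b, c, block=4):
    """Assumed 'block' backend: mask only within each chunk of length `block`
    and carry the rank-r state h across chunk boundaries (O(block^2) work per chunk)."""
    r = len(gamma)
    h = [0.0] * r  # recurrent state after the last position of the previous chunk
    y = []
    for t0 in range(0, len(x), block):
        chunk = x[t0:t0 + block]
        for dt in range(len(chunk)):
            total = 0.0
            for i in range(r):
                # decayed contribution of everything before this chunk
                carried = gamma[i] ** (dt + 1) * h[i]
                # in-chunk contribution of positions t0 .. t0 + dt
                local = sum(gamma[i] ** (dt - ds) * b[i] * chunk[ds] for ds in range(dt + 1))
                total += c[i] * (carried + local)
            y.append(total)
        # advance the carried state to the end of this chunk
        L = len(chunk)
        h = [gamma[i] ** L * h[i]
             + sum(gamma[i] ** (L - 1 - ds) * b[i] * chunk[ds] for ds in range(L))
             for i in range(r)]
    return y


def k2_layer(x, k_base, gamma, b, c, backend="mask", block=4):
    """Sum of the base-kernel path and the gamma-decayed low-rank path."""
    if backend == "mask":
        decay = decay_mask_backend(x, gamma, b, c)
    else:
        decay = decay_block_backend(x, gamma, b, c, block)
    return [u + v for u, v in zip(base_kernel_conv(x, k_base), decay)]


x = [1.0, -0.5, 0.25, 2.0, 0.0, -1.0, 0.5, 1.5, -0.25]
y_mask = k2_layer(x, [0.5, 0.2, 0.1], [0.9, 0.5], [1.0, 0.3], [0.4, -0.2], backend="mask")
y_block = k2_layer(x, [0.5, 0.2, 0.1], [0.9, 0.5], [1.0, 0.3], [0.4, -0.2], backend="block")
print(max(abs(p - q) for p, q in zip(y_mask, y_block)))  # agreement up to float rounding
```

In this sketch both backends compute the same causal output. The design difference is only in what gets materialised: the mask path holds a T × T kernel, while the block path keeps per-chunk intermediates plus an r-dimensional carried state. That contrast is one plausible reading of the memory/speed trade-off the abstract mentions.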
Machine Learning
