Model Minification: A Taylor-Ridge Framework for Structured Compression

As Large Language Models (LLMs) scale, the cost of deploying them on commodity hardware becomes prohibitive. Unstructured pruning offers compression in theory, but it often requires specialized kernels to realize any speedup. We propose a Structured Minification framework that physically reduces the intermediate dimensions of Transformer MLPs, so the pruned layers remain compatible with standard GEMM operations. Our methodology combines (1) a global first-order Taylor sensitivity analysis to identify redundant feature dimensions, and (2) a closed-form Ridge Regression reconstruction to heal the output distribution of the pruned layers.

We investigate the efficacy of this approach across model scales, applying it to a parameter-dense 135M model and a 1.7B model. Our results demonstrate that minification is effective even for smaller, dense models at high retention rates: the 135M model retains significant coherence at 90% retention (Perplexity 4.33 → 4.89). Furthermore, we observe a strong scaling law: the 1.7B model tolerates 30% structural removal with only minor degradation (Perplexity 3.16 → 4.09). This suggests that while smaller models require conservative minification (80-90% retention), larger over-parameterized models possess a highly compressible subspace recoverable via linear least squares.

Because our framework reduces model topology without altering weight precision, it remains strictly orthogonal to quantization, enabling composite compression pipelines that combine structural minification with subsequent bit-width reduction.

The code is available at https://github.com/VladimerKhasia/minisp
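The two steps named in the abstract, first-order Taylor scoring of intermediate dimensions followed by a closed-form ridge refit, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a plain two-matrix ReLU MLP, a precomputed gradient of the loss with respect to the hidden activations, and a refit of the down projection only. The function names (`taylor_scores`, `ridge_heal`, `minify_mlp`) and the default `lam` value are hypothetical.

```python
import numpy as np

def taylor_scores(hidden, grad_hidden):
    """First-order Taylor estimate of the loss change from zeroing each
    hidden channel: |sum over tokens of activation * gradient|."""
    return np.abs((hidden * grad_hidden).sum(axis=0))

def ridge_heal(hidden_kept, target, lam=1e-3):
    """Closed-form ridge solution W = (H^T H + lam I)^{-1} H^T Y that maps
    the kept hidden channels back onto the original layer output."""
    k = hidden_kept.shape[1]
    gram = hidden_kept.T @ hidden_kept + lam * np.eye(k)
    return np.linalg.solve(gram, hidden_kept.T @ target)

def minify_mlp(x, w_up, w_down, grad_hidden, retention=0.7, lam=1e-3):
    """Shrink the intermediate dimension of a toy MLP y = relu(x W_up) W_down.

    Assumptions (not from the source): ReLU activation, and only the down
    projection is refit on calibration inputs x.
    """
    hidden = np.maximum(x @ w_up, 0.0)   # calibration activations
    target = hidden @ w_down             # original (unpruned) layer output
    k = max(1, int(round(retention * w_up.shape[1])))
    # Keep the k channels with the largest sensitivity, in original order.
    keep = np.sort(np.argsort(taylor_scores(hidden, grad_hidden))[-k:])
    w_up_small = w_up[:, keep]           # physically smaller dense matrix
    w_down_small = ridge_heal(hidden[:, keep], target, lam)
    return w_up_small, w_down_small, keep
```

Both returned matrices are ordinary dense arrays with a smaller inner dimension, so the pruned layer still runs as two standard matrix multiplications. Since precision is untouched, the result could in principle be quantized afterwards.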

Related Organizations

Ilia State University
Georgia
