Powered by OpenAIRE graph
ZENODO
Other literature type . 2026
Data sources: ZENODO
https://doi.org/10.2139/ssrn.6...
Article . 2026 . Peer-reviewed
Data sources: Crossref
ZENODO
Other literature type . 2026
Data sources: Datacite
ResearchGate Data
Preprint . 2026
Data sources: Datacite

Model Minification: A Taylor-Ridge Framework for Structured Compression

Authors: Khasia, Vladimer;

Abstract

As Large Language Models (LLMs) scale, the cost of deploying them on commodity hardware becomes prohibitive. Unstructured pruning offers compression in theory, but it often requires specialized kernels to realize speedups. We propose a robust Structured Minification framework that physically reduces the intermediate dimensions of Transformer MLPs, which keeps the pruned layers compatible with standard GEMM operations. Our methodology combines (1) a global first-order Taylor sensitivity analysis to identify redundant feature dimensions, and (2) a closed-form Ridge Regression reconstruction to optimally heal the output distribution of the pruned layers.

We investigate the efficacy of this approach across model scales, applying it to a parameter-dense 135M model and a 1.7B model. Our results demonstrate that minification is effective even for smaller, dense models at high retention rates: the 135M model retains significant coherence at 90% retention (perplexity 4.33 → 4.89). We also observe a strong scaling trend: the 1.7B model tolerates 30% structural removal (70% retention) with only minor degradation (perplexity 3.16 → 4.09). This suggests that while smaller models require conservative minification (80-90% retention), larger over-parameterized models possess a highly compressible subspace that is recoverable via linear least squares.

Because our framework reduces model topology without altering weight precision, it remains orthogonal to quantization, enabling composite compression pipelines that combine structural minification with subsequent bit-width reduction.

The code is available at https://github.com/VladimerKhasia/minisp
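The two steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's implementation (see the repository linked above); it assumes a single two-layer ReLU MLP, a squared-error surrogate loss against hypothetical targets `T`, and a layer-local ranking instead of the global ranking the abstract describes. All function and variable names (`taylor_scores`, `ridge_heal`, `minify_mlp`, `lam`) are hypothetical.

```python
import numpy as np

def taylor_scores(H, G):
    """First-order Taylor saliency per hidden unit: mean |activation * gradient|."""
    return np.abs(H * G).mean(axis=0)

def ridge_heal(H_kept, Y_target, lam=1e-3):
    """Closed-form ridge solution W = (H^T H + lam I)^{-1} H^T Y."""
    k = H_kept.shape[1]
    A = H_kept.T @ H_kept + lam * np.eye(k)
    return np.linalg.solve(A, H_kept.T @ Y_target)  # shape (k, d_out)

def minify_mlp(W1, W2, X, T, retention=0.7, lam=1e-3):
    """Prune hidden units of y = relu(x W1) W2, then refit W2 by ridge regression.

    Assumption: the loss is L = 0.5 * ||Y - T||^2, so dL/dH = (Y - T) W2^T.
    """
    H = np.maximum(X @ W1, 0.0)           # hidden activations on calibration data
    Y = H @ W2                            # original layer output (healing target)
    G = (Y - T) @ W2.T                    # gradient of the surrogate loss w.r.t. H
    scores = taylor_scores(H, G)
    k = max(1, int(round(retention * W1.shape[1])))
    keep = np.sort(np.argsort(scores)[-k:])   # indices of the k most salient units
    W1_small = W1[:, keep]                # physically smaller up-projection
    W2_naive = W2[keep, :]                # pruned down-projection, no healing
    W2_healed = ridge_heal(H[:, keep], Y, lam)
    return W1_small, W2_naive, W2_healed, keep

# Toy calibration data (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 16))
T = rng.normal(size=(512, 8))
W1 = rng.normal(size=(16, 64))
W2 = rng.normal(size=(64, 8))
lam = 1e-3

W1_s, W2_naive, W2_healed, keep = minify_mlp(W1, W2, X, T, retention=0.7, lam=lam)

Y = np.maximum(X @ W1, 0.0) @ W2
H_s = np.maximum(X @ W1_s, 0.0)
err_naive = np.linalg.norm(H_s @ W2_naive - Y)
err_healed = np.linalg.norm(H_s @ W2_healed - Y)
print("reconstruction error without healing:", err_naive)
print("reconstruction error with ridge healing:", err_healed)
```

Because the pruned layer is an ordinary dense layer with a smaller hidden width, it runs on standard GEMM kernels, and the healed weights stay in the original precision, so a quantization pass could still be applied afterwards.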

Related Organizations
  • BIP!
    Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.