ZENODO · Preprint · 2025 · License: CC BY · Data sources: Datacite
https://dx.doi.org/10.48550/ar... · Article · 2026 · License: arXiv Non-Exclusive Distribution · Data sources: Datacite

You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference

Authors: Ryan Shamim

Abstract

Modern AI inference systems treat transformer execution as mandatory, conflating model capability with execution necessity. We reframe inference as a control-plane decision problem: determining when execution is necessary versus when correctness can be preserved through alternative pathways. We introduce Meaning-First Execution (MFEE), a control-plane architecture that implements this framework by selectively invoking transformer inference only when required. MFEE operates as a gating layer above existing stacks without modifying models, weights, or parameters. Across 1,000 diverse prompts under deterministic decoding, MFEE achieves 78.1% execution reduction while maintaining 100% exact-match equivalence for invoked executions. Comparative evaluation reveals that pattern-based routers achieve at most 53.3% avoidance with correctness failures, while MFEE reaches 100% avoidance with zero failures through semantic analysis. We prove this limitation via Theorem 1: any router operating solely on finite feature maps cannot simultaneously guarantee zero false skips and positive avoidance on feature-collision pairs. These results establish execution governance as a foundational layer in ML systems infrastructure, orthogonal to model-level optimization techniques.
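The control-plane idea above can be sketched as a thin gate that sits in front of the model and decides, per prompt, whether invoking the transformer is necessary or whether a correctness-preserving alternative pathway (here, a trivial exact-match cache) suffices. This is a minimal illustrative sketch, not the paper's actual implementation: the class and method names (`ExecutionGate`, `analyze`, `Decision`) and the cache-based pathway are assumptions standing in for MFEE's semantic analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Decision:
    invoke: bool   # True -> transformer execution is necessary
    reason: str    # why the gate decided this way


class ExecutionGate:
    """Hypothetical gating layer: sits above the model stack and never
    modifies models, weights, or parameters."""

    def __init__(self, model: Callable[[str], str]):
        self.model = model
        self.cache: Dict[str, str] = {}

    def analyze(self, prompt: str) -> Decision:
        # Stand-in for semantic analysis: an exact-match cache is the only
        # alternative pathway here. The real system decides from meaning,
        # which is what lets it beat finite-feature pattern routers.
        if prompt in self.cache:
            return Decision(invoke=False, reason="cache hit")
        return Decision(invoke=True, reason="no safe alternative pathway")

    def __call__(self, prompt: str) -> str:
        decision = self.analyze(prompt)
        if not decision.invoke:
            return self.cache[prompt]      # execution avoided
        out = self.model(prompt)           # execution was necessary
        self.cache[prompt] = out
        return out


# Toy deterministic "model" with an invocation counter, so avoidance
# is observable.
calls = {"n": 0}

def toy_model(prompt: str) -> str:
    calls["n"] += 1
    return prompt.upper()

gate = ExecutionGate(model=toy_model)
first = gate("hello")    # invokes the model
second = gate("hello")   # served via the alternative pathway
```

Under deterministic decoding, the repeated prompt is answered with exact-match equivalence while the model is invoked only once, which is the avoidance-with-zero-failures property the abstract describes, in miniature.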

Comments: 24 pages, 5 figures. Deterministic evaluation protocol. Includes theoretical analysis and empirical validation on GPT-2 and Gemma 2 9B.

Keywords

FOS: Computer and information sciences, AI Infrastructure, Runtime Optimization, Production ML Systems, Reproducible Benchmarks, Transformer Inference, GPU Cost Reduction, Compute Avoidance, Transformer Equivalence, Inference Gating, Energy-Efficient AI, Machine Learning (cs.LG), Machine Learning, Inference Optimization, Meaning-First Execution, Deterministic Evaluation, ML Systems

Impact indicators (provided by BIP!):
  • selected citations: 0 — derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
  • popularity: Average — reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
  • influence: Average — reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
  • impulse: Average — reflects the initial momentum of an article directly after its publication, based on the underlying citation network.