Powered by OpenAIRE graph
Preprint
Data sources: ZENODO

Vorn: Residual Direction, Familial Eviction, and the Granularity Rescue Spectrum

Authors: Penney, Layne;

Abstract

Key-value cache eviction is a major approach to long-context inference, and prior work typically selects a scoring criterion under the implicit assumption that its ranking generalizes across model families. Under a prefill-time cache selection contract with full-prompt visibility, we show this assumption fails: KV-cache eviction effectiveness is family-conditional. Across seven model families spanning four laboratories (Mistral, Llama, Ministral, Gemma 2, Qwen 2.5, Gemma 4, Qwen 3-NT), five families attain or preserve long-context retrieval competence under residual-direction-based eviction, which we call vorn, while two families (Gemma 4 and Qwen 3-NT) are attention-favoring at the shared b=1024 gate. Two targeted multi-budget falsification probes rule out attention-pattern and laboratory identity as sufficient explanations. A complementary granularity rescue spectrum reports a unit-of-retention effect across family-anchored sentence-attention surfaces: retrieval competence under eviction ranges from rescue-resilient (Mistral) through threshold-bounded recovery (Llama 3.1) to rescue-resistant (Gemma 4). The practical implication is that eviction-method selection should be family-conditional on the quality axis, while cost-per-correct is dominated by retention granularity rather than scoring channel.

Companion materials:
Interactive companion: https://synapt.dev/vorn-mat/
Reproducible code + Docker pins: https://github.com/synapt-dev/vorn-mat
HuggingFace dataset: https://huggingface.co/datasets/synapt/vorn-mat-cross-family-results
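
To make the abstract's distinction between scoring channel and retention granularity concrete, the following is a minimal sketch of budgeted prefill-time cache selection. It is not the authors' implementation: the function names `select_tokens` and `select_sentences`, the mean-score sentence ranking, and the greedy fill are all assumptions made for illustration, and the per-token `scores` are treated as an opaque input that could come from any scoring channel (attention-based, residual-direction-based, or otherwise).

```python
from typing import List, Sequence, Tuple


def select_tokens(scores: Sequence[float], budget: int) -> List[int]:
    """Token-granularity retention: keep the `budget` highest-scoring positions.

    Ties are broken in favor of the earlier position. The kept positions are
    returned in their original order.
    """
    if budget <= 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:budget])


def select_sentences(
    scores: Sequence[float],
    spans: Sequence[Tuple[int, int]],
    budget: int,
) -> List[int]:
    """Sentence-granularity retention (hypothetical variant).

    `spans` holds non-empty half-open (start, end) token ranges, one per
    sentence. Sentences are ranked by mean token score and added greedily,
    whole, as long as they still fit in the token budget.
    """
    def mean_score(span: Tuple[int, int]) -> float:
        start, end = span
        return sum(scores[start:end]) / (end - start)

    ranked = sorted(range(len(spans)), key=lambda s: (-mean_score(spans[s]), s))
    kept: List[int] = []
    for s in ranked:
        start, end = spans[s]
        if len(kept) + (end - start) <= budget:
            kept.extend(range(start, end))
    return sorted(kept)


if __name__ == "__main__":
    # Toy scores for a six-token prompt split into three two-token sentences.
    toy_scores = [0.1, 0.9, 0.2, 0.3, 0.7, 0.8]
    toy_spans = [(0, 2), (2, 4), (4, 6)]
    print(select_tokens(toy_scores, budget=4))
    print(select_sentences(toy_scores, toy_spans, budget=4))
```

With the same scores and the same budget, the two functions can retain different positions, which is the sense in which the unit of retention is a separate design axis from the scoring criterion.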
