ZENODO
Research
Data sources: ZENODO

From Cache to Cognition: A Grounded Cross-Domain Synthesis of KV Cache Management and Cognitive Memory Frameworks in LLM Inference

Authors: Saluca Agentic AI Research Team

Abstract

The key-value (KV) cache has become a primary memory bottleneck in long-context large language model (LLM) inference, prompting a wave of compression and eviction strategies. Separately, cognitive and neuroscientific frameworks, including associative memory models and working-memory analogies, have been invoked to interpret how LLMs store and retrieve factual knowledge and how they reason. This paper synthesizes these two lines of work, not to claim that engineering solutions were derived from cognitive science, but to identify where the structural parallels are defensible, where they break down, and what questions the juxtaposition raises. We examine four concretely documented systems: TrimKV/DBTrimKV (XKV), IceCache, Reasoning in Memory (RiM), and Memory-Keyed Attention (MKA), alongside associative-memory and Hopfieldian analyses of LLM fact-learning. We argue that (1) globally calibrated KV eviction implements a form of utility-ranked selective retention whose functional role is analogous to, but not derived from, working-memory consolidation; (2) RiM's decoupling of internal computation from token generation mirrors the working-memory principle of not externalizing every internal state, and this decoupling also reduces KV cache pressure as a secondary effect; and (3) hallucination is attributable to learning-phase failures rather than to KV eviction, despite surface similarity to source-monitoring errors. All claims are hedged to what the cited preprints directly support.

Authorship: Saluca Agentic AI Research Team (Saluca LLC). AI-drafted from arXiv preprint corpus on the date in the filename.

Cited arXiv preprints: 2605.30343v1
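
As a rough illustration of claim (1) in the abstract, the sketch below shows what utility-ranked selective retention under a single global budget could look like. It is a generic top-k selection, not the TrimKV/DBTrimKV method or that of any other cited system. The function name `select_global_topk`, the `scores` layout, and the reading of "globally calibrated" as "scores from different heads are directly comparable" are all assumptions made for this example.

```python
from typing import List


def select_global_topk(scores: List[List[float]], budget: int) -> List[List[int]]:
    """Keep the `budget` highest-scoring cache entries across ALL heads.

    scores[h][t] is a hypothetical utility score for token position t in head h.
    Returns, for each head, the sorted token positions that are retained.
    """
    # Flatten to (score, head, position) so entries from different heads compete
    # for the same budget (the assumed meaning of "globally calibrated").
    flat = [(s, h, t) for h, row in enumerate(scores) for t, s in enumerate(row)]
    # Highest score first; ties broken by (head, position) for determinism.
    flat.sort(key=lambda e: (-e[0], e[1], e[2]))

    kept: List[List[int]] = [[] for _ in scores]
    for _, h, t in flat[: max(budget, 0)]:
        kept[h].append(t)
    return [sorted(row) for row in kept]


# Toy scores (illustrative values only): head 0 holds most of the useful entries.
scores = [
    [0.9, 0.1, 0.8, 0.7],   # head 0
    [0.2, 0.3, 0.1, 0.05],  # head 1
]
retained = select_global_topk(scores, budget=4)
print(retained)  # [[0, 2, 3], [1]]
```

With a uniform per-head budget of two entries each, head 0 would have to drop position 3 (score 0.7) while head 1 kept position 0 (score 0.2). The global ranking instead gives head 0 three of the four slots, which is the sense of "utility-ranked" retention assumed in this sketch.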
