
The key-value (KV) cache has become a primary memory bottleneck in long-context large language model (LLM) inference, prompting a wave of compression and eviction strategies. Separately, cognitive and neuroscientific frameworks, including associative-memory models and working-memory analogies, have been invoked to interpret how LLMs store and retrieve factual knowledge and how they reason. This paper synthesizes these two lines of work. We do not claim that the engineering solutions were derived from cognitive science. Instead, we aim to identify where the structural parallels are defensible, where they break down, and what questions the juxtaposition raises. Alongside associative-memory and Hopfieldian analyses of LLM fact-learning, we examine four concretely documented systems: TrimKV/DBTrimKV (XKV), IceCache, Reasoning in Memory (RiM), and Memory-Keyed Attention (MKA). We argue three points. First, globally calibrated KV eviction implements a form of utility-ranked selective retention whose functional role is analogous to, but not derived from, working-memory consolidation. Second, RiM's decoupling of internal computation from token generation mirrors the working-memory principle of not externalizing every internal state, and as a secondary effect this decoupling also reduces KV cache pressure. Third, hallucination is attributable to learning-phase failures rather than to KV eviction, despite its surface similarity to source-monitoring errors. All claims are hedged to what the cited preprints directly support.

Authorship: Saluca Agentic AI Research Team (Saluca LLC). AI-drafted from arXiv preprint corpus on the date in the filename.

Cited arXiv preprints: 2605.30343v1
