
Aham: A Metacognitive Architecture for Latent-Steered Theory-of-Mind in Large Language Models

Abstract: Large language models (LLMs) exhibit impressive reasoning behaviors through chain-of-thought (CoT) generation, yet they cannot revise, contextualize, or self-regulate their internal reasoning in real time. This limitation prevents adaptive memory usage, Theory of Mind (ToM) sensitivity, and consistent interpersonal behavior across long-term interactions.

We introduce Aham, a modular cognitive architecture that combines symbolic meta-reasoning with subsymbolic latent state intervention. Aham intercepts the model’s internal CoT trace, evaluates it using a meta-reasoning “Arbiter,” and modulates the model’s behavior through two parallel pathways: (1) an explicit rewriting engine that adjusts the reasoning text, and (2) a Latent State Steering (LSS) mechanism that injects ToM-derived vectors directly into the model’s residual stream.

Crucially, Aham implements a Dynamic Residual Injection protocol: the ToM profile is projected into the model’s hidden dimension and added to the final hidden state before the language modeling head, biasing the token distribution toward personality-consistent outputs without altering the pre-computed KV cache. The system is evaluated on the DeepSeek-R1-Distill-Qwen-32B backbone. Preliminary results demonstrate that this hybrid approach produces more coherent, grounded, and user-adaptive reasoning than text-only modulation alone.
Large Language Models, Latent Steering, Cognitive Architecture, Mechanistic Interpretability, Theory Of Mind
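
The Dynamic Residual Injection step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it uses plain Python lists in place of a real model, assumes the projection is a single linear map, and introduces a scaling factor `alpha` that the abstract does not mention. All names (`steered_distribution`, `w_proj`, `w_lm_head`, `tom_profile`) and all dimensions are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Toy stand-in for learned weights (assumption: Gaussian init)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def steered_distribution(final_hidden, tom_profile, w_proj, w_lm_head, alpha=1.0):
    """Next-token distribution after adding a projected ToM vector.

    final_hidden: final hidden state, length hidden_dim
    tom_profile:  ToM profile vector, length tom_dim
    w_proj:       hidden_dim x tom_dim projection (assumed linear)
    w_lm_head:    vocab_size x hidden_dim language modeling head
    alpha:        hypothetical steering strength; 0.0 disables steering
    """
    # Project the ToM profile into the model's hidden dimension.
    steering = matvec(w_proj, tom_profile)
    # Add it to the final hidden state only. Nothing upstream is recomputed,
    # so cached keys and values from earlier layers would stay untouched.
    steered = [h + alpha * s for h, s in zip(final_hidden, steering)]
    # The language modeling head turns the steered state into token probabilities.
    return softmax(matvec(w_lm_head, steered))


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_dim, tom_dim, vocab_size = 8, 3, 5  # toy sizes, chosen arbitrarily
    hidden = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
    profile = [rng.gauss(0.0, 1.0) for _ in range(tom_dim)]
    w_proj = random_matrix(hidden_dim, tom_dim, rng)
    w_lm_head = random_matrix(vocab_size, hidden_dim, rng)

    baseline = steered_distribution(hidden, profile, w_proj, w_lm_head, alpha=0.0)
    steered = steered_distribution(hidden, profile, w_proj, w_lm_head, alpha=1.0)
    print("baseline:", [round(p, 3) for p in baseline])
    print("steered: ", [round(p, 3) for p in steered])
```

Because the head is linear in this sketch, the injection shifts the logits by a fixed offset determined by the ToM profile, which is one way to read "biasing the token distribution" without touching earlier computation.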
