Powered by OpenAIRE graph
Found an issue? Give us feedback
Preprint
Data sources: ZENODO

Modality Matters: A Transient Behavioral Interruption Rescues Agent WANDERING Where Residual Steering Does Not

Authors: Vicentino, Caio


Abstract

On the same 20 WANDERING Qwen3.6-27B SWE-bench Pro trajectories on which residual steering has failed three times, a transient behavioral interruption (a single fresh user turn injected at a live tool-entropy collapse point) more than doubles the rate at which agents finalize (30% -> 70%, paired McNemar p = 0.021), whereas a residual-stream injection at L11 has no detectable effect (p = 0.63). The lever is the interruption itself, not its content: a content-neutral message rescues agents as often as a re-plan message (p = 1.0). SWE-bench Pro Docker evaluation indicates that the rescued finalizations are real fixes and suggests that the interruption also raises the solve rate (~23% -> 50%, cross-session comparison, p = 0.062). For long-horizon agents, the predictive signal lives in the residual stream, but the causal lever lives in behavior. This paper completes a four-paper arc (detect -> localize -> residual steering fails -> behavioral interruption works) and is a companion to Tool-Entropy Collapse (DOI 10.5281/zenodo.20368601).
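A minimal sketch of how such a one-shot interruption could be triggered, under stated assumptions. We assume (this is not the paper's stated definition) that "tool entropy" is the Shannon entropy of tool names over a sliding window of recent calls, and that a "collapse point" is the first step at which that entropy falls below a fixed threshold. The class `CollapseInterrupter`, the window size, the threshold, and the content-neutral message text are all hypothetical; the companion Tool-Entropy Collapse paper gives the actual detector.

```python
import math
from collections import Counter, deque


def shannon_entropy(tools):
    """Entropy (bits) of the empirical distribution of tool names."""
    counts = Counter(tools)
    total = len(tools)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


class CollapseInterrupter:
    """Hypothetical monitor: fires ONE user turn when windowed tool entropy collapses.

    Window size, threshold, and message are illustrative assumptions,
    not values taken from the paper.
    """

    def __init__(self, window=8, threshold=0.5, message="Please continue."):
        self.window = deque(maxlen=window)  # most recent tool names
        self.threshold = threshold          # entropy (bits) below which we call it a collapse
        self.message = message              # content-neutral text (assumed wording)
        self.fired = False

    def observe(self, tool_name):
        """Record one tool call; return a user-turn string if we should interrupt now."""
        self.window.append(tool_name)
        # Wait for a full window, and never fire twice (the interruption is transient).
        if self.fired or len(self.window) < self.window.maxlen:
            return None
        if shannon_entropy(self.window) < self.threshold:
            self.fired = True
            return self.message
        return None


if __name__ == "__main__":
    monitor = CollapseInterrupter(window=4, threshold=0.5)
    calls = ["search", "open", "edit", "bash", "bash", "bash", "bash", "bash"]
    for step, tool in enumerate(calls):
        msg = monitor.observe(tool)
        if msg:
            print(step, msg)  # prints "6 Please continue."
```

Firing at most once keeps the intervention transient, in the sense of the single fresh user turn described above; a hosting agent loop (not shown) would append the returned string to the conversation as a user message.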

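The abstract reports paired McNemar p-values on the 20 trajectories. A minimal sketch of the exact (binomial) form of that test follows, assuming the paper uses the exact variant rather than the chi-squared approximation (the abstract does not say). Only discordant pairs matter: `b` counts trajectories that fail under the baseline but finalize under the intervention, `c` the reverse. The 9-versus-1 split in the usage example is hypothetical, chosen only to be consistent with 6/20 -> 14/20 finalizations; the abstract does not report the discordant counts.

```python
from math import comb


def discordant_counts(control, treatment):
    """Count discordant pairs from paired boolean outcomes (True = finalized)."""
    b = sum(1 for x, y in zip(control, treatment) if not x and y)  # rescued
    c = sum(1 for x, y in zip(control, treatment) if x and not y)  # regressed
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    Under H0 (no effect), b ~ Binomial(b + c, 0.5); the p-value doubles the
    smaller tail and is capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired outcomes for 20 trajectories: 6 finalize at baseline,
# 14 with the interruption (9 rescued, 1 regressed). Illustrative only.
control = [True] * 6 + [False] * 14
treatment = [True] * 5 + [False] + [True] * 9 + [False] * 5

b, c = discordant_counts(control, treatment)
print(b, c, round(mcnemar_exact(b, c), 3))  # prints "9 1 0.021"
```

When `b == c`, the doubled tail exceeds 1 and is capped, so equal discordant counts give p = 1.0, the same value the abstract reports for the content-neutral versus re-plan comparison.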