ZENODO
Research
Data sources: ZENODO

Tool-Entropy Collapse: A Cross-Architecture Signature of Agent WANDERING Failure

Authors: Vicentino, Caio;

Abstract

We identify a 34% blind spot in probe-based LLM agent failure monitoring on Qwen3.6-27B SWE-bench Pro: the WANDERING sub-class, in which the probe predicts success but the agent never emits finish_tool. We test six detector designs across three signal channels (text, residual cross-layer, action entropy) and find that tool-use entropy collapse is the breakthrough signal: WANDERING agents collapse onto a small set of repeated tool calls (W/S median ratio 0.41 in Qwen and Llama, 0.71 in GPT-5), enabling a Tier-3 autonomous-termination detector at 70% recall with a 5% false-positive cost.

Cross-architecture validation on Llama-70b (n=2,315, p<10⁻¹⁵, ratio 0.41) and the GPT-5 router (n=1,419, p=8.9×10⁻³⁵, ratio 0.71) confirms the effect. Cross-task validation on METR MALT is null (p=0.81), which scopes the claim to multi-turn code-execution agent tasks with rich action spaces.

The paper provides a three-tier deployment framework (forensics / advisory escalation / autonomous termination), all shippable. Mid-layer ablation suggests edge-layer (L11, L55) involvement in the cross-layer disagreement signal, but we hedge between the edge-specificity and layer-count interpretations.

Reproducibility: all code, per-trajectory output JSONs, and figure-generation scripts are available on GitHub under Apache-2.0. The OpenInterp Phase 6 dataset (99 trajectories × per-turn residuals at L11/L23/L31/L43/L55 in bf16 safetensors) will be released on HuggingFace upon paper acceptance.
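To make the entropy signal concrete, here is a minimal sketch of how a per-trajectory tool-use entropy and a W/S median ratio could be computed. The abstract does not give the exact definition, so this assumes plain Shannon entropy (in bits) over the tool names called within one trajectory, and reads "W/S" as the median entropy of WANDERING trajectories divided by that of successful ones. The function names and the toy data are hypothetical and are not taken from the paper's code.

```python
import math
from collections import Counter
from statistics import median


def tool_entropy(tool_calls):
    """Shannon entropy in bits of the tool-name distribution in one trajectory.

    Assumption: entropy is taken over tool names only, ignoring arguments.
    """
    if not tool_calls:
        return 0.0
    counts = Counter(tool_calls)
    total = len(tool_calls)
    # sum of p * log2(1/p), written with total/c so a single-tool run gives 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def median_entropy_ratio(wandering, successful):
    """Median entropy of WANDERING trajectories over that of successful ones."""
    w = median(tool_entropy(t) for t in wandering)
    s = median(tool_entropy(t) for t in successful)
    if s == 0:
        raise ValueError("median entropy of successful trajectories is zero")
    return w / s


if __name__ == "__main__":
    # Toy trajectories (illustrative only, not data from the paper).
    wandering = [["grep"] * 6 + ["cat"] * 2, ["ls", "ls", "cat", "cat"]]
    successful = [["ls", "cat", "grep", "edit", "test", "finish_tool"]]
    print(median_entropy_ratio(wandering, successful))
```

A ratio well below 1 means the WANDERING group concentrates on fewer distinct tools. A deployed detector would additionally need a threshold and a windowing scheme over turns, neither of which the abstract specifies.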
