ZENODO
Preprint . 2026
License: CC BY
Data sources: ZENODO, Datacite

Object Commitment as a Diagnostic Pressure Point in Grounded Planning

Authors: Shoryavardhaan Gupta

Abstract

Contemporary grounded planning agents commonly delegate object selection to external resolvers (APIs, heuristics, and privileged interfaces), a pattern that masks representational failures. We isolate object commitment as a diagnostic variable: two agents with identical perception, world models, and training differ only in whether object arguments are resolved externally (Variant A) or internally (Variant B). This comparison exposes when mean-pooled representations, standard in model-based RL systems such as Dreamer and PlaNet, fail at categorical object grounding. We demonstrate four non-intuitive dissociations. First, planning and grounding are orthogonal (the Archive Dichotomy): the same model achieves 100% multi-step planning success while failing completely (0%) at object selection on identical tasks when entropy increases as the object count grows from 4 to 30+. Second, the Data Scale Paradox: homogeneous training data causes performance to collapse from 100% (300 trajectories) to 0% (500+ trajectories). More data actively harms performance through statistical gravity, which reinforces dominant patterns over task-conditional reasoning. Third, width-only capacity scaling degrades capability: an 8M-parameter model (23× increase) underperforms the 343K-parameter base model, while balanced width-and-depth scaling (75M parameters, 220× increase) recovers planning but not grounding, revealing a width-to-depth ratio requirement for capability emergence. Fourth, grounding bottlenecks are architectural, not parametric: even 220× scaling cannot overcome categorical failures under high entropy, confirming that mean pooling imposes a representational ceiling. These failure modes (semantic drift under statistical gravity, topology-dependent collapse, and entropy-sensitive grounding) parallel pathologies in large language models such as hallucinations and entity confusion, suggesting mechanistic connections between filesystem agents and frontier AI systems. Delegation conceals these failures; internal commitment exposes them. The gap is diagnostic.
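
To make the mean-pooling claim in the abstract concrete, the following toy sketch (not the author's model or code) shows one property of mean pooling that is certain regardless of architecture: the pooled vector is order-invariant, and swapping a single object among N shifts it by an amount that shrinks as 1/N. The one-hot object features, the helper names (`mean_pool`, `swap_distance`, `uniform_entropy_bits`), and the reading of "entropy" as the entropy of a uniform choice over N candidates are all assumptions made for illustration; the abstract does not define them.

```python
import math


def mean_pool(objects):
    """Average a list of equal-length feature vectors into a single vector."""
    n = len(objects)
    dim = len(objects[0])
    return [sum(obj[d] for obj in objects) / n for d in range(dim)]


def one_hot(index, dim):
    """Hypothetical object feature: a one-hot identity vector."""
    return [1.0 if d == index else 0.0 for d in range(dim)]


def swap_distance(n_objects):
    """Euclidean distance between pooled vectors of two scenes that differ
    in exactly one object (assumed toy setup, not the paper's environment).

    Scene A holds objects 0..n-1; scene B replaces object 0 with object n.
    """
    dim = n_objects + 1
    scene_a = [one_hot(i, dim) for i in range(n_objects)]
    scene_b = [one_hot(n_objects, dim)] + scene_a[1:]
    return math.dist(mean_pool(scene_a), mean_pool(scene_b))


def uniform_entropy_bits(n_objects):
    """Entropy in bits of a uniform choice among n_objects candidates."""
    return math.log2(n_objects)


if __name__ == "__main__":
    for n in (4, 30):
        print(n, uniform_entropy_bits(n), swap_distance(n))
```

In this toy setup the distance between the two pooled vectors is sqrt(2)/N, so the signal that distinguishes one object from another is 7.5 times weaker at 30 objects than at 4, while the selection problem itself becomes harder. This only illustrates why a pooled representation could struggle with categorical selection; it does not reproduce the reported 100% and 0% results.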

Keywords

Machine Learning, object planning, Artificial Intelligence, Supervised Machine Learning/standards, grounded cognition, Computer Science, Agentic AI, non-linguistic agents, Supervised Machine Learning, representational bottlenecks, Machine Learning/standards, agent brittleness, imagination-based planning
