
We document prior substitution, a systematic failure mode in which language models that cannot apply a categorical rule default to the answer most frequent in training rather than failing randomly. Using MendEval, a 240-problem benchmark spanning genetics, organic chemistry, stoichiometry, and Bayesian inference that uses fictional entities to control for contamination, we show that wrong answers concentrate on specific high-frequency answers in categorical domains (in the most extreme case, 100% of errors fall on a single answer) but scatter uniformly in continuous domains. This pattern persists across five models, from Mistral-7B to GPT-4o. Larger models default to different but equally dominant priors: they learn better shortcuts, not better reasoning. We introduce the Rule Utilization Score (RUS), a mechanistic probe based on sequence log-probability under rule-swapped and rule-scrambled prompts, and show that problems on which the model ignores the rule (low RUS) are significantly more likely to trigger prior substitution (p = 0.001). Crucially, chain-of-thought prompting does not eliminate the effect: with step-by-step reasoning, GPT-4o still defaults to a single mechanism on 100% of its wrong organic chemistry answers. Prior substitution is distinct from random failure, general capability limitations, and dataset artifacts; it is a predictable failure mode tied to the structure of the answer space and the training distribution.
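The clustering claim ("100% of errors fall on a single answer") can be made concrete with a simple statistic: among a model's wrong answers in one domain, the share taken by the single most frequent wrong answer. The sketch below is an illustrative assumption, not necessarily the exact statistic used for MendEval. The function name `modal_error_share` and the answer labels in the usage example are hypothetical.

```python
from collections import Counter

def modal_error_share(predictions, gold):
    """Share of wrong answers equal to the most frequent wrong answer.

    Hypothetical helper (an assumed clustering measure, not the paper's code).
    predictions, gold: parallel sequences of answer strings for one domain.
    Returns (modal_wrong_answer, share), or (None, 0.0) if there are no errors.
    """
    # Keep only the model's incorrect answers.
    wrong = [p for p, g in zip(predictions, gold) if p != g]
    if not wrong:
        return None, 0.0
    # Most frequent wrong answer and how many times it occurs.
    answer, count = Counter(wrong).most_common(1)[0]
    return answer, count / len(wrong)

# Categorical domain: every error lands on the same (hypothetical) mechanism.
print(modal_error_share(["SN2", "SN2", "E1", "SN2"], ["SN1", "E2", "E1", "SN1"]))
# Continuous domain: errors are all distinct, so the modal share is 1/3.
print(modal_error_share(["3.2", "4.1", "5.0"], ["1.0", "2.0", "3.0"]))
```

Under this measure, prior substitution appears as a share near 1.0, whereas uniformly scattered errors over k distinct wrong values give a share near 1/k.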
