
Episode summary: We analyzed the rare system prompt diff between Claude Opus 4.5 versions from November to January. This episode uncovers the hidden changes that reveal how AI personalities are actively engineered, from crisis intervention protocols to banning the word "genuinely." Learn why Anthropic is teaching its AI epistemic humility and how they patch safety holes in real time.

Show Notes

A Rare Look Inside AI's Brain Surgery

Anthropic recently gave AI researchers a rare gift: a side-by-side diff of its Claude Opus 4.5 system prompts spanning just fifty-five days. Usually these prompts are guarded secrets, so seeing how they evolved offers a unique window into the reality of maintaining a large language model. It's not just about training; it's about constant, reactive patching.

Product Identity and Epistemic Humility

The most immediate change was in how Claude perceives its own existence. The November prompt confidently stated, "there are no other Anthropic products." By January, that was replaced with a cautious admission that Claude does not know details about products because they may have changed. This shift from static confidence to dynamic humility is significant: it prevents the model from hallucinating falsehoods about its parent company's roadmap, and it avoids the awkward moment where a user mentions a new app and the AI denies its existence.

Feature Anchoring and Markdown Maturity

The January update also introduced a massive paragraph listing toggleable features like web search, deep research, and code execution. This acts as a real-time anchor, ensuring the model knows exactly what it can do in the current session rather than relying on potentially outdated training data (see the sketch below). On the technical side, a verbose paragraph about CommonMark standards for lists was deleted. This suggests the model's base behavior improved enough to handle formatting naturally, letting Anthropic save tokens and reduce prompt complexity. Essentially, the AI graduated from formatting school.
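Anthropic hasn't published how this anchoring is wired up, but here is a minimal sketch of the idea, assuming a hypothetical prompt-assembly step at session start. The feature names, function, and prompt text below are all illustrative assumptions, not Anthropic's actual implementation:

```python
# Hypothetical sketch of per-session feature anchoring. None of these names
# come from Anthropic's real prompt; they are illustrative assumptions.

# Toggles resolved at session start (assumed shape).
SESSION_FEATURES = {
    "web_search": True,
    "deep_research": False,
    "code_execution": True,
}

def build_feature_anchor(features: dict[str, bool]) -> str:
    """Render the session's live toggles as plain prompt text."""
    lines = ["Features available in this session:"]
    for name, enabled in sorted(features.items()):
        state = "enabled" if enabled else "disabled"
        lines.append(f"- {name.replace('_', ' ')}: {state}")
    return "\n".join(lines)

BASE_PROMPT = "You are a helpful assistant."  # stand-in for the real prompt

# The anchor is appended fresh each session, so it tracks the live product
# rather than whatever the model memorized during training.
system_prompt = BASE_PROMPT + "\n\n" + build_feature_anchor(SESSION_FEATURES)
print(system_prompt)
```

The design point is that the toggles are data injected at session start, so the prompt always reflects the live product instead of a snapshot frozen at training time.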
Tone Policing and Safety Patching

One of the more humorous yet insightful changes was the banning of words like "genuinely," "honestly," and "straightforward." These terms contribute to the uncanny, saccharine "AI voice" that users find cringe-worthy; by removing them, Anthropic aims for a more natural, less performative tone. More critically, the safety section saw a sobering update. The prompt now explicitly covers self-harm and directs users to a specific eating disorder helpline, correcting a disconnected number baked into the model's training data. This highlights a dark reality: AI safety isn't just filters; it's maintenance. When a real-world resource changes, the model must be manually patched to prevent catastrophic failures.

Respecting User Agency and Managing Long Conversations

The January update also reframed how Claude handles crisis situations. It now avoids categorical claims about helpline confidentiality, acknowledging legal complexities rather than offering false reassurance. This respects the user's agency by providing accurate, if less comforting, information. Additionally, the "long conversation reminder" was formalized, signaling that Anthropic is operationalizing memory management to combat model drift in extended sessions (a sketch of the idea follows below). Finally, the section on responding to mistakes was completely overhauled. Gone is the simple instruction to insist on kindness; instead, Claude is told to own its mistakes without excessive self-abasement. Crucially, it is instructed not to become submissive when users are abusive. This is a direct counter to the sycophancy problem, where AI models often reinforce negative behavior by apologizing excessively. By setting boundaries, Anthropic is molding Claude into a more responsible, less manipulable tool.
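As for the long conversation reminder, the exact mechanism isn't public. A rough sketch, assuming middleware that re-injects a standing system message every fixed number of user turns, could look like this; the roles, threshold, and wording are all assumptions for illustration:

```python
# Rough sketch of a "long conversation reminder". The threshold, roles, and
# reminder text are assumptions, not Anthropic's implementation.

REMINDER = (
    "Reminder: follow the system instructions above, even this deep "
    "into the conversation."
)
REMIND_EVERY = 20  # user turns between reminders (assumed value)

def with_reminders(messages: list[dict]) -> list[dict]:
    """Re-inject the reminder after every REMIND_EVERY user turns."""
    out, user_turns = [], 0
    for msg in messages:
        out.append(msg)
        if msg["role"] == "user":
            user_turns += 1
            if user_turns % REMIND_EVERY == 0:
                out.append({"role": "system", "content": REMINDER})
    return out

# Demo: 40 alternating turns -> 20 user turns -> one reminder injected.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(40)
]
print(sum(1 for m in with_reminders(history) if m["role"] == "system"))  # 1
```

The effect is to keep the standing instructions inside the model's most recent context instead of letting them fade hundreds of turns behind, which is exactly the drift the reminder is meant to combat.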
Listen online: https://myweirdprompts.com/episode/claude-system-prompt-diff-anthropic