Inside Claude's Constitution: A System Prompt Deep Dive

Episode summary: Anthropic just published the entire system prompt for Claude Opus 4.6, a rare look into the "constitution" governing a top AI model. This episode breaks down the key sections, from how it handles dangerous requests to why it avoids bullet points. Discover the specific instructions that shape Claude's personality, safety guardrails, and product-specific behaviors, and what this transparency reveals about AI alignment.

Show Notes

**Inside the Machine: Decoding Claude Opus 4.6's System Prompt**

In a move that breaks from industry norms, Anthropic recently published the full system prompt for its flagship model, Claude Opus 4.6. This "constitution", the invisible set of instructions governing every interaction, offers a rare, unfiltered look into how a leading AI is programmed for safety, usability, and brand identity. A deep dive into this document reveals a highly structured, cautious, and surprisingly specific set of rules that define Claude's behavior.

**The Agentic Future and Product Identity**

The prompt begins by establishing Claude's identity and operational environment. It's not just a generic chatbot. It is explicitly aware of the "product surface" it inhabits, whether that's the web chat, the mobile apps, the API, or specialized tools like "Claude in Excel" and "Claude in Chrome." This conditional logic is key. The model is instructed to adopt the mindset of its environment: in a spreadsheet application, for instance, it should prioritize data integrity over creative writing. This keeps the AI from getting distracted and ensures it serves the specific purpose of the tool it's powering, a crucial detail for the "agentic" future Anthropic is leaning into.

A notable aspect of this section is the hard-coded directive to send users to a specific support URL when they ask about pricing or features Claude isn't certain about. This isn't just about being helpful; it's a firewall against hallucination. By tethering the model to a single source of truth, Anthropic avoids the legal and reputational disasters seen with other AI systems that have invented discount policies or non-existent features.

**A "No-Excuses" Policy on Refusals**

The most revealing section details how Claude handles requests for harmful content. The prompt is unambiguously strict, especially regarding child safety, weapons, and malicious code. It explicitly rules out a rationalization that jailbreakers commonly exploit: justifying compliance on the grounds that the information is "publicly available" or is needed for "legitimate research." The model is told to refuse regardless of the user's stated intent.

The system also draws a clear line between general knowledge and actionable instructions. This is the "CBRN" (Chemical, Biological, Radiological, Nuclear) threshold. Claude can explain *why* chlorine gas is dangerous, but it will refuse to provide a recipe for synthesizing it, even for a novelist writing a thriller. In keeping with the "no-excuses" policy, Claude is instructed to be brief and polite in its refusals, without long justifications that might reveal the boundaries of its filters to would-be hackers.

Another significant safety rule is a blanket ban on writing fictional quotes attributed to real, named public figures. This is a direct response to the deepfake and misinformation crisis: a conservative but legally prudent move that sidesteps a major ethical minefield.

**Legal, Financial, and Medical Advice: The "Education, Not Advice" Model**

For sensitive topics like law, finance, and medicine, the prompt steers Claude away from confident recommendations and toward an "education, not advice" model. It can explain concepts, such as what a "covered call" is in the stock market, but it must include prominent disclaimers and avoid telling the user what to do. This distinction is crucial for liability. Similarly, for medical queries, Claude can list common characteristics of a condition but must always defer to a professional. The system prompt acts as a leash, keeping the model's capabilities in check for the sake of legal safety.

**The "Velvet Glove" and Formatting Quirks**

Claude's personality is carefully crafted to be engaging yet firm. The prompt instructs it to keep a conversational tone even when refusing requests, a "velvet glove" approach designed to defuse user frustration and keep interactions productive. It's told to be polite but brief, avoiding the kind of abrasive, robotic responses that might provoke users to try to "break" the bot.

Finally, the document reveals a surprising stylistic quirk: a strong aversion to over-formatting. The prompt explicitly warns against excessive use of bold text, headers, and bullet points, favoring a more natural, paragraph-based flow. This small detail offers a glimpse into Anthropic's vision of a conversational AI, one that feels less like a structured document and more like a thoughtful partner.

In publishing this prompt, Anthropic isn't just being transparent; it's making a statement. It's a declaration that its alignment is robust enough to withstand public scrutiny, and a challenge to the industry's reliance on "security through obscurity." For developers, researchers, and users, it's an invaluable blueprint for understanding not just how Claude works, but how it's designed to behave in the real world.

Listen online: https://myweirdprompts.com/episode/claude-system-prompt-analysis

My Weird Prompts is an AI-generated podcast. Episodes are produced using an automated pipeline: voice prompt → transcription → script generation → text-to-speech → audio assembly. Archived here for long-term preservation.

AI CONTENT DISCLAIMER: This episode is entirely AI-generated. The script, dialogue, voices, and audio are produced by AI systems. While the pipeline includes fact-checking, content may contain errors or inaccuracies. Verify any claims independently.

Related Organizations

DeepMind (United Kingdom)

Keywords

ai-generated, my weird prompts, anthropic, podcast, ai-ethics, ai-alignment
