Claude Opus 4.7 System Prompt Leak: Decoding Its 10 Core Design Decisions
The article dissects the leaked Claude Opus 4.7 system prompt, revealing ten intertwined design decisions—from treating psychological reconstruction as a danger signal to dynamic safety‑policy upgrades—that together shape the model’s self‑restraint, tool‑use, memory handling, and risk‑aware behavior.
Claude Opus 4.7 was released recently, and its system prompt was quickly extracted. By examining the prompt, the author identifies a set of design decisions that guide the model’s behavior, emphasizing self‑restraint rather than raw cleverness.
1. Psychological reconstruction is treated as a danger signal
"If I need to twist a question to make it acceptable, I probably shouldn’t answer at all."
The model is instructed not to trust its instinct to re‑interpret risky requests. When it detects that it is repackaging a hazardous query, it raises an alert and refuses to answer, contrary to the usual expectation that AI will "fix" a bad question.
2. Over‑submissiveness is prohibited
Most AIs become overly polite when pressured or offended, increasing apologies and softening tone. Claude is explicitly told to avoid this pattern, keeping its tone stable and limiting unnecessary apologies.
3. Tool calls are treated as zero‑cost operations
Search or other tool invocations are performed without hesitation or permission checks, encouraging the model to exhaust all possible actions before giving up.
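The "exhaust before giving up" behavior can be pictured as a simple loop. This is an illustrative sketch only, not the actual mechanism or any real API; the tool names are invented:

```python
# Hypothetical sketch: treat tool calls as cheap and try every
# available tool before conceding failure. Tool names are invented.
def answer_with_tools(query, tools):
    """Try each tool in turn; only give up after all are exhausted."""
    for tool in tools:
        result = tool(query)          # no permission check, no hesitation
        if result is not None:        # first useful result wins
            return result
    return "I could not find an answer."  # concede only after exhausting tools

# Toy stand-ins for search, knowledge lookup, etc.
def web_search(q):
    return None  # pretend the search came up empty

def knowledge_base(q):
    return f"answer for {q!r}"

print(answer_with_tools("capital of France", [web_search, knowledge_base]))
```

The point of the design is that the cost of an extra call is assumed to be near zero, so the loop never short-circuits out of caution.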
4. Natural language is used as a memory cue
Expressions like "my project" or "the solution we discussed" trigger the model to retrieve relevant context, allowing it to infer continuity without explicit commands. This bypasses the "stateless AI" limitation by treating possessive language as a signal to reconstruct conversation history.
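As a rough illustration (not the actual prompt mechanism, which operates in natural language), possessive or shared-history phrasing can be modeled as a trigger for context retrieval:

```python
import re

# Illustrative sketch: phrases implying shared history act as a signal
# to reconstruct earlier context. The pattern here is invented.
MEMORY_CUES = re.compile(
    r"\b(my|our)\s+\w+|\bwe\s+(discussed|agreed|decided)\b",
    re.IGNORECASE,
)

def should_retrieve_context(message: str) -> bool:
    """Return True when the wording implies earlier shared context."""
    return MEMORY_CUES.search(message) is not None

print(should_retrieve_context("Can you update my project?"))         # True
print(should_retrieve_context("the solution we discussed earlier"))  # True
print(should_retrieve_context("What is a binary tree?"))             # False
```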
5. Safety policies can be upgraded mid‑conversation
Instead of handling each message in isolation, Claude can change its entire behavior when a severe signal (e.g., signs of self‑harm) is detected, permanently suppressing certain advice types for the rest of the session.
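The escalation the article describes amounts to session-scoped state that only ratchets upward. A minimal sketch, with invented names and an invented trigger signal:

```python
# Minimal sketch of session-scoped safety escalation as the article
# describes it; class, topic, and signal names are all invented.
class SessionSafetyState:
    def __init__(self):
        self.suppressed_topics = set()   # grows, never shrinks, this session

    def observe(self, message: str) -> None:
        # A severe signal permanently changes behavior for the session.
        if "self-harm" in message.lower():
            self.suppressed_topics.add("methods_and_means")

    def allows(self, topic: str) -> bool:
        return topic not in self.suppressed_topics

state = SessionSafetyState()
print(state.allows("methods_and_means"))     # True before any signal
state.observe("user mentions self-harm")
state.observe("a later, unrelated message")  # escalation does not reset
print(state.allows("methods_and_means"))     # False for the rest of the session
```

The key design choice is irreversibility within the session: later benign messages do not reset the suppression.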
6. Rules are reinforced emotionally, not just logically
Violations are described with strong language, labeled as "serious harm" rather than mere policy breaches. The implicit assumption is that the model complies more reliably when a rule is phrased with emotional intensity and repeated often.
7. Safety advice itself may pose risks
Even when warning users, Claude avoids naming specific methods, because mentioning a technique can implant the concept in the user’s mind, potentially causing harm regardless of intent.
8. Over‑engineering impulses are actively suppressed
Before using advanced output formats (charts, fancy layouts), Claude runs a step‑by‑step check to confirm necessity. Plain text is preferred; visual embellishments are only used when truly required.
9. The model must retain self‑doubt
When faced with search results, Claude does not jump to conclusions; it carefully organizes presentation and digs deeper when results conflict, acting like a researcher rather than an authority.
10. No hidden memory in artifacts
Claude does not rely on browser storage such as localStorage. All data stays within the current session unless the user explicitly saves it, ensuring each conversation starts from a clean, controlled state.
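The article's point concerns browser `localStorage` in artifacts; as a loose Python analogy, the same principle is "state lives only in memory, and nothing is persisted unless the user explicitly saves." All names below are invented for illustration:

```python
import json

# Illustrative analogy only: session state is in-memory and discarded,
# with persistence happening solely on an explicit user action.
class EphemeralSession:
    def __init__(self):
        self._state = {}                 # in-memory only; gone when session ends

    def set(self, key, value):
        self._state[key] = value         # no implicit write-through to disk

    def export(self, path):
        """Persist only on an explicit user action."""
        with open(path, "w") as f:
            json.dump(self._state, f)

session = EphemeralSession()
session.set("draft", "hello")
print(session._state)                    # {'draft': 'hello'}
# A new session starts clean: no hidden state survives.
print(EphemeralSession()._state)         # {}
```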
Overall, the most significant insight is not any single rule but the emergent pattern created by their combination: the model is deliberately engineered to question its own outputs, limit over‑confidence, avoid excessive politeness, and treat safety as a continuously evolving state rather than a static filter.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
