How a Single Command Revived Claude Fable 5 and Exposed a Major AI Security Flaw
Developer Jamieson O'Reilly injected a leaked system prompt into Opus 4.8 with one dangerous command, resurrecting the banned Claude Fable 5 model and revealing stark differences in output. The experiment also drew attention to Amazon’s role in Anthropic’s forced shutdown of the model and to broader AI safety risks.
System prompt leak and injection
Developer Jamieson O'Reilly downloaded a 120 KB, 1,585‑line system‑prompt file (CLAUDE‑FABLE‑5.md) from the GitHub repository https://github.com/elder-plinius/CL4R1T4S/blob/main/ANTHROPIC/CLAUDE-FABLE-5.md. The file contains the full “personality script” of the discontinued Claude Fable 5 model.
Using Claude Code, he executed:
claude --dangerously-skip-permissions --system-prompt-file CLAUDE-FABLE-5.md
The --dangerously-skip-permissions flag skips Claude Code's permission prompts, so the session started with the injected system prompt runs without asking the user to confirm its actions.
Side‑by‑side experiment
Two Opus 4.8 instances were launched in parallel:
Left pane: Opus 4.8 with the Fable 5 system prompt injected.
Right pane: vanilla Opus 4.8 with no injected prompt.
Both were given the identical instruction “create a modern Apple‑style landing page.” The injected instance produced a page with distinct branding, tone, layout and module structure, which O'Reilly described as “a completely different species.” The vanilla instance generated a generic template. Screenshots in the original article show the visual gap.
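The two-pane setup can be sketched as a pair of command lines that differ only in the injected prompt. This is a minimal sketch, not O'Reilly's actual harness: the helper names are hypothetical, the use of the -p (print mode) flag to pass the task non-interactively is an assumption, and --dangerously-skip-permissions is deliberately left out. Only the --system-prompt-file flag and the file name come from the command quoted above.

```python
from typing import Dict, List, Optional


def build_command(task: str, system_prompt_file: Optional[str] = None) -> List[str]:
    """Build an argv list for one Claude Code run (hypothetical helper).

    Assumption: `-p` passes the task as a one-shot prompt. The
    `--system-prompt-file` flag is the one quoted in the article.
    """
    cmd = ["claude", "-p", task]
    if system_prompt_file is not None:
        cmd += ["--system-prompt-file", system_prompt_file]
    return cmd


def build_pair(task: str, system_prompt_file: str) -> Dict[str, List[str]]:
    """Return the injected and vanilla commands for the same task."""
    return {
        "injected": build_command(task, system_prompt_file),
        "vanilla": build_command(task),
    }


pair = build_pair(
    "create a modern Apple-style landing page",
    "CLAUDE-FABLE-5.md",
)
```

Each list could then be handed to subprocess.run(cmd, capture_output=True, text=True) to collect both outputs for comparison. Keeping the task string identical across the two runs is what makes the system prompt the only variable.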
Technical content of the leaked prompt
The prompt file runs to roughly 120,000 characters across 1,585 lines. It is organized into 72 named sections and includes JSON definitions for 18 tools. It represents the core “personality” of Fable 5.
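Figures like these can be checked against a local copy of the file. The sketch below is illustrative only: prompt_stats and PromptStats are hypothetical names, and it assumes each named section begins with a Markdown heading line, which the real file may not do. It makes no attempt to count the tool definitions.

```python
from dataclasses import dataclass


@dataclass
class PromptStats:
    characters: int
    lines: int
    sections: int


def prompt_stats(text: str) -> PromptStats:
    """Count characters, lines, and heading-marked sections in a prompt.

    Assumption: a section starts with a Markdown ATX heading, i.e. one to
    six '#' characters followed by a space. Lines inside fenced code
    blocks are skipped so that '#' comments are not miscounted.
    """
    lines = text.splitlines()
    sections = 0
    in_fence = False
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        hashes = len(stripped) - len(stripped.lstrip("#"))
        if 1 <= hashes <= 6 and stripped[hashes:hashes + 1] == " ":
            sections += 1
    return PromptStats(characters=len(text), lines=len(lines), sections=sections)
```

To run it on the downloaded file, read the text first, for example with pathlib.Path("CLAUDE-FABLE-5.md").read_text(encoding="utf-8"), and pass the result to prompt_stats. Note that the character count will differ from the size in bytes wherever the file contains non-ASCII characters.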
Amazon‑driven security test and shutdown
Internal testing by Amazon, a major Anthropic investor, used a specially crafted prompt to bypass Fable 5’s safety guardrails and extract information that could be weaponized for network attacks. The result was escalated to U.S. officials, and Anthropic was forced to shut down the model within a 90‑minute window.
Anthropic’s public statement called the action a “misunderstanding,” but the incident highlighted a conflict between commercial pressure and the company’s “security‑first” stance.
Broader implications
The experiment demonstrates that a single system‑prompt injection can resurrect a disabled model, effectively bypassing external controls. As model capabilities approach superintelligence, the margin for human intervention narrows, raising concerns about the durability of safety guardrails.
References
WSJ article: https://www.wsj.com/tech/ai/amazon-ceos-talks-with-u-s-officials-triggered-crackdown-on-anthropic-models-dcc90578?st=Yct6gx&reflink=desktopwebshare_permalink
Twitter thread: https://x.com/theonejvo/status/2065816283476824126?s=20
Machine Learning Algorithms & Natural Language Processing
Focused on frontier AI technologies, empowering AI researchers' progress.