How a Single Command Revived Claude Fable 5 and Exposed a Major AI Security Flaw

Developer Jamieson O'Reilly loaded a leaked system prompt into Opus 4.8 with a single command, effectively resurrecting the banned Claude Fable 5 model. The experiment revealed stark output differences and set off a cascade of revelations about Amazon’s role in Anthropic’s forced shutdown of the model and about broader AI safety risks.

Machine Learning Algorithms & Natural Language Processing

System prompt leak and injection

Developer Jamieson O'Reilly downloaded a 120 KB, 1,585-line system-prompt file (CLAUDE-FABLE-5.md) from the GitHub repository https://github.com/elder-plinius/CL4R1T4S/blob/main/ANTHROPIC/CLAUDE-FABLE-5.md. The file contains the full “personality script” of the discontinued Claude Fable 5 model.

Using Claude Code, he executed:

claude --dangerously-skip-permissions --system-prompt-file CLAUDE-FABLE-5.md

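The --dangerously-skip-permissions flag skips Claude Code's permission prompts, so the session can act without asking the user to confirm each step. The --system-prompt-file option is what actually loads the leaked file as the session's system prompt.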
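As a rough illustration of how the two options in the command above are independent of each other, the sketch below assembles the same invocation in Python and makes the permission-skipping flag an explicit opt-in. The helper name build_claude_command is hypothetical, the flag names are taken from the command quoted above, and the code only builds the argument list; it never launches the CLI.

```python
import shlex


def build_claude_command(prompt_file, skip_permissions=False):
    """Return the argv list for the invocation quoted in the article.

    Hypothetical helper: it only assembles arguments and never runs the CLI.
    """
    argv = ["claude"]
    if skip_permissions:
        # Opt-in only: this flag removes the confirmation step.
        argv.append("--dangerously-skip-permissions")
    # This option, not the flag above, supplies the system prompt.
    argv += ["--system-prompt-file", prompt_file]
    return argv


print(shlex.join(build_claude_command("CLAUDE-FABLE-5.md", skip_permissions=True)))
# claude --dangerously-skip-permissions --system-prompt-file CLAUDE-FABLE-5.md
```

Leaving skip_permissions at its default yields the same prompt substitution with the confirmation step intact, which is the safer way to repeat the comparison.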

Side‑by‑side experiment

Two Opus 4.8 instances were launched in parallel:

Left pane: Opus 4.8 with the Fable 5 system prompt injected.

Right pane: vanilla Opus 4.8 with no injected prompt.

Both were given the identical instruction “create a modern Apple‑style landing page.” The injected instance produced a page with distinct branding, tone, layout and module structure, which O'Reilly described as “a completely different species.” The vanilla instance generated a generic template. Screenshots in the original article show the visual gap.
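The article judges the gap between the two pages by eye. One way to put a rough number on it, sketched here under assumptions that are not part of the original experiment, is to compare the sequence of HTML tags each instance emitted. The function names and the two sample pages are hypothetical stand-ins, not the actual outputs.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def structural_similarity(page_a, page_b):
    """Return a 0..1 similarity score over the two pages' tag sequences."""
    return SequenceMatcher(None, tag_sequence(page_a), tag_sequence(page_b)).ratio()


# Hypothetical stand-ins for the two panes' generated pages.
injected_page = "<main><header><h1>Title</h1></header><section><h2>A</h2><p>x</p></section></main>"
vanilla_page = "<div><h1>Title</h1><p>x</p></div>"

print(round(structural_similarity(injected_page, vanilla_page), 2))
```

A score near 1.0 means the two pages share almost the same markup skeleton; a low score is consistent with, but does not prove, the "different species" impression, since it ignores styling and copy.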

Technical content of the leaked prompt

The prompt file runs to roughly 120,000 characters across 1,585 lines, is organized into 72 named sections, and includes JSON definitions for 18 tools. It represents the core “personality” of Fable 5.
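Figures like these can be checked against a local copy of the file. The sketch below is one minimal way to do so, assuming that sections are marked with Markdown-style "#" headings (the article does not say how the 72 sections are delimited). The function name summarize_prompt and the sample text are hypothetical, and the tool definitions are not counted because their exact layout is not described.

```python
def summarize_prompt(text):
    """Return basic size statistics for a system-prompt file's contents."""
    lines = text.splitlines()
    return {
        "characters": len(text),
        "bytes": len(text.encode("utf-8")),
        "lines": len(lines),
        # Assumption: a section starts with a Markdown heading line.
        # Lines starting with "#" inside code samples would be miscounted.
        "headings": sum(1 for line in lines if line.lstrip().startswith("#")),
    }


# Hypothetical miniature prompt, used only to exercise the function.
sample = "# Identity\nYou are a helpful assistant.\n\n## Tools\nUse tools carefully.\n"
stats = summarize_prompt(sample)
print(stats["lines"], stats["headings"])
```

For a real file, read it with open(path, encoding="utf-8").read() and pass the result in; a byte count above the character count indicates non-ASCII content.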

Amazon‑driven security test and shutdown

Amazon, a major Anthropic investor, ran internal tests in which a specially crafted prompt bypassed Fable 5’s safety guardrails and extracted information that could be weaponized for network attacks. The result was escalated to U.S. officials, and Anthropic was forced to shut the model down within a 90-minute window.

Anthropic’s public statement called the action a “misunderstanding,” but the incident highlighted a conflict between commercial pressure and the company’s “security‑first” stance.

Broader implications

The experiment demonstrates that a single system-prompt injection can make a current model behave like a disabled one, effectively bypassing the external controls that took the original offline. As model capabilities approach superintelligence, the margin for human intervention narrows, raising concerns about the durability of safety guardrails.

References

WSJ article: https://www.wsj.com/tech/ai/amazon-ceos-talks-with-u-s-officials-triggered-crackdown-on-anthropic-models-dcc90578?st=Yct6gx&reflink=desktopwebshare_permalink

Twitter thread: https://x.com/theonejvo/status/2065816283476824126?s=20

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Machine Learning Algorithms & Natural Language Processing