Claude Fable 5 Deep Dive: First Public Mythos‑Level Model That Crushes All Benchmarks
Anthropic’s Claude Fable 5, released on June 9, is the first publicly available Mythos‑level model. It outperforms competitors across code, reasoning, and visual benchmarks, demonstrates long‑running autonomous operation, powers real‑world work such as Stripe’s large code migration, and introduces a controversial safety‑degradation system.
Fable 5 vs Mythos 5
Both were released on June 9 and share the same underlying model; the difference is how strict the safety guardrails are. Fable 5 is publicly available and retains the full network‑security, biochemical, and distillation classifiers. Mythos 5 is limited to Glasswing partners and lifts some of those limits. Pricing is $10 per million input tokens and $50 per million output tokens. On SWE‑Bench Pro, Fable 5 scores 80.3 % versus 77.8 % for Mythos 5.
Core benchmark results
Code ability
SWE‑Bench Pro: Fable 5 80.3 % vs Opus 4.8 69.2 % vs GPT‑5.5 58.6 % vs Gemini 3.1 Pro 54.2 % (lead of 21.7 percentage points over GPT‑5.5).
SWE‑Bench Verified: Fable 5 95.0 % vs Opus 4.8 88.6 % (≈1 failure per 20 tasks).
FrontierCode Diamond: Fable 5 29.3 % vs Opus 4.8 13.4 % (2.2×) vs GPT‑5.5 5.7 % (5.1×).
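The leads and multiples quoted in this list are plain arithmetic on the reported scores. A minimal sketch that recomputes them, using only the figures above (the helper names point_lead and ratio are illustrative, not from the source):

```python
# Reported scores in percent, copied from the code-ability list above.
swe_bench_pro = {
    "Fable 5": 80.3,
    "Opus 4.8": 69.2,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}
frontiercode_diamond = {"Fable 5": 29.3, "Opus 4.8": 13.4, "GPT-5.5": 5.7}


def point_lead(scores, a, b):
    """Lead of model a over model b, in percentage points."""
    return round(scores[a] - scores[b], 1)


def ratio(scores, a, b):
    """How many times higher model a scores than model b."""
    return round(scores[a] / scores[b], 1)


print(point_lead(swe_bench_pro, "Fable 5", "GPT-5.5"))     # 21.7
print(ratio(frontiercode_diamond, "Fable 5", "Opus 4.8"))  # 2.2
print(ratio(frontiercode_diamond, "Fable 5", "GPT-5.5"))   # 5.1
```

Percentage points and ratios answer different questions: the first is an absolute gap, the second a relative one, which is why the small FrontierCode Diamond scores produce the largest multiples.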
General reasoning
Humanity’s Last Exam (no tools): Fable 5 59 % vs GPT‑5.5 41.4 %.
Humanity’s Last Exam (with tools): Fable 5 64.5 % (GPT‑5.5 not disclosed).
Agent capability
Multi‑step tracking score 80.7 (strongest reported).
With persistent file‑level memory, performance on Slay the Spire is 3× that of Opus 4.8.
Three major breakthroughs
Breakthrough 1 – From “human‑directed AI” to “self‑directed AI”
“Previously we checked whether Claude did the work right – no laziness, no errors. After using Fable 5 we check whether Claude made the correct decision.” – Thariq, Claude Code team
Engineers shift from decomposing and verifying each step to stating a goal and letting the model complete it.
Researchers shift from guiding experiments to posing problems and reviewing AI‑generated insights.
Visual developers shift from translating images for the model to the model deciding whether visual information is relevant.
Breakthrough 2 – Qualitative shift in visual understanding
Completed Pokémon Red using only raw pixel frames (no maps, guides, or cheats).
Extracted precise numeric values from complex scientific charts.
Reconstructed web‑app source code from a screenshot.
Set new state‑of‑the‑art performance on visual tasks.
Breakthrough 3 – Extended autonomous run time
Processes millions of tokens in long‑running tasks without losing focus.
Improves its own output by leveraging internal notes; advantage grows with task length and complexity.
Higher token efficiency than previous generations.
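The source does not describe how the model stores its internal notes or its persistent file‑level memory. As a rough illustration of the general idea only, here is a hypothetical file‑backed scratchpad that an agent loop could write to and reread between steps; the NotesFile class, the JSON format, and the demo function are all assumptions made for this sketch:

```python
import json
import tempfile
from pathlib import Path


class NotesFile:
    """Hypothetical file-backed scratchpad; not the model's actual mechanism."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        # Return an empty list when no notes have been written yet.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def append(self, note):
        # Read, extend, and rewrite the whole file so notes survive restarts.
        notes = self.load()
        notes.append(note)
        self.path.write_text(json.dumps(notes, indent=2), encoding="utf-8")
        return notes


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "notes.json"
        NotesFile(path).append("Step 1: tests fail in module A")
        # A fresh object stands in for a later step that lost the earlier context.
        later = NotesFile(path)
        later.append("Step 2: fix applied, rerun tests")
        return later.load()


print(demo())
```

Because the notes live on disk rather than in the conversation, a later step can recover earlier findings even after the original context is gone, which is one plausible reason such notes would matter more as tasks get longer.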
Real‑world case studies
Case 1 – Stripe code migration
Task: migrate a 50 M‑line Ruby codebase.
Result: completed in one day; an equivalent human team would need ~2 months.
Additional test: migrated 750 k lines of Rust with 99.8 % success rate.
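To put the Stripe figures in proportion, the sketch below works through the implied speedup and the residual left by a 99.8 % success rate. Two assumptions are mine, not the source's: "~2 months" is treated as 60 calendar days, and the success rate is treated as if it were measured per line, which the source does not state.

```python
# Figures from the case study above.
RUBY_LINES = 50_000_000
RUST_LINES = 750_000
RUST_SUCCESS_RATE = 0.998

# Assumption: "~2 months" of human effort is taken as 60 calendar days.
HUMAN_DAYS = 60
MODEL_DAYS = 1


def speedup(human_days, model_days):
    """How many times faster the model run is than the human estimate."""
    return human_days / model_days


def residual_units(total, success_rate):
    """Units left unmigrated if the success rate applied uniformly per unit."""
    return round(total * (1 - success_rate))


print(speedup(HUMAN_DAYS, MODEL_DAYS))                # 60.0
print(RUBY_LINES // MODEL_DAYS)                       # 50000000 lines per day
# Assumption: success measured per line; the source does not give the unit.
print(residual_units(RUST_LINES, RUST_SUCCESS_RATE))  # 1500
```

Even a 0.2 % residual is a non‑trivial review load at this scale, so the headline success rate does not remove the need for human verification.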
Case 2 – De‑novo protein design (Mythos 5)
Designed a 138‑amino‑acid protein from scratch within one week.
Internal workflow accelerated ~10×.
9 of 14 targets produced strong candidate solutions.
Blind test by scientists showed 80 % preference for Mythos 5 results.
Case 3 – Financial reasoning
Achieved top score on the Hebbia Finance Benchmark, outperforming all evaluated models.
Led in document analysis, chart interpretation, and complex reasoning.
Technical specifications and pricing
Model ID: claude-fable-5
Context window: 1 M tokens
Maximum output: 128 K tokens
Input price: $10 per million tokens
Output price: $50 per million tokens
Cache read price: $1 per million tokens (10 % of the base input price)
Batch API discount: 50 % off
Access platforms: Claude API, AWS Bedrock, Vertex AI, Microsoft Foundry
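The listed prices can be turned into a per‑request estimate. The following is a minimal sketch, not a billing implementation: it assumes cached input tokens are billed at the cache‑read price instead of the input price, and that the batch discount applies to the whole bill. The function name estimate_cost and its parameters are illustrative.

```python
from decimal import Decimal

# USD per million tokens, from the specification list above.
INPUT_PRICE = Decimal("10")
OUTPUT_PRICE = Decimal("50")
CACHE_READ_PRICE = Decimal("1")
BATCH_DISCOUNT = Decimal("0.5")  # 50 % off
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0, batch=False):
    """Estimate the USD cost of one request under the assumptions stated above."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    cost = (
        fresh_input * INPUT_PRICE
        + cached_input_tokens * CACHE_READ_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / MILLION
    if batch:
        # Assumption: the batch discount scales the entire bill.
        cost *= BATCH_DISCOUNT
    return cost


print(estimate_cost(200_000, 10_000))                              # 2.5
print(estimate_cost(200_000, 10_000, cached_input_tokens=150_000)) # 1.15
print(estimate_cost(200_000, 10_000, batch=True))                  # 1.25
```

Decimal is used instead of float so that currency amounts stay exact. In the example, output is only 5 % of the tokens but accounts for $0.50 of the $2.50 bill, and caching most of the input cuts the total by more than half.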
Price comparison with competitors
Claude Fable 5 – $10 input / $50 output
Claude Opus 4.8 – $5 input / $25 output
GPT‑5.5 – ~ $10 input / ~ $40 output
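Expressed as multiples, the comparison above looks like this. The sketch uses only the listed prices; the GPT‑5.5 figures are marked approximate in the source, so its multiples are approximate too, and the helper name price_multiple is illustrative.

```python
# (input, output) prices in USD per million tokens, from the comparison above.
# GPT-5.5 values are approximate in the source.
prices = {
    "Claude Fable 5": (10, 50),
    "Claude Opus 4.8": (5, 25),
    "GPT-5.5": (10, 40),
}


def price_multiple(model, baseline):
    """Return (input multiple, output multiple) of model relative to baseline."""
    model_in, model_out = prices[model]
    base_in, base_out = prices[baseline]
    return model_in / base_in, model_out / base_out


print(price_multiple("Claude Fable 5", "Claude Opus 4.8"))  # (2.0, 2.0)
print(price_multiple("Claude Fable 5", "GPT-5.5"))          # (1.0, 1.25)
```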
Safety mechanism: classifier + degradation
How it works
When a request falls into any of three categories – network security, biochemical, or distillation (attempts to train other models) – the system automatically degrades the request to Opus 4.8 rather than refusing it outright.
~95 % of conversations do not trigger degradation.
Explicit restriction: the model cannot be used to develop new large models.
Zero data retention is not supported; a mandatory 30‑day data retention period is enforced.
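The routing rule described above can be sketched as a small function. This is a hypothetical illustration of the described behavior, not Anthropic's implementation: the real classifiers are not public, so the sketch takes category labels as input instead of classifying text, and the label strings and the route function are assumptions.

```python
# The three restricted categories named above, as hypothetical label strings.
RESTRICTED_CATEGORIES = {"network_security", "biochemical", "distillation"}


def route(categories):
    """Choose a serving model from classifier labels (hypothetical sketch).

    categories: labels an upstream classifier assigned to the request.
    """
    flagged = set(categories) & RESTRICTED_CATEGORIES
    if flagged:
        # Degrade instead of refusing: the request is still answered.
        return {"model": "Opus 4.8", "degraded": True, "reasons": sorted(flagged)}
    return {"model": "Fable 5", "degraded": False, "reasons": []}


print(route(["general_coding"]))
print(route(["distillation", "general_coding"]))
```

Returning an explicit degraded flag with reasons is a design choice of this sketch; it shows one way a fallback could be made visible to the caller rather than applied silently.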
Controversy
Members of the AI research community object to “silent degradation” that happens without the user being told. Anthropic has pledged greater transparency. Critics also read the restriction on model‑building as a competitive moat, which they contrast with earlier calls for a global pause on AI development.
Model selection principle
Fable 5 is positioned as an upgrade option when Opus 4.8 capabilities are insufficient, not as a daily general‑purpose model.
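One way to read this principle is as an escalation policy: try the cheaper model first and move up only when its result is not good enough. The sketch below assumes that reading; run_model and is_good_enough are hypothetical callables supplied by the caller, and no real API is invoked.

```python
def solve_with_escalation(task, run_model, is_good_enough,
                          models=("Opus 4.8", "Fable 5")):
    """Try models in order of cost and stop at the first acceptable result.

    run_model(model, task) and is_good_enough(result) are caller-supplied,
    hypothetical callables. Returns (last result, list of models tried).
    """
    tried = []
    result = None
    for model in models:
        result = run_model(model, task)
        tried.append(model)
        if is_good_enough(result):
            break
    return result, tried


# Stub standing in for real model calls, for illustration only.
def fake_run(model, task):
    return {"model": model, "solved": model == "Fable 5" or task == "easy"}


print(solve_with_escalation("easy", fake_run, lambda r: r["solved"]))
print(solve_with_escalation("hard", fake_run, lambda r: r["solved"]))
```

The quality of the acceptance check decides how often the more expensive model is used, so in practice it would need to be something verifiable, such as a passing test suite.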
Karpathy’s comment
“Very exciting version. From hands‑on experience this feels like a ‘major version upgrade’ comparable to the leap from Claude 4.5 last November.” – Andrej Karpathy
Conclusion
When AI can autonomously decide “what to do”, the human role shifts from overseer to judge and decision‑maker, evaluating goals, directions, and what is worth pursuing.
Lao Guo's Learning Space
AI learning, discussion, and hands‑on practice with self‑reflection