
Claude Fable 5 Deep Dive: First Public Mythos‑Level Model That Crushes All Benchmarks

Anthropic’s Claude Fable 5, released on June 9, is the first publicly available Mythos‑level model that outperforms competitors across code, reasoning, and visual benchmarks, demonstrates autonomous long‑run operation, powers real‑world cases like Stripe’s massive code migration, and introduces a controversial safety‑degradation system.

Lao Guo's Learning Space

Jun 12, 2026


Fable 5 vs Mythos 5

Both models were released on June 9 and share the same underlying model; they differ in how tightly the safety guardrails are set. Fable 5 is publicly available and keeps the full set of cybersecurity, biochemical, and distillation classifiers. Mythos 5 is limited to Glasswing partners and lifts some of those limits. Pricing is $10 per million input tokens and $50 per million output tokens. On SWE‑Bench Pro, Fable 5 scores 80.3 % versus 77.8 % for Mythos 5.

Core benchmark results

Code ability

SWE‑Bench Pro: Fable 5 80.3 % vs Opus 4.8 69.2 % vs GPT‑5.5 58.6 % vs Gemini 3.1 Pro 54.2 % (lead of 21.7 percentage points over GPT‑5.5).

SWE‑Bench Verified: Fable 5 95.0 % vs Opus 4.8 88.6 % (≈1 failure per 20 tasks).

FrontierCode Diamond: Fable 5 29.3 % vs Opus 4.8 13.4 % vs GPT‑5.5 5.7 % (Fable 5 scores about 2.2× Opus 4.8 and about 5.1× GPT‑5.5).
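The gaps quoted above can be re-derived from the raw scores. The following is a small sketch for checking that arithmetic; the helper names are illustrative and are not from the original article.

```python
# Scores copied from the benchmark lines above (percent of tasks solved).
SWE_BENCH_PRO = {"Fable 5": 80.3, "Opus 4.8": 69.2, "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2}
FRONTIERCODE_DIAMOND = {"Fable 5": 29.3, "Opus 4.8": 13.4, "GPT-5.5": 5.7}


def lead_in_points(scores, leader, other):
    """Absolute gap between two models, in percentage points."""
    return round(scores[leader] - scores[other], 1)


def score_ratio(scores, leader, other):
    """How many times larger the leader's score is than the other model's."""
    return round(scores[leader] / scores[other], 1)


def tasks_per_failure(pass_rate_percent):
    """Average number of attempted tasks per failed task at a given pass rate."""
    return round(100 / (100 - pass_rate_percent))


if __name__ == "__main__":
    print(lead_in_points(SWE_BENCH_PRO, "Fable 5", "GPT-5.5"))
    print(score_ratio(FRONTIERCODE_DIAMOND, "Fable 5", "Opus 4.8"))
    print(score_ratio(FRONTIERCODE_DIAMOND, "Fable 5", "GPT-5.5"))
    print(tasks_per_failure(95.0))
```

A percentage-point lead and a ratio answer different questions: the first is an absolute gap, while the second grows quickly when the baseline score is small, as it is on FrontierCode Diamond.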

General reasoning

Humanity’s Last Exam (no tools): Fable 5 59 % vs GPT‑5.5 41.4 %.

Humanity’s Last Exam (with tools): Fable 5 64.5 % (GPT‑5.5 not disclosed).

Agent capability

Multi‑step tracking: score of 80.7, the strongest reported.

With persistent file‑level memory, performance on Slay the Spire is 3× that of Opus 4.8.

Three major breakthroughs

Breakthrough 1 – From “human‑directed AI” to “AI self‑decision”

“Previously we checked whether Claude did the work right – no laziness, no errors. After using Fable 5 we check whether Claude made the correct decision.” – Thariq, Claude Code team

Engineers shift from decomposing and verifying each step to stating a goal and letting the model complete it.

Researchers shift from guiding experiments to posing problems and reviewing AI‑generated insights.

Developers working with visual input shift from describing images for the model to letting the model decide whether the visual information is relevant.

Breakthrough 2 – Qualitative shift in visual understanding

Completed Pokémon Red using only raw pixel frames (no map, guide, or cheats).

Extracted precise numeric values from complex scientific charts.

Reconstructed web‑app source code from a screenshot.

Set new state‑of‑the‑art performance on visual tasks.

Breakthrough 3 – Extended autonomous run time

Processes millions of tokens in long‑running tasks without losing focus.

Improves its own output by leveraging internal notes; advantage grows with task length and complexity.

Higher token efficiency than previous generations.
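The article does not say how the model's internal notes or file-level memory are stored. As a rough illustration of the general idea only, a harness around a long-running agent could persist notes to a file and reload them on each step. The `NoteStore` class and the JSON layout below are assumptions made for this sketch, not part of any Anthropic API.

```python
import json
from pathlib import Path


class NoteStore:
    """Tiny file-backed scratchpad that an agent loop could reload between steps.

    Hypothetical helper for illustration; the real mechanism is not described
    in the article.
    """

    def __init__(self, path):
        self.path = Path(path)
        # Reload earlier notes if the file exists, so state survives restarts.
        if self.path.exists():
            self.notes = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.notes = []

    def add(self, step, text):
        """Append a note and write the whole list back to disk immediately."""
        self.notes.append({"step": step, "text": text})
        self.path.write_text(json.dumps(self.notes, indent=2), encoding="utf-8")

    def recent(self, limit=5):
        """Return the most recent notes, e.g. to prepend to the next prompt."""
        return self.notes[-limit:]
```

Writing on every `add` trades some I/O for durability: if the run is interrupted, a fresh `NoteStore` pointed at the same path picks up where the previous one stopped.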

Real‑world case studies

Case 1 – Stripe code migration

Task: migrate a 50‑million‑line Ruby codebase.

Result: completed in one day; an equivalent human team would need about 2 months.

Additional test: migrated 750,000 lines of Rust with a 99.8 % success rate.

Case 2 – De‑novo protein design (Mythos 5)

Designed a 138‑amino‑acid protein from scratch within one week.

Internal workflow accelerated ~10×.

9 of 14 targets produced strong candidate solutions.

Blind test by scientists showed 80 % preference for Mythos 5 results.

Case 3 – Financial reasoning

Achieved top score on the Hebbia Finance Benchmark, outperforming all evaluated models.

Led in document analysis, chart interpretation, and complex reasoning.

Technical specifications and pricing

Model ID: claude-fable-5

Context window: 1 M tokens

Maximum output: 128 K tokens

Input price: $10 per million tokens

Output price: $50 per million tokens

Cache read price: $1 per million tokens (10 % of the base input price)

Batch API discount: 50 % off

Access platforms: Claude API, AWS Bedrock, Vertex AI, Microsoft Foundry
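To see how the listed prices combine, here is a minimal cost estimator. It rests on two assumptions that the article does not confirm: that cached input tokens are billed at the cache read price instead of the input price, and that the batch discount applies to the whole bill. The `Pricing` class and `estimate_cost` function are illustrative names.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """List prices from the specification above, in USD per million tokens."""
    input_per_m: float = 10.0
    output_per_m: float = 50.0
    cache_read_per_m: float = 1.0
    batch_discount: float = 0.5  # 50 % off


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0,
                  batch=False, pricing=Pricing()):
    """Estimate the cost of one request in USD.

    Assumption: cached tokens are a subset of input_tokens and are billed
    at the cache read price; the batch discount scales the total.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    cost = (
        fresh_input * pricing.input_per_m
        + cached_input_tokens * pricing.cache_read_per_m
        + output_tokens * pricing.output_per_m
    ) / TOKENS_PER_MILLION
    if batch:
        cost *= 1 - pricing.batch_discount
    return round(cost, 6)
```

For example, under these assumptions a request with 1 M input tokens and 100 K output tokens comes to $15.00 at list price, and to $7.80 if 800 K of the input tokens are served from cache.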

Price comparison with competitors

Claude Fable 5 – $10 input / $50 output

Claude Opus 4.8 – $5 input / $25 output

GPT‑5.5 – about $10 input / about $40 output
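The comparison above can be applied to a concrete workload. The sketch below assumes all prices are in USD per million tokens and treats the approximate GPT‑5.5 figures as exact; both the `PRICES` table and the function names are illustrative.

```python
# (input price, output price) in USD per million tokens, from the list above.
PRICES = {
    "Claude Fable 5": (10.0, 50.0),
    "Claude Opus 4.8": (5.0, 25.0),
    "GPT-5.5": (10.0, 40.0),  # approximate figures in the article
}


def workload_cost(model, input_tokens, output_tokens):
    """Cost in USD of a workload on one model at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def compare(input_tokens, output_tokens):
    """Cost of the same workload on every model in the table."""
    return {
        model: round(workload_cost(model, input_tokens, output_tokens), 2)
        for model in PRICES
    }


if __name__ == "__main__":
    for model, cost in compare(2_000_000, 500_000).items():
        print(f"{model}: ${cost:.2f}")
```

Because Fable 5 is priced at exactly twice Opus 4.8 for both input and output, any workload costs twice as much on Fable 5 regardless of its input-to-output mix.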

Safety mechanism: classifier + degradation

How it works

When a request falls into any of three categories (cybersecurity, biochemical, or distillation, meaning attempts to train other models), the system automatically downgrades the request to Opus 4.8 rather than refusing it outright.

~95 % of conversations do not trigger degradation.

Explicit restriction: cannot be used to develop new large models.

Zero data retention is not supported; 30‑day data retention is mandatory.
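The classifier-then-downgrade flow described above can be sketched as a simple router. This is a toy illustration of the control flow only, not Anthropic's implementation: the keyword-based `classify` function stands in for real classifiers, and the fallback model ID is a placeholder because the article does not give one.

```python
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "opus-4.8-placeholder"  # hypothetical ID, not from the article

# Toy keyword lists standing in for the three real classifiers (assumption).
KEYWORDS = {
    "cybersecurity": ("malware", "exploit"),
    "biochemical": ("pathogen", "toxin"),
    "distillation": ("distill", "train another model"),
}


def classify(prompt):
    """Return the set of restricted categories a prompt appears to touch."""
    lowered = prompt.lower()
    return {
        category
        for category, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    }


def route(prompt):
    """Pick a model: downgrade instead of refusing when any category is flagged."""
    flagged = classify(prompt)
    return {
        "model": FALLBACK_MODEL if flagged else PRIMARY_MODEL,
        "flagged": sorted(flagged),
        "degraded": bool(flagged),  # surfaced so the caller can see the downgrade
    }
```

Returning an explicit `degraded` flag is a design choice made for this sketch: it lets the caller see when a request was served by the fallback model.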

Controversy

The AI‑research community objects to “silent degradation” that happens without the user's knowledge. Anthropic pledged greater transparency. Critics read the restriction on model‑building as a competitive moat, at odds with earlier calls for a global pause on AI development.

Selection decision principle

Fable 5 is positioned as an upgrade option when Opus 4.8 capabilities are insufficient, not as a daily general‑purpose model.
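One way to act on this principle is an escalation pattern: try the cheaper model first and move up only when its answer falls short. The sketch below assumes the caller supplies both the model-calling function and the sufficiency check; `call_model`, `is_sufficient`, and the Opus 4.8 model ID are hypothetical placeholders.

```python
from typing import Callable, Tuple

DEFAULT_MODEL = "opus-4.8-placeholder"  # hypothetical ID, not from the article
UPGRADE_MODEL = "claude-fable-5"


def solve_with_escalation(
    task: str,
    call_model: Callable[[str, str], str],
    is_sufficient: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (model_used, answer), escalating only if the first answer fails.

    call_model(model_id, task) and is_sufficient(answer) are supplied by the
    caller; for example, is_sufficient could run a test suite on generated code.
    """
    answer = call_model(DEFAULT_MODEL, task)
    if is_sufficient(answer):
        return DEFAULT_MODEL, answer
    return UPGRADE_MODEL, call_model(UPGRADE_MODEL, task)
```

The pattern only pays off when the sufficiency check is cheap and reliable; otherwise a failed first attempt adds cost and latency on top of the upgrade call.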

Karpathy’s comment

“Very exciting version. From hands‑on experience this feels like a ‘major version upgrade’ comparable to the leap from Claude 4.5 last November.” – Andrej Karpathy

Conclusion

When AI can autonomously decide “what to do”, the human role shifts from overseer to judge and decision‑maker, evaluating goals, directions, and what is worth pursuing.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
