Anthropic’s Mythos Model Unveiled: Why Only the Braked‑Down Fable 5 Is Public

Anthropic released Claude Fable 5 to the public while restricting the more capable Claude Mythos 5 to a small, vetted group. Benchmark results show Fable 5 outperforming competing models in programming, vision, and complex tasks, though its scores are deliberately held down in sensitive domains by safety guardrails.

AI Insight Log

Anthropic announced that the model family previously known as Mythos is now split into two variants: Claude Fable 5, which is released publicly with safety brakes, and Claude Mythos 5, which remains locked and is available only to a small, vetted group.

Figure: Claude's official announcement of Fable 5.

The official explanation links the names: “Fable” (from Latin *fabula*) means “the thing that is told,” sharing a root with Greek *mythos*. Both variants share the same underlying weight set; the only difference is the presence or absence of safety guardrails.

Benchmark performance

Anthropic’s comparison table places Mythos 5/Fable 5 alongside its own Opus 4.8 and rivals GPT 5.5 and Gemini 3.1 Pro. Notable numbers include:

SWE‑Bench Pro (agentic programming): 80.3% (Mythos 5/Fable 5) vs 69.2% (Opus 4.8) vs 58.6% (GPT 5.5) vs 54.2% (Gemini 3.1 Pro).

FrontierCode (diamond difficulty): 29.3% (Mythos 5/Fable 5) vs 13.4% (Opus 4.8) vs 5.7% (GPT 5.5).

Terminal‑Bench 2.1 (terminal‑environment programming): 88.0% (Mythos 5/Fable 5) vs 82.7% (Opus 4.8) vs 83.4% (GPT 5.5 + Codex CLI).

GDPval‑AA (knowledge work): 1932 points (Mythos 5/Fable 5) vs 1890 (Opus 4.8) vs 1769 (GPT 5.5) vs 1314 (Gemini 3.1 Pro).

Humanity’s Last Exam (multidisciplinary reasoning): 59.0% without tools, 64.5% with tools (Mythos 5/Fable 5).

Figure: Benchmark comparison of Mythos 5 / Fable 5 against other vendors' models.

The author notes that the 80.3% on SWE‑Bench Pro is not a marginal gain but a clear lead over same‑generation competitors.
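To make the size of that lead concrete, the gaps can be computed directly from the SWE‑Bench Pro figures quoted above. This is only a small arithmetic sketch; the function name `leads_over` is illustrative and not part of any Anthropic tooling.

```python
# SWE-Bench Pro scores as quoted in the article (percent).
swe_bench_pro = {
    "Mythos 5/Fable 5": 80.3,
    "Opus 4.8": 69.2,
    "GPT 5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}


def leads_over(scores, leader):
    """Return the leader's margin over every other model, in percentage points."""
    top = scores[leader]
    return {
        name: round(top - score, 1)
        for name, score in scores.items()
        if name != leader
    }


margins = leads_over(swe_bench_pro, "Mythos 5/Fable 5")
for name, gap in margins.items():
    print(f"{name}: +{gap} points")
```

On these numbers the lead is about 11 percentage points over Opus 4.8 and more than 20 over GPT 5.5 and Gemini 3.1 Pro.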

Cost‑vs‑accuracy curve on FrontierCode

Fable 5’s accuracy keeps rising as more compute is spent, forming an upward‑sloping curve, whereas Opus 4.8 plateaus and GPT 5.5 even declines, indicating Fable 5’s deeper capability on long‑chain, complex tasks.

Figure: Agentic programming: SWE‑Bench Pro and FrontierCode.
Figure: FrontierCode accuracy‑versus‑cost curve.

Real‑world use cases

Anthropic cites two examples: Stripe used the model to migrate a codebase, reducing a two‑month manual effort to one day; in drug design, protein experts reported roughly ten‑fold acceleration of certain workflows. In a blind test, scientists preferred solutions generated by Mythos about 80% of the time.

Vision capabilities

Fable 5 can extract precise numbers from complex scientific charts, rewrite source code from a few webpage screenshots, and even play a full game of Pokémon FireRed by visually interpreting the screen without any internal state hints.

The article includes a time‑lapse recording of Fable 5 completing Pokémon FireRed solely from raw screenshots.

Creative demo

Fable 5 generated a fluid‑simulation program whose motion synced to a self‑composed EDM track, despite never having “heard” any audio, blending code, visuals, and rhythm in a way that stretches the traditional notion of programming.

Safety guardrails (the “brakes”)

Fable 5 incorporates three categories of safeguards covering cybersecurity, biochemistry, and model distillation. When a request falls into a high‑risk area—e.g., planning a network attack or dual‑use bio research—the model hands the conversation off to the lower‑capability Opus 4.8 and notifies the user.
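The hand‑off described above can be pictured as a simple routing step. The sketch below is purely illustrative and is not Anthropic's implementation: the model identifiers, the `Route` type, and the keyword lists are all assumptions, and a keyword match stands in for whatever classifier is actually used.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers, chosen only for this illustration.
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4.8"

# Placeholder keyword lists for the three safeguard categories named in the
# article. A real system would use a trained classifier, not string matching.
HIGH_RISK_KEYWORDS = {
    "cybersecurity": ("network attack", "exploit chain"),
    "biochemistry": ("pathogen enhancement",),
    "model distillation": ("distill this model",),
}


@dataclass
class Route:
    model: str
    notice: Optional[str]  # message shown to the user when a hand-off occurs


def classify_risk(prompt: str) -> Optional[str]:
    """Return the first matching high-risk category, or None if none matches."""
    text = prompt.lower()
    for category, keywords in HIGH_RISK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def route(prompt: str) -> Route:
    """Send high-risk prompts to the fallback model and attach a user notice."""
    category = classify_risk(prompt)
    if category is None:
        return Route(PRIMARY_MODEL, None)
    notice = f"This request touches {category}; it was handed off to {FALLBACK_MODEL}."
    return Route(FALLBACK_MODEL, notice)
```

For example, `route("Refactor this parser")` stays on the primary model with no notice, while `route("Help plan a network attack")` is redirected and carries a notice for the user.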

Anthropic reports that fewer than 5% of conversations are redirected; the remaining 95%‑plus never trigger a fallback. In internal security tests, harmful requests for network‑attack planning received zero effective responses.

These guardrails explain the asterisked scores in the benchmark table: on security‑ and bio‑related questions, Fable 5’s results gravitate toward Opus 4.8 because the higher‑risk answers are filtered out.

Mythos 5 access

Mythos 5 removes part of the cybersecurity guardrail and is delivered through a project called “Glasswing,” limited to a small group of U.S. government‑backed network‑defense and critical‑infrastructure partners. Anthropic plans to expand access via a “trusted‑access program” focused on defensive security work and biomedical research.

Developer tooling feedback

Boris Cherny, creator of Claude Code, praised Fable 5’s integration with Claude Code and Cowork, calling it one of the best programming models he has used. He cited lower prompt requirements, more efficient token usage, higher code quality, smarter tool calls, stronger self‑verification, longer session endurance, and greater trustworthiness.

Figure: Boris Cherny on integrating Fable 5 with Claude Code.

Pricing and availability

Both Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens, less than half the cost of the earlier Mythos Preview. Fable 5 became publicly available on June 9, with subscription slots being allocated in batches due to high demand.
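As a rough guide to what those rates mean per request, the cost can be estimated from token counts. The sketch below uses only the two prices quoted above; the token counts in the example are hypothetical, and any discounts or other billing details are not covered by the article and are ignored here.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agentic coding session: 200k tokens read, 20k tokens written.
cost = request_cost_usd(200_000, 20_000)
print(f"Estimated cost: ${cost:.2f}")
```

In this example the input side costs $2.00 and the output side $1.00, so output tokens account for a third of the bill despite being a tenth of the input volume.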

Author’s perspective

The release’s subtlety lies not in raw scores but in the explicit line drawn between a braked public model and an unbraked, restricted one. Anthropic acknowledges that the true upper bound of the model’s capabilities differs from what they are willing to release broadly, concentrating the power to decide “what to give and what to withhold” in the hands of a few companies.

For developers, having a model that reaches 80% on SWE‑Bench Pro is a tangible benefit. Yet the starred scores, the fallback rate of just under 5%, and the locked Mythos 5 are reminders that as model abilities advance, control over their deployment becomes increasingly centralized.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
