Anthropic’s Mythos Preview Crushes Opus 4.6 and Remains Unreleased

Anthropic introduced the Mythos Preview model, which outperforms its flagship Opus 4.6 across coding benchmarks and uncovered thousands of high‑severity security bugs. The company is keeping the model private and has launched Project Glasswing, a $100 million initiative with major tech partners to secure critical software.

Node.js Tech Stack

Apr 8, 2026

Anthropic announced the Mythos Preview model and described it as too powerful to release publicly. The model outscores the company’s flagship Opus 4.6 on a range of coding benchmarks.

Reported scores, Mythos Preview vs. Opus 4.6:
SWE‑bench Verified: 93.9% vs 80.8% (↑13 pts)
SWE‑bench Pro: 77.8% vs 53.4% (↑24 pts)
Terminal‑Bench 2.0: 82.0% vs 65.4% (↑17 pts)
SWE‑bench Multimodal: 59.0% vs 27.1% (↑32 pts, more than double)

SWE‑bench Pro, which measures solving complex engineering problems in real open‑source projects, shows a 24‑point jump. By these figures, the largest absolute gap is on SWE‑bench Multimodal.
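The gaps between the two models can be checked with a few lines of arithmetic. The sketch below is only an illustration built from the scores quoted above; the `scores` dictionary and the `compare` helper are names made up for this example, not part of any benchmark tooling.

```python
# Scores as quoted in the article: (Mythos Preview, Opus 4.6), in percent.
scores = {
    "SWE-bench Verified": (93.9, 80.8),
    "SWE-bench Pro": (77.8, 53.4),
    "Terminal-Bench 2.0": (82.0, 65.4),
    "SWE-bench Multimodal": (59.0, 27.1),
}


def compare(mythos, opus):
    """Return (absolute gain in percentage points, ratio of the two scores)."""
    return round(mythos - opus, 1), round(mythos / opus, 2)


# Hypothetical helper output: benchmark name -> (gain in points, ratio).
results = {name: compare(m, o) for name, (m, o) in scores.items()}

# Print benchmarks from the largest absolute gain to the smallest.
for name, (gain, ratio) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: +{gain} pts, {ratio}x")
```

Sorted this way, the quoted figures put SWE‑bench Multimodal first (+31.9 points, 2.18x), followed by SWE‑bench Pro (+24.4 points, 1.46x). Absolute point gains and relative ratios can rank benchmarks differently, so it helps to say which one a "biggest jump" claim refers to.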

The coding scores are only the tip of the iceberg. Anthropic’s security red team used Mythos Preview to scan widely used software and uncovered vulnerabilities that had persisted for years. For example, a flaw in FFmpeg’s H.264 decoder had existed for 16 years and survived more than 5 million automated fuzzing attempts, and a signed‑integer overflow in OpenBSD’s kernel had gone unnoticed for 27 years. In total, Mythos identified thousands of high‑severity zero‑day bugs across every major operating system and browser.

Beyond detection, Mythos generated 181 working exploits for Firefox 147’s JavaScript engine, while Opus 4.6 produced only two. The model independently assembled complex JIT heap‑spray attacks, chained four distinct vulnerabilities into a full exploit, and achieved local privilege escalation on Linux and remote code execution on FreeBSD NFS, tasks that would normally require weeks of human effort. Each run cost roughly $1,000–$2,000.

In a large‑scale OSS‑Fuzz run covering about 1,000 repositories and 7,000 entry points, Mythos caused 595 crashes, including ten complete control‑flow hijacks. By contrast, Opus 4.6 and Sonnet 4.6 triggered only 150–175 crashes, each at the lowest difficulty level.

To channel this capability responsibly, Anthropic launched Project Glasswing, partnering with twelve companies (including AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks) and more than 40 critical‑infrastructure organizations. The initiative is backed by $100 million in model‑usage credits, plus donations of $2.5 million to the Linux Foundation’s security projects and $1.5 million to the Apache Foundation.

Anthropic said the model will remain unavailable until robust safety guardrails are in place, and that it plans to test these safeguards on the upcoming Claude Opus model. The company argues that using such a powerful model defensively now makes it possible to fix vulnerabilities before malicious actors obtain similar capabilities.

The emergence of Mythos Preview signals a turning point: AI models can now discover decades‑old system bugs and produce functional exploits within hours, reshaping the security landscape and forcing developers to treat code security as a critical, AI‑augmented concern.
