Opus 4.6 Unlocks Full 1M‑Token Context—GPT‑5.4 Slumps to 36% Accuracy

Anthropic has opened the million‑token context window for Claude Opus 4.6, which posts 78.3% accuracy on MRCR v2 while competing models such as GPT‑5.4 and Gemini 3.1 Pro fall below 40%. The release also removes the long‑context pricing premium, expands media limits six‑fold, and requires no code changes, dramatically improving Claude Code workflows.


1 Million‑Token Context Window Generally Available

Anthropic announced that a 1M‑token context window is now generally available for Claude Opus 4.6 and Claude Sonnet 4.6. Opus 4.6 is the default model for Claude Code Max, Team, and Enterprise users; Pro and Sonnet users can enable it with the /extra-usage command.

Long‑Context Retrieval Benchmark (MRCR v2 8‑needle)

Anthropic published scores on MRCR v2, an industry‑standard benchmark for ultra‑long‑context retrieval. At a 1M‑token input size the match‑accuracy results are:

Opus 4.6: 78.3%

Sonnet 4.6: 65.1%

GPT‑5.4: 36.6%

Gemini 3.1 Pro: 25.9%

Sonnet 4.5: 18.5%

Between 128K and 256K tokens the gap is modest (Opus 4.6 at 91.9% vs. GPT‑5.4 at 79.3%). Beyond 512K tokens all competing models drop sharply, while Opus 4.6’s accuracy curve stays comparatively flat. This demonstrates that many models claim long‑context support but cannot retain information at scale; a 1M‑token claim does not guarantee usable performance.
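To make the benchmark’s setup concrete, the sketch below builds a toy multi‑needle probe in the spirit of MRCR: several near‑identical “needle” sentences are scattered through a long filler document, and the model is later asked to reproduce one of them verbatim. This is an illustrative approximation, not the official MRCR v2 harness; the filler text, needle format, and token estimate are assumptions.

```python
import random

def build_multi_needle_prompt(num_needles=8, filler_paragraphs=2000, seed=0):
    """Toy MRCR-style probe: hide near-identical 'needles' in long filler text,
    then ask for one specific needle. Illustrative only, not the real harness."""
    rng = random.Random(seed)
    filler = "The committee reviewed the quarterly report without further comment. " * 15
    needles = [
        f"[NEEDLE {i}] The secret phrase number {i} is "
        f"'{rng.choice(['amber', 'basalt', 'cobalt', 'dune'])}-{rng.randint(1000, 9999)}'."
        for i in range(num_needles)
    ]
    paragraphs = [filler] * filler_paragraphs
    for needle in needles:  # scatter the needles at random offsets in the filler
        paragraphs.insert(rng.randrange(len(paragraphs)), needle)
    target = rng.randrange(num_needles)
    prompt = "\n\n".join(paragraphs) + (
        f"\n\nReproduce, verbatim, the sentence tagged [NEEDLE {target}] from the document above."
    )
    return prompt, needles[target]

prompt, expected = build_multi_needle_prompt()
print(f"~{len(prompt) // 4:,} tokens (rough 4-characters-per-token estimate)")
print("Expected answer:", expected)
```

Match accuracy is then simply the fraction of probes for which the model returns the expected needle exactly; it is this number that collapses for most models as the filler grows toward a million tokens.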

What the Full Release Changes

No long‑context pricing premium – Token pricing is identical regardless of length. Opus 4.6 costs $25 per million input + output tokens; Sonnet 4.6 costs $15. There is no extra charge for exceeding a token threshold.

Media processing capacity expands six‑fold – The maximum number of images and PDF pages per request increases from 100 to 600, enabling single‑shot processing of large contracts, technical manuals, or academic papers.

No code changes required – When a request exceeds 200K tokens the system automatically enables the million‑token context. Existing beta headers remain compatible but are no longer mandatory.
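As a concrete illustration of the “no code changes” point, here is a minimal request sketch using the Anthropic Python SDK. The model identifier and input file are placeholders, and the comment about the beta header simply restates the announcement above; check Anthropic’s documentation for the exact model ids.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("large_corpus.txt", encoding="utf-8") as f:
    corpus = f.read()  # e.g. several hundred thousand tokens of source material

# Per the announcement, a request above 200K tokens is routed to the
# million-token context automatically; the older "context-1m"-style beta
# header may still be sent for compatibility but is no longer required.
response = client.messages.create(
    model="claude-opus-4-6",  # hypothetical id, used here for illustration only
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{corpus}\n\nList every breaking change described above, with citations.",
    }],
)
print(response.content[0].text)
```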

Impact on Claude Code Workflows

Claude Code previously compressed early dialogue when the context approached its limit, causing loss of reasoning chains, debugging context, and code‑change history. With the million‑token context:

Entire codebases can be loaded in a single request, eliminating manual file splitting (see the packing sketch after this list).

Document sets exceeding 1,000 pages (e.g., legal contracts, technical specifications, research papers) can be processed in one call.

Long‑running agent sessions retain full context, preserving tool‑call chains and reasoning traces.
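As a sketch of the first item above, the helper below packs a whole repository into one tagged prompt instead of splitting files by hand. The extension filter, directory name, and 4‑characters‑per‑token estimate are assumptions for illustration.

```python
from pathlib import Path

def pack_repository(root: str, extensions=(".py", ".md", ".toml")) -> str:
    """Concatenate every matching file into one prompt, tagging each file with
    its path so the model can cite exact locations. No chunking or splitting."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            body = path.read_text(encoding="utf-8", errors="ignore")
            parts.append(f"===== {path} =====\n{body}")
    return "\n\n".join(parts)

repo_prompt = pack_repository("./my-project")  # hypothetical project directory
print(f"~{len(repo_prompt) // 4:,} tokens (rough estimate against the 1M window)")
```

The packed string can then be sent as the user content of a single messages.create request, as in the earlier sketch.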

Competitive Comparison

OpenAI GPT‑5.4 supports a million‑token window but achieves only 36.6% MRCR v2 accuracy at that length, meaning more than two‑thirds of the information is lost. The reported figure is an average over the 128K–256K range.

Google Gemini 3.1 Pro scores 25.9% accuracy on the same benchmark (Context Arena), indicating substantial room for improvement.

Anthropic’s Sonnet 4.6 provides a cheaper “value” tier: 65.1% accuracy at 1M tokens for roughly 60% of Opus’s price.

The benchmark focuses on needle‑in‑a‑haystack retrieval; other capabilities such as reasoning or summarization are not covered. Nevertheless, retaining usable retrieval accuracy at the million‑token scale gives Opus 4.6 a clear technical moat over current competitors.
