Claude Sonnet 4.6 Launches on Chinese New Year with Opus-Level Coding Power

Anthropic unveiled Claude Sonnet 4.6 on February 18, touting Opus-level coding ability, a 1 million-token context window, and unchanged pricing. Benchmarks show a SWE-bench Verified score of 79.6% (up from 77.2%), OSWorld 72.5% (vs 61.4%), and GPQA Diamond 89.9%, while industry leaders praise its reduced laziness, stronger instruction following, and strategic long-term planning.

AI Insight Log

On February 18, during the Chinese New Year holiday, Anthropic released Claude Sonnet 4.6, branding it as the strongest Sonnet model with upgrades in coding, computer use, long‑context reasoning, and agent planning.

Full-stack capability boost: performance in coding, logical reasoning, and document processing surpasses Sonnet 4.5.

1 million-token context window: the beta version can ingest up to 1 million tokens, enough for an entire codebase or dozens of research papers.

Pricing unchanged: API rates remain at the same per-million-token input and output levels as the previous generation.

Immediate availability: Free and Pro users get Sonnet 4.6 by default from day one.

For developers, coding ability is paramount. Anthropic’s test data show Sonnet 4.6 achieving a 79.6% score on the SWE‑bench Verified benchmark (real GitHub issues), up from 77.2% for Sonnet 4.5 and approaching Opus 4.6 (80.8%) and GPT‑5.2 (80.0%). In early internal tests, 70% of developers preferred Sonnet 4.6 over 4.5.

Less lazy: the model no longer omits code fragments, improving completeness.

Stronger instruction following: it interprets complex requests more accurately.

Deeper context understanding: it reads surrounding code more carefully, reducing errors caused by misinterpretation.

“Claude Sonnet 4.6 is markedly better than Sonnet 4.5 in every aspect, especially on long‑running tasks and harder problems.” – Michael Truell, co‑founder of Cursor

Computer use, first introduced as a feature in October, takes a significant leap in Sonnet 4.6. On the OSWorld benchmark for AI-driven computer operation, the model scores 72.5%, far above Sonnet 4.5's 61.4%.

This improvement means the model now behaves like a proficient human user when browsing, clicking, typing, handling complex spreadsheets, filling multi‑step web forms, and switching between browser tabs to complete cross‑application tasks.

Beyond coding and computer use, Sonnet 4.6 shines on general benchmarks. On the GPQA Diamond graduate‑level reasoning test it reaches 89.9%, surpassing Sonnet 4.5 and rivaling the more expensive Opus models on several dimensions.

Its long‑context reasoning is especially notable. In a simulated business‑management game (Vending‑Bench Arena), Sonnet 4.6 invests heavily in capacity expansion during the first ten months and then pivots to profit maximization, demonstrating strategic planning typically seen only in top‑tier models or humans.

Industry leaders echo the praise:

Joe Binder, GitHub VP of Product: "Sonnet 4.6 excels at complex code fixes, especially when searching large codebases."

Michele Catasta, CEO of Replit: "Its cost-performance is extraordinary; it handles our most complex agent workflows."

Eric Simons, CEO of Bolt: "It's our first choice for building complex applications and bug fixing, which previously required far more expensive models."

With Opus-level capabilities, unchanged pricing, and immediate availability, Sonnet 4.6 is positioned as the most cost-effective model for developers today. API users can start using it by specifying the following model identifier:

claude-sonnet-4-6
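As a minimal sketch of how the identifier above would be used, assuming the standard Anthropic Messages API (the prompt text and `build_request` helper below are illustrative, not from the release notes):

```python
# Sketch: addressing Claude Sonnet 4.6 through the Anthropic Messages API.
# Sending a real request needs `pip install anthropic` and an ANTHROPIC_API_KEY;
# here the request is assembled as a plain dict so it can be inspected first.

MODEL_ID = "claude-sonnet-4-6"  # model identifier from the release

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# Actually sending the request (commented out so the sketch runs offline):
# import anthropic
# client = anthropic.Anthropic()
# response = client.messages.create(**build_request("Summarize this diff."))
# print(response.content[0].text)
```

Existing Sonnet 4.5 integrations would switch over by changing only the model string.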
[Image: Claude Sonnet 4.6 release]
[Image: OSWorld benchmark scores]
[Image: Model benchmark comparison]
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

AI coding, Anthropic, Agent planning, Benchmark results, Claude Sonnet 4.6, Long-context reasoning
Written by

AI Insight Log

Focused on sharing: AI programming | Agents | Tools
