Claude Sonnet 4.6 Launches on Chinese New Year with Opus-Level Coding Power
Anthropic unveiled Claude Sonnet 4.6 on February 18, touting Opus-level coding ability, a 1 million-token context window, and unchanged pricing; benchmarks show a SWE-bench score of 79.6% (up from 77.2%), OSWorld 72.5% (vs 61.4%), and GPQA Diamond 89.9%, while industry leaders praise its reduced laziness, stronger instruction following, and strategic long-term planning.
On February 18, during the Chinese New Year holiday, Anthropic released Claude Sonnet 4.6, branding it as the strongest Sonnet model with upgrades in coding, computer use, long‑context reasoning, and agent planning.
Full-stack capability boost: performance in coding, logical reasoning, and document processing surpasses Sonnet 4.5.
1 million-token context window: in beta, the model can ingest up to 1 million tokens, enough for an entire codebase or dozens of research papers (see the sketch after this list).
Pricing unchanged: API rates remain at the same per-million-token input and output levels as the previous generation.
Broad availability: Free and Pro users get Sonnet 4.6 by default from day one.
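For readers who want to try the long-context beta, here is a minimal sketch using the Anthropic Python SDK. The `context-1m-2025-08-07` beta flag is the one Anthropic documented for earlier Sonnet releases; whether 4.6 uses the same flag is an assumption, so check the current docs before relying on it.

```python
# Minimal sketch: requesting the beta 1M-token context window.
# Assumption: the long-context beta flag is the one documented for
# earlier Sonnet releases; verify the current name in Anthropic's docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("entire_codebase.txt") as f:  # hypothetical single-file dump of a repo
    codebase = f.read()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    betas=["context-1m-2025-08-07"],  # assumed long-context beta flag
    messages=[{
        "role": "user",
        "content": f"Here is our codebase:\n\n{codebase}\n\nSummarize the architecture.",
    }],
)
print(response.content[0].text)
```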
For developers, coding ability is paramount. Anthropic’s test data show Sonnet 4.6 achieving a 79.6% score on the SWE‑bench Verified benchmark (real GitHub issues), up from 77.2% for Sonnet 4.5 and approaching Opus 4.6 (80.8%) and GPT‑5.2 (80.0%). In early internal tests, 70% of developers preferred Sonnet 4.6 over 4.5.
Less lazy: the model no longer omits code fragments, improving completeness.
Stronger instruction following: it interprets complex requests more accurately.
Deeper context understanding: it reads surrounding code more carefully, reducing errors caused by misinterpretation.
“Claude Sonnet 4.6 is markedly better than Sonnet 4.5 in every aspect, especially on long‑running tasks and harder problems.” – Michael Truell, co‑founder of Cursor
Computer use, a feature Anthropic first introduced in October, takes a significant leap in Sonnet 4.6: on the OSWorld benchmark for AI-driven computer operation, it scores 72.5%, far above Sonnet 4.5's 61.4%.
This improvement means the model now behaves like a proficient human user when browsing, clicking, typing, handling complex spreadsheets, filling multi‑step web forms, and switching between browser tabs to complete cross‑application tasks.
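For orientation, the sketch below shows roughly what a computer-use request looks like with the Anthropic Python SDK. The tool type `computer_20250124` and beta flag `computer-use-2025-01-24` are those documented for earlier Claude releases; treat them as assumptions that may have newer equivalents for 4.6.

```python
# Illustrative sketch of a computer-use request (Anthropic Python SDK).
# Assumption: the tool type and beta flag carry over from earlier releases.
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["computer-use-2025-01-24"],  # assumed beta flag
    tools=[{
        "type": "computer_20250124",  # assumed tool version string
        "name": "computer",
        "display_width_px": 1280,   # virtual screen the model sees and acts on
        "display_height_px": 800,
    }],
    messages=[{
        "role": "user",
        "content": "Open the quarterly spreadsheet and fill in the totals row.",
    }],
)
# The reply contains tool_use blocks (screenshots to take, clicks, keystrokes)
# that your own harness executes and feeds back to the model in a loop.
for block in response.content:
    print(block.type)
```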
Beyond coding and computer use, Sonnet 4.6 shines on general benchmarks. On the GPQA Diamond graduate‑level reasoning test it reaches 89.9%, surpassing Sonnet 4.5 and rivaling the more expensive Opus models on several dimensions.
Its long‑context reasoning is especially notable. In a simulated business‑management game (Vending‑Bench Arena), Sonnet 4.6 invests heavily in capacity expansion during the first ten months and then pivots to profit maximization, demonstrating strategic planning typically seen only in top‑tier models or humans.
Industry leaders echo the praise:
Joe Binder, VP of Product at GitHub: “Sonnet 4.6 excels at complex code fixes, especially when searching large codebases.”
Michele Catasta, CEO of Replit: “Its cost-performance is extraordinary; it handles our most complex agent workflows.”
Eric Simons, CEO of Bolt: “It’s our first choice for building complex applications and bug fixing, previously requiring far more expensive models.”
With Opus-level capabilities, unchanged pricing, and immediate availability, Sonnet 4.6 is positioned as the most cost-effective model for developers today. API users can start using it with the following model identifier:
claude-sonnet-4-6
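A minimal call, assuming the standard Anthropic Messages API and Python SDK (the model identifier is the one given above; the prompt is a generic placeholder):

```python
# Minimal sketch: calling Sonnet 4.6 through the standard Messages API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # identifier from the announcement
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain this stack trace: ..."}],
)
print(response.content[0].text)
```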
