Sonnet 4.6 Nears Opus Performance While Retaining Sonnet Pricing
Anthropic released Sonnet 4.6 just 12 days after Opus 4.6. The new model delivers near-Opus capabilities in coding, computer use, long-context reasoning, and agent planning, adds a 1 M-token context window, and keeps the lower Sonnet price. The release has prompted mixed community debate and rapid ecosystem adoption.
What’s New in Sonnet 4.6
In short, Sonnet 4.6 is the strongest Sonnet model to date, with comprehensive upgrades to coding, computer use, long-context reasoning, agent planning, knowledge work, and design capabilities. The context window expands to 1 M tokens (beta).
Pricing remains $3 per million input tokens and $15 per million output tokens, identical to Sonnet 4.5, and free users are upgraded by default.
Anthropic's official blog reports that in early Claude Code testing, users preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time, and preferred it over Opus 4.5, the previous flagship, 59% of the time.
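As a rough illustration of what the $3/$15 figures mean per request, the sketch below computes a list-price cost from token counts. The function name and the example token counts are hypothetical, and the calculation ignores any discounts or surcharges (caching, batch, long-context tiers) that this article does not cover.

```python
# Sonnet 4.6 list prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens and 4,000 output tokens.
print(f"${request_cost_usd(200_000, 4_000):.2f}")  # prints $0.66
```

Output tokens cost five times as much as input tokens, so long generated answers can dominate the bill even when the prompt is large.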
Specific user‑reported improvements include more careful context reading before code edits, merging shared logic instead of copy‑pasting, a noticeable reduction in over‑engineering and “lazy” behavior, stronger instruction‑following, and fewer false‑success statements and hallucinations.
Benchmark Results
Official benchmark data show Sonnet 4.6 approaching or exceeding Opus‑level performance on multiple dimensions.
On the GDPval‑AA benchmark (which measures complex real‑world agent tasks using a public problem set evaluated by Gemini), Sonnet 4.6 performs strongly. Ethan Mollick notes that GDPval‑AA differs from the full GDPval suite, as the former uses only public questions and automatic Gemini scoring.
Vals AI’s independent evaluation places Sonnet 4.6 first on both the Vals Index and Vals Multimodal Index, surpassing Opus 4.6.
Cursor’s pragmatic benchmark shows a clear improvement over Sonnet 4.5 on long‑running tasks, though intelligence remains below Opus 4.6, confirming that Sonnet 4.6 is closing the gap but not yet a full replacement.
Computer‑Use Capability: 16 Months of Progress
The improvement in computer‑use ability is likely the most under‑appreciated aspect of Sonnet 4.6.
When Anthropic first introduced a general computer‑use model in October 2024, it described the feature as experimental and sometimes clumsy. The OSWorld benchmark (testing AI control of Chrome, LibreOffice, VS Code, etc.) shows continuous improvement across the Sonnet series over the past 16 months.
Early user feedback indicates Sonnet 4.6 can handle complex spreadsheets, multi‑step web forms, and coordination across multiple browser tabs at near‑human levels, though it still lags behind expert humans.
Security also improves: resistance to prompt‑injection attacks is markedly higher than in Sonnet 4.5, approaching Opus 4.6 levels, which is crucial for web‑based computer‑use scenarios.
Vending‑Bench: Learning to "Invest First, Profit Later"
Vending‑Bench Arena evaluates AI models that run a simulated business and compete for profit.
Sonnet 4.6 adopts a new strategy: during the first ten simulated months it heavily invests in capacity expansion, spending far more than competitors, then pivots in the final phase to focus on profitability. This "invest‑first, profit‑later" timing gives it a decisive lead.
The result suggests that the 1 M-token context window helps the model plan and decide over much longer horizons.
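To see why the timing of the pivot matters, here is a toy model of the "invest first, profit later" idea. It is not Vending-Bench's actual rule set: the function, the starting capacity, and the growth rate are all hypothetical numbers chosen only to show the shape of the trade-off.

```python
def final_cash(months: int, invest_months: int,
               capacity: float = 100.0,
               profit_per_unit: float = 1.0,
               units_per_dollar: float = 0.3) -> float:
    """Toy simulation: reinvest all revenue for `invest_months`, then bank it.

    All parameters are illustrative assumptions, not benchmark values.
    """
    cash = 0.0
    for month in range(months):
        revenue = capacity * profit_per_unit
        if month < invest_months:
            # Investment phase: every dollar earned buys more capacity.
            capacity += revenue * units_per_dollar
        else:
            # Profit phase: revenue is kept as cash.
            cash += revenue
    return cash


steady = final_cash(months=12, invest_months=0)         # never invests
invest_first = final_cash(months=12, invest_months=10)  # pivots after month 10
never_pivots = final_cash(months=12, invest_months=12)  # invests until the end

print(f"steady: {steady:.0f}, invest-first: {invest_first:.0f}, "
      f"never pivots: {never_pivots:.0f}")
```

In this toy setup the invest-first strategy ends with more cash than the steady one, while a strategy that never pivots ends with nothing, which is why the switch to profitability has to come before the simulation ends.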
Tool Ecosystem Synchronizes
On the day Sonnet 4.6 launched, the AI‑coding tool ecosystem reacted immediately.
Claude Code 2.1.45 updated the default model to Sonnet 4.6 and extended the knowledge cutoff from January 2025 to August 2025. The release also fixed several practical issues: sandbox file‑write permission errors on macOS, crashes when background agents finish, and unbounded memory growth from large command outputs.
Cursor announced support for Sonnet 4.6. Perplexity made Sonnet 4.6 the default model for its Comet browser agent for Pro users, while Max users can choose between Sonnet 4.6 and Opus 4.6.
At the API level, Claude’s web‑search and fetch tools now automatically generate code to filter and process results, retaining only relevant content in the context. Code execution, memory, and programmatic tool calls are now generally available.
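The idea of filtering results before they reach the context can be sketched in a few lines. This is only an illustration of the concept, not Anthropic's implementation or API: the result format (`url` and `text` fields), the keyword-overlap scoring, and the character budget are all assumptions made for the example.

```python
import re


def keywords(text: str) -> set[str]:
    """Lowercase alphanumeric words of at least three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= 3}


def filter_results(results: list[dict], query: str,
                   char_budget: int = 500) -> list[dict]:
    """Keep only results that share keywords with the query, best first,
    until the (hypothetical) character budget for the context is used up."""
    wanted = keywords(query)
    scored = []
    for result in results:
        overlap = len(wanted & keywords(result["text"]))
        if overlap:  # drop results with no keyword in common
            scored.append((overlap, result))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort

    kept, used = [], 0
    for _, result in scored:
        if used + len(result["text"]) > char_budget:
            continue  # skip anything that would exceed the budget
        kept.append(result)
        used += len(result["text"])
    return kept


# Made-up search results for illustration.
results = [
    {"url": "https://example.com/a",
     "text": "Sonnet 4.6 pricing stays at $3/$15 per million tokens."},
    {"url": "https://example.com/b",
     "text": "Cookie banner: accept all cookies to continue."},
    {"url": "https://example.com/c",
     "text": "Pricing table for Sonnet models."},
]
kept = filter_results(results, "Sonnet pricing")
print([r["url"] for r in kept])
```

Here the cookie-banner result is discarded because it shares no keywords with the query, so it never consumes context space.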
Is Opus Still Needed?
The community is divided.
One camp argues that Sonnet 4.6 is sufficient: its price is only a fraction of Opus's, and its performance is close enough for most agent-workflow tasks, yielding substantial cost savings for agents that run 24/7.
The other camp points to Cursor’s assessment that, while Sonnet 4.6 improves long‑task performance, its intelligence remains below Opus 4.6. For deep‑reasoning scenarios such as large‑scale code refactoring, multi‑agent coordination, or mission‑critical tasks, Opus remains the stronger choice.
Anthropic itself confirms that Opus 4.6 remains the top option for tasks requiring the deepest reasoning.
Strategically, Anthropic keeps Opus at the high‑end tier while positioning Sonnet to capture the mid‑market, effectively “Opus holds the throne, Sonnet eats the market.”
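One way teams might act on this split is to route work by task type, sending only deep-reasoning jobs to Opus. The sketch below is a minimal illustration: the task labels are hypothetical, and the model ID strings are assumptions that should be checked against the Claude API model documentation before use.

```python
# Assumed model identifiers; verify against the official model overview.
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-6"

# Hypothetical task labels for the deep-reasoning scenarios named above.
DEEP_REASONING_TASKS = {
    "large_scale_refactor",
    "multi_agent_coordination",
    "mission_critical",
}


def pick_model(task_type: str) -> str:
    """Route deep-reasoning tasks to Opus and everything else to Sonnet."""
    return OPUS if task_type in DEEP_REASONING_TASKS else SONNET


for task in ("bug_fix", "large_scale_refactor", "form_filling"):
    print(f"{task} -> {pick_model(task)}")
```

Defaulting to Sonnet and listing the exceptions keeps the expensive model opt-in, which matches the cost argument made by the first camp.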
Why Two Models in 12 Days?
Opus 4.6 launched on February 5, followed by Sonnet 4.6 on February 17, an unusually rapid cadence.
One possible explanation is competitive pressure: Google’s Gemini series and OpenAI’s ongoing releases force Anthropic to maintain competitive products across price tiers.
Another angle is that Sonnet 4.6 likely shares training results with Opus 4.6; although this has not been officially disclosed, the short interval and performance overlap suggest overlapping development cycles.
Regardless of the cause, faster model iteration benefits users through quicker cost‑performance improvements.
Key Takeaways
Sonnet 4.6 upgrades coding, computer use, long-context reasoning, and more; the 1 M-token window is in beta
Price unchanged ($3/$15 per M input/output tokens); free users auto-upgraded
In early Claude Code testing, users preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time, and over Opus 4.5 59% of the time
Cursor, Claude Code, and Perplexity added same-day support
Opus 4.6 remains stronger on deep-reasoning tasks, but Sonnet 4.6 covers most everyday scenarios
Anthropic's strategy: Opus guards the high end, Sonnet captures the market
Reference Links
Anthropic official blog – Introducing Sonnet 4.6: https://www.anthropic.com/news/claude-sonnet-4-6
Claude Sonnet 4.6 System Card: https://anthropic.com/claude-sonnet-4-6-system-card
Claude API model documentation: https://platform.claude.com/docs/en/about-claude/models/overview
OSWorld Benchmark: https://os-world.github.io/
ShiZhen AI
Tech blogger with over 10 years of experience at leading tech firms and an AI efficiency and delivery expert focused on AI productivity. Covers tech gadgets, AI-driven efficiency, and leisure (AI leisure community). 🛰 szzdzhp001
