Claude Sonnet 5 Launch: Near‑Opus 4.8 Performance at Only 60% of the Cost
Anthropic's newly released Claude Sonnet 5 delivers markedly improved agentic capabilities, with benchmark scores close to those of Opus 4.8 at roughly 60% of the price. It is now the default model across Claude's platforms and supports a 1 M‑token context window.
Anthropic has released Claude Sonnet 5, the strongest agentic model in the Sonnet series, positioned as the new default for daily high‑frequency workflows such as coding, tool use, browsing, planning, and knowledge work.
Benchmarks show clear gains over Sonnet 4.6. On SWE‑bench Pro, Sonnet 5 scores 63.2% (vs 58.1% for Sonnet 4.6 and 69.2% for Opus 4.8). On Humanity’s Last Exam without tools it reaches 43.2% (vs 34.6% for Sonnet 4.6 and 49.8% for Opus 4.8); with tools it climbs to 57.4%, nearly matching Opus 4.8. On OSWorld‑Verified, Sonnet 5 scores 81.2%, compared with 78.5% for Sonnet 4.6 and 83.4% for Opus 4.8.
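One way to read these figures is as the share of the Sonnet 4.6 to Opus 4.8 gap that Sonnet 5 closes on each benchmark. The Python sketch below does that arithmetic. The `gap_closed` helper is an illustrative name, not something from the source, and the Humanity's Last Exam row assumes that 34.6% belongs to Sonnet 4.6 and 49.8% to Opus 4.8.

```python
# Scores in percent, as quoted above: (Sonnet 4.6, Sonnet 5, Opus 4.8).
# Assumption: for Humanity's Last Exam (no tools), 34.6 is Sonnet 4.6
# and 49.8 is Opus 4.8.
SCORES = {
    "SWE-bench Pro": (58.1, 63.2, 69.2),
    "Humanity's Last Exam (no tools)": (34.6, 43.2, 49.8),
    "OSWorld-Verified": (78.5, 81.2, 83.4),
}


def gap_closed(old: float, new: float, ceiling: float) -> float:
    """Fraction of the old-to-ceiling gap that the new score covers."""
    return (new - old) / (ceiling - old)


GAP_CLOSED = {name: gap_closed(*scores) for name, scores in SCORES.items()}

for name, share in GAP_CLOSED.items():
    print(f"{name}: {share:.0%} of the gap to Opus 4.8 closed")
```

On these three benchmarks the result is roughly half of the gap in each case, which is a useful sanity check on the "near Opus 4.8" framing.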
The single‑task cost curve of Sonnet 5 now aligns closely with Opus 4.8, yet its API pricing is lower, making it a viable substitute in many scenarios.
Cursor has integrated Sonnet 5, reporting a CursorBench 3.1 score of 57% versus 49% for Sonnet 4.6, and the high‑default configuration now approaches Opus 4.8 performance with lower average task cost.
A community member compared UltraCode generation of a simple HTML login page: Sonnet 5 used 20.9k input and 14.2k output tokens, costing $3.36 and completing in 2 min 11 s; Opus 4.8 consumed 96.3k input and 73.8k output tokens, costing $20.66 and taking 20 min 15 s, though the latter produced higher quality output.
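The community comparison can be reduced to a few ratios. The Python sketch below uses only the figures quoted for the two runs; the `Run` class and its field names are illustrative and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One reported generation run (token counts in thousands)."""
    input_k: float
    output_k: float
    cost_usd: float
    seconds: int

    @property
    def total_k(self) -> float:
        return self.input_k + self.output_k


# Figures as reported in the community comparison.
sonnet_5 = Run(input_k=20.9, output_k=14.2, cost_usd=3.36, seconds=2 * 60 + 11)
opus_48 = Run(input_k=96.3, output_k=73.8, cost_usd=20.66, seconds=20 * 60 + 15)

cost_ratio = opus_48.cost_usd / sonnet_5.cost_usd
time_ratio = opus_48.seconds / sonnet_5.seconds
token_ratio = opus_48.total_k / sonnet_5.total_k

print(f"Opus 4.8 vs Sonnet 5: {cost_ratio:.1f}x cost, "
      f"{time_ratio:.1f}x time, {token_ratio:.1f}x tokens")
```

This is a single anecdotal run, so the ratios describe this one task rather than typical behavior.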
When measured by Cost per Intelligence Index Task, Sonnet 5’s max configuration costs $2.29 per task, higher than Opus 4.8 ($1.80) and considerably above GPT‑5.5 xhigh ($1.03) and GLM‑5.2 max ($0.48), highlighting that API price alone does not determine total cost.
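To make the comparison concrete, the Python sketch below expresses Sonnet 5's cost per task as a multiple of each alternative, using only the dollar figures quoted above. The variable names are illustrative.

```python
# Cost per Intelligence Index task in USD, as quoted above.
COST_PER_TASK_USD = {
    "Sonnet 5 (max)": 2.29,
    "Opus 4.8": 1.80,
    "GPT-5.5 xhigh": 1.03,
    "GLM-5.2 max": 0.48,
}

baseline = COST_PER_TASK_USD["Sonnet 5 (max)"]

# How many times more Sonnet 5 (max) costs per task than each other model.
SONNET_MULTIPLE = {
    name: baseline / cost
    for name, cost in COST_PER_TASK_USD.items()
    if name != "Sonnet 5 (max)"
}

for name, multiple in SONNET_MULTIPLE.items():
    print(f"Sonnet 5 (max) costs {multiple:.2f}x as much per task as {name}")
```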
Sonnet 5 uses an updated tokenizer that splits the same text into more tokens, and Anthropic states that the promotional pricing is intended to keep costs roughly unchanged for users migrating from Sonnet 4.6 to Sonnet 5.
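The interaction between tokenizer and price is simple multiplication: the bill for a fixed text scales with the per-token price times the number of tokens it becomes. The Python sketch below illustrates this. The source does not say how many more tokens the new tokenizer produces, so the 1.25 inflation factor is purely hypothetical, as are the function names.

```python
def relative_cost(price_multiplier: float, token_inflation: float) -> float:
    """Cost of the same text on the new model relative to the old one.

    price_multiplier: new per-token price divided by old per-token price.
    token_inflation:  new token count divided by old token count.
    """
    return price_multiplier * token_inflation


def break_even_price_multiplier(token_inflation: float) -> float:
    """Per-token price multiplier that leaves the total bill unchanged."""
    if token_inflation <= 0:
        raise ValueError("token_inflation must be positive")
    return 1.0 / token_inflation


# Hypothetical figure, NOT from the source: 25% more tokens for the same text.
HYPOTHETICAL_INFLATION = 1.25

break_even = break_even_price_multiplier(HYPOTHETICAL_INFLATION)
print(f"Break-even per-token price multiplier: {break_even:.2f}")
print(f"Relative cost at unchanged price: "
      f"{relative_cost(1.0, HYPOTHETICAL_INFLATION):.2f}")
```

Under this assumed factor, an unchanged per-token price would raise the bill for the same text by a quarter, which is the kind of effect a promotional discount would need to offset.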
Safety evaluations show Sonnet 5 surpasses Sonnet 4.6, with better refusal of malicious requests, stronger resistance to prompt‑injection hijacks, and lower hallucination and sycophancy rates. In automated behavior‑audit tests, its overall safety score is higher, though it still lags slightly behind Opus 4.8 and Claude Mythos Preview on certain misbehavior metrics.
Sonnet 5 is now the default model for Claude Free and Pro users, and Max, Team, and Enterprise users can also select it. Anthropic has raised rate limits for Chat, Cowork, Claude Code, and Claude Platform to accommodate higher effort levels. The model is available via the Claude API, Claude Platform on AWS, Amazon Bedrock, Google Cloud, and Microsoft Foundry (preview), and it supports a 1 M‑token context window, which is crucial for long‑running agentic tasks.
Overall, Sonnet 5 offers a cost‑effective, mid‑tier agentic solution for teams that need stable multi‑step execution, while Opus 4.8 remains the preferred choice for the most accuracy‑critical workloads.
DataFunTalk
