Anthropic Releases Claude Sonnet 5: Near‑Opus 4.8 Performance and Stronger Agent Skills

Anthropic’s Claude Sonnet 5 shows markedly stronger reasoning, tool use and programming than Sonnet 4.6, closing the gap to Opus 4.8 at a lower price tier. The release also brings improved safety scores, a new tokenizer that raises token counts, and higher rate limits, while developer feedback on cost is mixed.

Machine Heart

Jun 30, 2026

Anthropic officially launched Claude Sonnet 5, branding it as the most agentic Sonnet model to date, capable of planning and autonomously using browsers, terminals and other tools at a level that previously required larger, more expensive models.

Compared with Sonnet 4.6, Sonnet 5 shows significant gains in reasoning, tool use, programming and knowledge‑work tasks, bringing its performance close to Opus 4.8. In the cost‑performance chart, Sonnet 5 (orange line) spans a broader range of options than Opus 4.8 (yellow line) and clearly improves on Sonnet 4.6 (gray line). At medium effort levels Sonnet 5 is markedly more cost‑efficient; at higher effort it matches Opus 4.8 on several tasks.

Sonnet 5 outperforms Sonnet 4.6 across all measured dimensions.

Its cost‑performance envelope is wider than Opus 4.8.

Users can adjust effort level to balance cost and performance between Sonnet 5 and Opus 4.8.

Promotional pricing is $2 per million input tokens and $10 per million output tokens until 31 Aug 2026, after which the standard rates of $3 / M input and $15 / M output apply. By contrast, Opus 4.8 costs $5 / M input and $25 / M output. The system card shows that Sonnet 5’s tokenizer maps the same text to 1.0–1.35× as many tokens, which drives up the per‑task cost to $2.29, roughly double that of Sonnet 4.6 and about 15 % higher than that of Opus 4.8.
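To make the pricing arithmetic concrete, here is a minimal sketch of how the quoted rates and the tokenizer multiplier combine into a per‑request cost. The prices and the 1.0–1.35× range come from the figures above. The function name, the dictionary keys, the example workload, and the choice to apply the multiplier equally to input and output tokens are assumptions made for illustration. The sketch does not try to reproduce the $2.29 per‑task figure, which also depends on how many tokens a model spends on a task.

```python
# USD per million tokens (input, output), as quoted in the article.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICES = {
    "sonnet-5-promo": (2.00, 10.00),     # promotional rate until 31 Aug 2026
    "sonnet-5-standard": (3.00, 15.00),  # standard rate afterwards
    "opus-4.8": (5.00, 25.00),
}


def request_cost(price_key, input_tokens, output_tokens, inflation=1.0):
    """Estimate the USD cost of one request.

    `inflation` is the tokenizer multiplier (1.0 to 1.35 per the system card).
    Applying it equally to input and output tokens is an assumption.
    """
    in_price, out_price = PRICES[price_key]
    billed_in = input_tokens * inflation
    billed_out = output_tokens * inflation
    return (billed_in * in_price + billed_out * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100,000 input tokens and 10,000 output tokens,
    # counted before any tokenizer inflation.
    for key in ("sonnet-5-promo", "sonnet-5-standard"):
        for factor in (1.0, 1.35):
            cost = request_cost(key, 100_000, 10_000, inflation=factor)
            print(f"{key} at {factor}x tokens: ${cost:.4f}")
```

For this hypothetical workload the promotional rate gives $0.30 at 1.0× and about $0.405 at 1.35×, against $0.45 to roughly $0.61 at the standard rate.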

Pre‑deployment safety assessments indicate overall improvements over Sonnet 4.6: better refusal of malicious requests, lower hallucination and sycophancy rates, and reduced misbehavior scores in automated behavior audits. However, Sonnet 5’s misbehavior rate remains slightly higher than those of Opus 4.8 and Claude Mythos Preview. In a Firefox‑exploit test, neither Sonnet model produced a fully functional exploit (0 % success), but Sonnet 5 showed a marginally higher partial‑success rate than Sonnet 4.6.

Anthropic has enabled its “network‑security verification” guard on Sonnet 5, similar to the guard in Opus 4.8, but less restrictive than the guard on Fable 5. The program is available on the native Claude platform, AWS‑hosted Claude, and Microsoft Foundry (Azure‑hosted), with Google Vertex support forthcoming.

The new tokenizer, akin to the change introduced in Opus 4.7, increases the token count for a given input, which explains the promotional pricing: it is aimed at keeping overall costs stable during the transition.
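Whether the discount actually offsets the larger token counts can be checked with simple arithmetic. The sketch below assumes the same text was previously billed at the $3 / $15 standard rates with an uninflated token count. The article does not state Sonnet 4.6's prices, so that baseline is an assumption, and the function names are illustrative.

```python
# USD per million input tokens, as quoted in the article.
PROMO_INPUT = 2.00
STANDARD_INPUT = 3.00  # assumed here to be the pre-transition baseline


def relative_cost(inflation, promo_price=PROMO_INPUT, baseline_price=STANDARD_INPUT):
    """Cost of the same text at the promo price with an inflated token count,
    relative to the baseline price with an uninflated count."""
    return inflation * promo_price / baseline_price


def break_even_inflation(promo_price=PROMO_INPUT, baseline_price=STANDARD_INPUT):
    """Tokenizer multiplier at which the promo discount is exactly used up."""
    return baseline_price / promo_price


if __name__ == "__main__":
    print(round(relative_cost(1.35), 3))  # 0.9
    print(break_even_inflation())         # 1.5
```

Under these assumptions, even the worst‑case 1.35× multiplier leaves the same text about 10 % cheaper during the promotion, since the discount would only be used up at 1.5×. The output prices have the same 2:3 ratio ($10 versus $15), so the result carries over to output tokens. Once the standard rates apply, the relative cost is simply the multiplier itself, up to 1.35×.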

Rate limits were raised on 26 Apr 2026 for Sonnet and Haiku models, and the Claude platform plans were simplified to Start, Build, and Scale tiers. Subsequent adjustments further increased limits for Chat, Cowork, Claude Code and the Claude platform to accommodate higher “effort” modes.

Developer feedback is mixed: Nicolas Bustamante praises Sonnet 5’s speed and agent optimizations, especially for browser use, noting a prompt‑injection success rate of 0.93 % versus 31.5 % for Opus 4.8 and 50.7 % for Sonnet 4.6. Other users consider the model too expensive, citing an average task cost of $2.29, roughly twice that of Sonnet 4.6.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
