Claude Sonnet 4.5: The New State‑of‑the‑Art Coding Model with 30‑Hour Runtime
Anthropic’s Claude Sonnet 4.5, promoted as the world’s best coding model, achieves top scores on SWE‑bench Verified, runs continuously for over 30 hours, outperforms competitors on OSWorld and multiple agentic tests, adds extensive safety features, and introduces a revamped Claude Code suite with VS Code, terminal, and Agent SDK enhancements.
After DeepSeek released DeepSeek V3.2‑Exp on September 29, Anthropic unveiled Claude Sonnet 4.5, billed as “the world’s best coding model.”
According to Anthropic's own tests, Sonnet 4.5 achieves top results on SWE‑bench Verified, a benchmark that measures real‑world coding ability. Anthropic also reports that the model can stay focused on complex multi‑step tasks for more than 30 hours, a significant improvement over Opus 4, which ran for about 7 hours.
On the OSWorld computer‑operation benchmark, Sonnet 4.5 scored 61.4%, up from the 42.2% with which Sonnet 4 had previously led.
The model’s capabilities are now integrated into Claude’s Chrome extension, allowing the AI to operate the browser directly—opening sites, filling forms, and completing tasks.
Safety improvements are highlighted as well: Sonnet 4.5 received the lowest safety‑risk score among major models, is released under AI Safety Level 3 (ASL‑3) protections, and uses classifiers to filter dangerous inputs and outputs, especially those related to chemical, biological, radiological, or nuclear (CBRN) content. The classifiers' false‑positive rate, meaning benign content flagged by mistake, has fallen ten‑fold since they were first introduced and has halved since the release of Claude Opus 4.
Claude Code, the developer‑focused component, received a major upgrade:
Native VS Code plugin (Beta): Provides an inline diff view and a side‑panel for real‑time code suggestions.
Terminal UI upgrade: Adds clearer status indicators and searchable prompt history (Ctrl + R).
Claude Agent SDK: Exposes the core modules of Claude Code so developers can build custom agents, manage long‑running tasks, and coordinate sub‑agents.
Checkpoint feature: Automatically saves code state before each modification; users can revert with /rewind, supporting safe experimentation on large refactoring tasks.
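The checkpoint idea (save the state before each modification, then roll back on demand) can be illustrated with a minimal in‑memory sketch. This is only an assumption‑laden illustration of the concept, not how Claude Code implements checkpoints; the `CheckpointedFile` class and its `edit` and `rewind` methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointedFile:
    """Hypothetical in-memory file that snapshots its content before every edit."""

    content: str
    _history: list[str] = field(default_factory=list)

    def edit(self, new_content: str) -> None:
        # Save the current state first, then apply the modification.
        self._history.append(self.content)
        self.content = new_content

    def rewind(self, steps: int = 1) -> None:
        # Undo up to `steps` edits; stop at the oldest saved snapshot.
        steps = min(steps, len(self._history))
        for _ in range(steps):
            self.content = self._history.pop()


f = CheckpointedFile("def area(r): return 3.14 * r * r")
f.edit("import math\ndef area(r): return math.pi * r ** 2")
f.edit("def area(r): return 0  # bad refactor")
f.rewind()  # discard only the bad refactor
print(f.content)
```

Because every edit pushes a snapshot first, a failed experiment costs one `rewind()` call, which is the property that makes large refactors safer to attempt.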
Additional product updates include:
Claude API now supports context editing and memory tools for longer, more complex tasks.
The Claude app can run code and generate files (spreadsheets, slides, documents) directly in the conversation.
The Chrome extension is now available to Max users.
A limited‑time “Imagine with Claude” experiment lets users watch Claude generate software on the fly, with no predetermined functionality.
Developers have begun testing the new model. One user had Sonnet 4.5 generate a simple 3D shooter in Three.js from scratch, while another used it to produce SVG graphics. Comparisons with GPT‑5 show Sonnet 4.5 leading in agentic coding, agentic tool use, and reasoning tests.
Pricing for the Claude Sonnet 4.5 API remains unchanged at $3 per million input tokens and $15 per million output tokens.
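To make the per‑token pricing concrete, here is a small cost estimator based on the two rates quoted above. The function name `estimate_cost_usd` and the sample token counts are assumptions chosen for illustration; the sketch ignores any discounts or surcharges a real bill might include.

```python
# Rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 50k output tokens.
cost = estimate_cost_usd(200_000, 50_000)
print(f"${cost:.2f}")  # $0.60 for input + $0.75 for output = $1.35
```

Output tokens cost five times as much as input tokens, so for long agentic runs the volume of generated text tends to dominate the bill.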
Overall, Claude Sonnet 4.5 represents a significant leap in coding ability, safety, and developer tooling, sparking discussion about the accelerating “AI arms race” and its impact on software development.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
