Artificial Intelligence 7 min read

Claude 3.7 Sonnet: First Hybrid Reasoning Model with Enhanced Coding Tool and Strong Benchmark Performance

Claude 3.7 Sonnet, Anthropic's new hybrid reasoning model, introduces dual thinking modes, token‑based thinking budget control, unchanged pricing, and the Claude Code tool that automates lengthy coding tasks, while achieving record GPQA scores, superior video‑game testing results, and reduced unnecessary refusals on harmful requests.

DevOps

Feb 25, 2025

Claude 3.7 Sonnet: First Hybrid Reasoning Model with Enhanced Coding Tool and Strong Benchmark Performance

Anthropic announced Claude 3.7 Sonnet, the first hybrid reasoning model that combines a standard LLM with an extended‑thinking mode, allowing near‑real‑time responses and step‑by‑step reasoning for tasks such as mathematics, physics, instruction following, and coding.

The model offers two modes: a standard mode identical to Claude 3.5 Sonnet, and an extended mode that performs self‑reflection before answering, improving performance on complex tasks.

You can choose when the model answers normally and when it spends extra time thinking before responding.

API users can set a thinking token budget (any value) while the output limit remains 128 K tokens, enabling cost‑quality trade‑offs. Pricing stays the same at $3 per M input tokens and $15 per M output tokens, including thinking tokens.

Anthropic also released Claude Code, an early preview coding assistant that can search, read, edit, test, and push code to GitHub, reducing a typical 45‑minute manual workflow to a single command.

Benchmark results show Claude 3.7 Sonnet achieving 84.8% on the GPQA suite (96.5% on the physics sub‑score) using 256 independent samples and a 64‑token thinking budget, and outperforming previous models in video‑game testing, such as beating all three Pokémon gym leaders.

Safety improvements include a 45% reduction in unnecessary refusals for benign requests compared to prior versions.

The model is now available on major cloud platforms, including Amazon Bedrock and Google Cloud, with broader tool‑calling capabilities and ongoing enhancements planned for reliability, long‑running commands, and UI rendering.

Reference links: Anthropic announcement , Visible Extended Thinking , Extended Thinking Docs , Claude Code Overview .

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

AI model Claude benchmark performance Coding tool GPQA Hybrid Reasoning

Written by

DevOps

Share premium content and events on trends, applications, and practices in development efficiency, AI and related technologies. The IDCF International DevOps Coach Federation trains end‑to‑end development‑efficiency talent, linking high‑performance organizations and individuals to achieve excellence.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.