Claude Opus 4.7 Launch: Massive Coding Gains and New Auto‑Mode Tips

Anthropic’s Claude Opus 4.7 arrives with an 11‑point jump on SWE‑bench Pro, a 24‑point rise on SWE‑bench Verified, three‑fold productivity gains for some users, tripled visual resolution, and six practical Claude Code tips, while still lagging on certain search‑related benchmarks.


Benchmark: Coding Metrics Jump

Hard data shows Opus 4.7 achieving 64.3% on SWE‑bench Pro, up from 53.4% in 4.6 – an 11‑point increase. Competing models GPT‑5.4 (57.7%) and Gemini 3.1 Pro (54.2%) fall behind. On SWE‑bench Verified, Opus 4.7 reaches 87.6%.

Third‑party tests confirm the trend: Cursor CEO Michael Truell reports CursorBench scores rising from 58% to 70% (the clearest jump on their coding benchmark), Rakuten sees a three‑fold increase in production task completion, and CodeRabbit’s code‑review recall improves by over 10%.

Warp founder Zach Lloyd notes that a concurrency bug unsolvable in 4.6 is fixed in 4.7, while Devin CEO Scott Wu says the model can now work “coherently for hours,” marking a qualitative leap in deep‑investigation ability.

Other feedback:

- Scale AI CPO Mario Rodriguez reports a 13% coding‑benchmark lift, including four tasks that were previously impossible.
- Notion AI’s Sarah Sachs measures a 14% improvement on multi‑step workflows and a one‑third reduction in tool‑call errors.
- Stripe VP Clarence Huang cites faster development cycles.
- Databricks CTO Hanlin Tang notes a 21% drop in document‑reasoning errors.
- Qodo CEO Itamar Friedman adds that Opus 4.7 passed three TBench tasks and uncovered issues its competitors missed.

However, Mythos Preview still outperforms Opus 4.7 on several metrics, scoring 93.9% on SWE‑bench Verified and 82.0% on Agentic terminal coding. Opus 4.7 is therefore not the outright strongest model on every task, but it marks a major step within Anthropic’s lineup.

Six Practical Claude Code Tips from Founder Boris Cherny

Boris Cherny, who has been using Opus 4.7 for daily development, shares six concrete suggestions.

1. Enable Auto mode to skip permission pop‑ups – long‑running tasks like deep research or full‑module refactoring can now run unattended. Switch to Auto mode with the shortcut Shift+Cmd+M (choose option 3) and watch only the results.

2. Use /fewer-permission-prompts to reduce interruptions – the new skill scans past sessions, identifies repeatedly safe bash commands and MCP calls, and adds them to an allow‑list, eliminating repeated confirmations after a single run (a sketch of the resulting settings entry follows this list).

3. Recaps: resume after a break – when an Agent runs for minutes or hours and you step away, Recaps provides a brief summary of what was done and the next step, e.g., “Fixing the post‑submit transcript shift bug. The styling‑flash part is shipped as PR #29869.”

4. Focus mode: hide intermediate steps – only the final output is shown, reflecting Cherny’s growing trust in the model’s internal reasoning.

5. Adjust effort level to balance speed and intelligence – Opus 4.7 now uses adaptive thinking with five effort tiers (low, medium, high, xhigh, max). The new “xhigh” tier sits between high and max, suitable for deep thinking without excessive latency.

6. Verify the model’s work – ensure Claude can self‑check: run tests for backend tasks, capture screenshots for frontend work, or execute validation scripts for data tasks; a minimal validation‑script sketch follows this list. This can boost output quality by 2–3×.
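On tip 2, the article doesn’t show what the skill actually writes, but Claude Code’s permission rules already live in .claude/settings.json, so the generated allow‑list plausibly looks something like this (the specific commands and MCP tool name are illustrative, not from the article):

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(git status)",
      "mcp__github__list_pull_requests"
    ]
  }
}
```

Once a rule like Bash(npm run test:*) is on the allow‑list, matching commands run without a confirmation prompt.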
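On tip 6, self‑checking only works if failure is machine‑visible, typically as a non‑zero exit code. Here is a minimal sketch of the kind of validation script the tip describes for a data task; the file name, output path, and checks are hypothetical:

```typescript
// verify.ts – hypothetical post-task check the agent can run after a data task.
// Exiting non-zero is what lets Claude notice the failure and retry.
import { readFileSync } from "node:fs";

const rows = readFileSync("out/report.csv", "utf8").trim().split("\n");
const header = rows[0]?.split(",") ?? [];

if (!header.includes("user_id")) {
  console.error("FAIL: missing user_id column");
  process.exit(1);
}
if (rows.length < 2) {
  console.error("FAIL: no data rows produced");
  process.exit(1);
}
console.log(`OK: ${rows.length - 1} data rows`);
```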

Vision and Other Capability Upgrades

Beyond coding, visual resolution triples to a maximum side length of 2,576 px (~3.75 M pixels), raising XBOW visual accuracy from 54.5% to 98.5%. Vercel CEO Aj Orbach calls the dashboard and data‑interface building performance “the strongest he’s ever seen.”

Instruction following is now more precise, reducing the need for prompt tweaking when building system prompts or Agent workflows.

Task budgets enter public testing, allowing a token‑consumption ceiling per task for better cost control.
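The announcement doesn’t describe the budget API surface, so as an illustration only, here is a client‑side version of the same idea built on fields that already exist in the Messages API response (usage.input_tokens and usage.output_tokens); the ceiling, prompt, and loop structure are made up:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const TASK_BUDGET = 200_000;    // hypothetical per-task token ceiling
let spent = 0;

// Simplified loop: one user turn repeated; a real agent loop would carry
// conversation history and handle tool calls.
while (spent < TASK_BUDGET) {
  const msg = await client.messages.create({
    model: "claude-opus-4-7", // model ID from this announcement
    max_tokens: 4096,
    messages: [{ role: "user", content: "Continue the refactoring task." }],
  });
  spent += msg.usage.input_tokens + msg.usage.output_tokens;
  if (msg.stop_reason === "end_turn") break; // model considers the task done
}
console.log(`tokens spent: ${spent} / ${TASK_BUDGET}`);
```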

Claude Code adds a new /ultrareview command to launch a dedicated code‑review session that reads all changes and points out issues like a diligent reviewer.

Things to Watch

Two caveats: the new tokenizer can increase token counts for the same input by a factor of 1.0–1.35, which matters for cost‑sensitive applications; and Opus 4.7 lags on some dimensions – Agentic search drops to 79.3% (down from 83.7% in 4.6), and multidisciplinary reasoning scores 54.7% versus GPT‑5.4’s 58.7%.
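To make the tokenizer caveat concrete at the unchanged $25‑per‑million‑input‑token price (see Pricing below), here is the worst‑case arithmetic; the 100k‑token prompt size is just an example:

```typescript
const PRICE_PER_INPUT_TOKEN = 25 / 1_000_000; // $25 per million input tokens
const promptTokens = 100_000;                 // example prompt size, not from the article

// Worst case: the new tokenizer emits 1.35x the tokens for the same input.
const oldCost = promptTokens * PRICE_PER_INPUT_TOKEN;        // $2.50
const newCost = promptTokens * 1.35 * PRICE_PER_INPUT_TOKEN; // ~$3.38
console.log(`input cost per call: $${oldCost.toFixed(2)} -> up to $${newCost.toFixed(2)}`);
```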

Its strengths are clear: coding, vision, and tool calling. On knowledge‑reasoning and search tasks, competitors still hold their ground.

Pricing and Availability

Pricing is unchanged from Opus 4.6 at $25 per million input tokens. The API model ID is claude-opus-4-7. The model is fully available on claude.ai, Claude Platform, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
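For reference, a minimal call targeting the new ID via the current @anthropic-ai/sdk Messages API (the prompt is illustrative):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const msg = await client.messages.create({
  model: "claude-opus-4-7", // API model ID from the announcement
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain the failing test in auth.spec.ts." }],
});
console.log(msg.content);
```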

Final Thoughts

From a rumor yesterday to an official launch today, Anthropic’s release cadence has accelerated: Opus 4.6 in February, Opus 4.7 in April, with SWE‑bench Pro jumping from 53.4% to 64.3% and visual accuracy climbing from 54.5% to 98.5%.

As Cherny summed up, “If you keep your old workflow, Opus 4.7 is a solid upgrade; but if you adapt your workflow to its longer‑running, more agentic nature, it becomes a significant leap.” For developers who rely on Claude Code daily, the upgrade merits serious attention: the model is stronger, and the surrounding toolchain (Auto mode, Recaps, Focus mode, effort levels) shifts your effort from watching terminals to higher‑level judgment.

