Claude Opus 4.7: Bigger Context, Sharper Code, Triple‑Resolution Images, and New Security Controls

Claude Opus 4.7, the strongest publicly available Opus model, raises code-task success rates, triples the accepted image resolution, adds an xhigh effort tier, and introduces proactive network-security interception, all at unchanged pricing. Benchmarks show it outpacing Opus 4.6, GPT‑5.4, and Gemini 3.1 Pro across multiple metrics.

ShiZhen AI
Release Context

Anthropic announced Claude Opus 4.7 today. It is the most capable model in the Opus series that is publicly available, positioned below the limited‑access Claude Mythos Preview but above all other Opus versions.

Code and Long‑Task Improvements

The model focuses on handling complex, long‑running tasks reliably, with three main enhancements:

Task Integrity – Long workflows no longer abort prematurely; tool errors are reduced by two‑thirds in Notion Agent’s hidden‑requirement tests.

Self‑Verification – The model checks its own logic before responding, correctly reporting missing data and avoiding “plausible‑but‑wrong” answers that earlier versions produced.

Execution Precision – Prompts are followed more literally, which may require prompt adjustments for legacy workflows.

Internal benchmark data from several platforms show substantial gains:

CursorBench pass rate: 58% → 70%

Rakuten production‑task throughput: 3× Opus 4.6

Notion Agent multi‑step workflow: +14% success, fewer tokens

Factory Droids enterprise engineering tasks: +10‑15% success rate

CodeRabbit code‑review recall: >10% improvement

These figures come from independent internal tests on each platform, not Anthropic‑run benchmarks.
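When reading figures like these, it helps to keep percentage-point gains and relative gains apart; a tiny helper (the function name is ours, for illustration only):

```python
def improvement(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %).

    Example: the CursorBench pass rate above moves from 58% to 70%,
    i.e. +12 points, but a ~20.7% relative improvement.
    """
    absolute = new_pct - old_pct
    relative = 100.0 * absolute / old_pct
    return absolute, relative
```

Reports like "+10-15% success rate" are ambiguous between the two readings, so it is worth checking which one a given platform means.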

Benchmark Comparison Across Four Dimensions

Official benchmark charts compare Opus 4.7 with Opus 4.6, GPT‑5.4 and Gemini 3.1 Pro:

Knowledge Work (GDPVal‑AA) – Opus 4.7 scores 1753 vs. 1619 (4.6), 1674 (GPT‑5.4), 1314 (Gemini 3.1 Pro).

Document Reasoning (OfficeQA Pro) – Accuracy 80.6% vs. 57.1% (4.6), 51.1% (GPT‑5.4), 42.9% (Gemini 3.1 Pro).

Long‑Context Reasoning (GraphWalks 1M) – 58.6% vs. 41.2% (4.6), a >17‑point gain.

Visual Navigation (ScreenSpot‑Pro) – High‑resolution mode reaches 87.6% (with tool assistance) vs. 83.1% (4.6 low‑res).

Visual Capability: Triple Image Resolution

The maximum accepted image size jumps to a 2576‑pixel long side (≈3.75 MP), more than three times the previous limit. This change is model‑side only; developers simply send higher‑resolution images, though token consumption rises accordingly.
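Client-side, the main accommodation is making sure uploads respect the new cap. A minimal sketch of the resize math (the 2576-pixel figure comes from this announcement; the helper is ours, not an SDK function):

```python
def fit_to_long_side(width: int, height: int, cap: int = 2576) -> tuple[int, int]:
    """Downscale (width, height) so the longer side is at most `cap` pixels,
    preserving aspect ratio. Images already within the cap pass through."""
    long_side = max(width, height)
    if long_side <= cap:
        return width, height
    scale = cap / long_side
    # Round to whole pixels, never exceeding the cap after rounding.
    return min(cap, round(width * scale)), min(cap, round(height * scale))
```

Resizing before upload keeps dimensions predictable; the announcement does not say whether oversized images are rejected or downscaled server-side.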

In a penetration‑testing scenario (XBOW), visual accuracy rose from 54.5% to 98.5%.

New Features

xhigh effort level – Added between high and max effort tiers. Claude Code now defaults to xhigh, and Anthropic recommends starting from high or xhigh for code and agent workloads.

Task budgets (API beta) – Developers can set a token budget for long‑running agent tasks, preventing premature token exhaustion.
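The announcement does not show the beta's request schema, so here is only a local illustration of the concept: track cumulative token spend per agent turn and stop cleanly before the budget runs out (class and method names are hypothetical):

```python
class TokenBudget:
    """Toy local model of a task budget for a long-running agent loop."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0

    def charge(self, tokens: int) -> None:
        """Record a turn's token usage; refuse to start a turn past the budget."""
        if self.spent + tokens > self.limit:
            raise RuntimeError(
                f"task budget exhausted: {self.spent + tokens} > {self.limit}"
            )
        self.spent += tokens

    @property
    def remaining(self) -> int:
        return self.limit - self.spent
```

The point of doing this server-side, as the beta apparently does, is that the task fails at a turn boundary rather than mid-generation.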

Claude Code /ultrareview – A new slash command launches a dedicated review session that scans code changes for bugs and design issues; Pro and Max users receive three free trials.

Auto mode, which lets Claude make permission decisions to reduce interruptions, is now available to Max users.

Security Mechanism: Proactive Network‑Security Interception

Anthropic released Project Glasswing, a study on AI‑related security risks. Before opening Mythos Preview widely, they tested a new safety‑interception system on Opus 4.7, deliberately suppressing its network‑security capabilities during training and automatically blocking high‑risk requests.

Safety scores show Opus 4.7 is more honest and resistant to malicious prompt injection than 4.6, though it scores slightly lower in one category: declining to give overly detailed illicit-substance advice.

Researchers can apply for the Cyber Verification Program to use Opus 4.7 for legitimate security research.

Community Reaction

Comments are split: some suspect a "soft downgrade," noting that Opus 4.6's performance appeared to decline at some point before this release, while others point to the platforms' independent internal data, which consistently favors Opus 4.7.

Migration Considerations

Two factors affect token consumption:

Tokenizer update – The new tokenizer can map the same text to 1.0–1.35× as many tokens, varying by language and content type.

High‑effort inference – In later agent dialogue rounds, higher effort levels generate more reasoning tokens.

Anthropic claims overall token efficiency improves due to higher accuracy and less wasted output, but recommends real‑traffic testing.

Pricing remains unchanged: $5 per M input tokens, $25 per M output tokens, same as Opus 4.6.
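Combining the stated 1.0–1.35× tokenizer range with the unchanged prices gives a quick cost range for migration planning (the helper is ours; the prices and range are from this article):

```python
# Published Opus 4.7 pricing, unchanged from 4.6 (USD per million tokens).
PRICE_IN, PRICE_OUT = 5.00, 25.00

def _scaled(tokens: int, pct: int) -> int:
    """Ceiling of tokens * pct / 100, using exact integer math."""
    return -(-tokens * pct // 100)

def migration_cost_range(in_tokens: int, out_tokens: int,
                         pct_low: int = 100, pct_high: int = 135) -> tuple[float, float]:
    """Best/worst-case USD cost if the new tokenizer maps the same text
    to 1.00-1.35x as many tokens (the range stated above)."""
    def cost(pct: int) -> float:
        return (_scaled(in_tokens, pct) * PRICE_IN
                + _scaled(out_tokens, pct) * PRICE_OUT) / 1_000_000
    return cost(pct_low), cost(pct_high)
```

This covers only the tokenizer factor, not the extra reasoning tokens at higher effort levels, so treat it as a floor and validate with real traffic as Anthropic recommends.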

Model identifier:

claude-opus-4-7

Availability

Claude.ai full product line (including Claude Code, default xhigh effort)

Anthropic API

Amazon Bedrock

Google Cloud Vertex AI

Microsoft Foundry

Cursor (with limited‑time 50% discount)

Claude Opus 4.7 announcement
Tags: code generation, AI, security, benchmark, Claude, Opus 4.7
Written by

ShiZhen AI

Tech blogger with over 10 years of experience at leading tech firms; AI efficiency and delivery expert focused on AI productivity. Covers tech gadgets, AI-driven efficiency, and the AI leisure community. 🛰 szzdzhp001
