Claude Opus 4.7 Unpacked: Engineering Boost, Vision Leap, and Safety Test
Claude Opus 4.7, Anthropic’s latest publicly released model, extends engineering intelligence with autonomous verification loops, roughly triples visual resolution, and introduces layered safety deployment alongside new API controls. Benchmarked against GPT‑5.4 and Gemini 3.1, it delivers record SWE‑bench scores and detailed real‑world security evaluations.
Release Overview
On 16 April 2026 Anthropic announced Claude Opus 4.7, the strongest Claude model openly available to the public. The launch marks simultaneous breakthroughs in engineering intelligence, high‑resolution visual understanding, and the first real‑world test of Anthropic’s AI safety deployment mechanisms.
The company also disclosed a more powerful internal model, Claude Mythos Preview, which surpasses Opus 4.7 on almost every measured dimension but remains limited to a handful of vetted partners (Apple, Google, Microsoft, etc.) for safety reasons.
This transparency signals a new layered security release system: Opus 4.7 serves as the public testbed for safety controls that will later protect Mythos‑level models.
Model version timeline:
Opus 4.5 ──► Opus 4.6 ──► Opus 4.7
 2025.11      2026.02      2026.04
       (~2 months)  (~2 months)
Strategic Background
Anthropic’s Dual‑Track Strategy
In the same week as the Opus 4.7 release, Anthropic unveiled Project Glasswing, a research framework that studies security risks of frontier AI models. Glasswing found that Mythos Preview matches or exceeds top human security researchers in vulnerability discovery and exploitation.
Faced with the ethical dilemma of releasing a model capable of automated exploit generation, Anthropic adopted a three‑step solution:
Limit Mythos Preview to 11 rigorously vetted security firms for controlled vulnerability research.
Deploy Opus 4.7 with a “differential capability reduction” that suppresses dangerous attack abilities while preserving all other functions.
Establish a Cyber Verification Program that lets verified security researchers use the full capabilities of Mythos under strict supervision.
Thus Opus 4.7 functions both as a flagship consumer product and as the first real‑world validation point for Anthropic’s safety architecture.
Claude model tiers (April 2026)
┌─────────────────────────────────────┐
│  Claude Mythos Preview              │ ← Restricted to Project Glasswing partners
│  (strongest capabilities, unreleased)│
├─────────────────────────────────────┤
│  Claude Opus 4.7                    │ ← Strongest publicly released model ★
│  (general-purpose flagship)         │
├─────────────────────────────────────┤
│  Claude Sonnet 4.6                  │ ← Cost-effective workhorse
├─────────────────────────────────────┤
│  Claude Haiku 4.5                   │ ← Lightweight, fast model
└─────────────────────────────────────┘

Opus 4.7 replaces Opus 4.6 as the default model for the claude‑opus API endpoint.
Core Technical Upgrades
1. Advanced Software‑Engineering Ability
Anthropic defines the improvement as a targeted reinforcement on “high‑difficulty, long‑running, supervised coding tasks.” Quantitative gains:
SWE‑bench Verified: 80.8 % → 87.6 % (+6.8 pp)
SWE‑bench Pro: – → 64.3 %
Internal 93‑item coding benchmark: baseline → +13 % (solves four tasks previous models could not)
2. Autonomous Verification Loop
The model now performs self‑checks before execution and validates its own outputs after generation. VentureBeat reported a concrete example: Opus 4.7 built a Rust text‑to‑speech engine, then fed the generated audio into an independent speech recognizer and compared the result with a Python reference implementation, dramatically reducing hallucination loops.
Planning‑stage logical self‑check.
End‑to‑end result verification.
Consistent state across multi‑hour long‑running tasks.
Users can now “hand off” complex, supervised coding jobs to the model with far fewer manual interventions.
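The generate-then-verify pattern above can be sketched as a small control loop. This is a purely illustrative sketch of the idea, not Anthropic's implementation: the `generate` and `verify` callables stand in for the model and an independent checker (such as the speech recognizer in the VentureBeat example).

```python
# Illustrative generate-then-verify loop: produce a candidate, run an
# independent check, and retry with feedback until the check passes.

def generate_with_verification(generate, verify, max_attempts=3):
    """Return the first candidate that passes an independent verifier."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)    # e.g. model emits code or audio
        ok, feedback = verify(candidate)  # e.g. an independent recognizer
        if ok:
            return candidate
    raise RuntimeError(f"verification failed after {max_attempts} attempts")

# Toy usage: "generate" successive values, verify that one is even.
attempts = iter([3, 5, 8])
result = generate_with_verification(
    generate=lambda fb: next(attempts),
    verify=lambda c: (c % 2 == 0, f"expected an even value, got {c}"),
)
```

The key design point is that the verifier is independent of the generator, so a hallucinated "success" in generation cannot self-certify.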
3. High‑Resolution Multimodal Vision
Opus 4.7 is the first Claude model supporting high‑resolution image input.
Image-processing comparison
                          Older Claude models    Opus 4.7
Max long-edge resolution: 1,568 px               2,576 px
Max total pixels:         ~1.15 MP               ~3.75 MP
Improvement:              ~3.26×
Coordinate precision:     requires scaling       1:1 direct mapping

Impactful scenarios:
Computer‑use & screenshot understanding: 1:1 pixel mapping eliminates manual coordinate conversion, greatly improving UI‑automation precision.
Document analysis & OCR: accurate recognition of tiny fonts, dense tables, and annotations in scanned legal, financial, or academic documents.
Scientific diagrams: Solve Intelligence reported reliable extraction of chemical structures and technical schematics, a breakthrough for life‑science patent workflows.
Precision perception tasks: improved pointing, measuring, and counting accuracy.
Note: higher resolution consumes more tokens; Anthropic advises down‑sampling when fine detail is unnecessary.
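The down-sampling advice can be reduced to a small geometry helper using the limits quoted above (2,576 px long edge, ~3.75 MP total). The function itself is illustrative and not part of any Anthropic SDK:

```python
# Client-side downsampling math under the stated caps. Values come from
# the article's comparison table; the helper is an illustrative sketch.

def fit_within_limits(width, height, max_long_edge=2576, max_pixels=3_750_000):
    """Return (w, h) scaled down to respect both caps, preserving aspect ratio."""
    scale = 1.0
    long_edge = max(width, height)
    if long_edge > max_long_edge:
        scale = max_long_edge / long_edge
    if (width * scale) * (height * scale) > max_pixels:
        # Recompute from the original area so both caps hold simultaneously.
        scale = (max_pixels / (width * height)) ** 0.5
    return int(width * scale), int(height * scale)

# A 4000x3000 photo (12 MP) exceeds both caps and is scaled to fit.
w, h = fit_within_limits(4000, 3000)
```

Resizing to these dimensions before upload avoids paying tokens for pixels the model would discard anyway.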
4. Rigor‑Driven Autonomy
Anthropic repeatedly uses the term “rigor” to describe Opus 4.7’s behavior. The model follows a pause‑plan‑verify execution paradigm, persists through errors (Devin platform notes hours‑long coherent operation), and extends long‑horizon autonomy beyond the 14 h 30 min benchmark of Opus 4.6.
5. Precise Instruction Following
Switch from “intent inference” to literal execution: the model now obeys exactly what is written in the prompt, which improves reliability for API integration, automation pipelines, and strict formatting tasks, but requires clearer prompt engineering.
6. Cross‑Session File‑System Memory
Opus 4.7 can write key context (project background, user preferences) to a structured file system after a session and automatically retrieve it in subsequent sessions, enabling truly long‑term AI work partners. Example: during a large code‑refactor the model remembers coding style, completed modules, technical debt list, and prior architectural decisions without the user re‑describing them.
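A minimal sketch of what file-based session memory could look like follows. The file name and JSON schema here are assumptions for illustration, not a documented Claude format:

```python
# Illustrative file-based session memory: write structured context after a
# session, merge and reload it in the next one. Schema is hypothetical.
import json
import tempfile
from pathlib import Path

MEMORY_FILE = Path(tempfile.gettempdir()) / "claude_memory.json"

def load_memory() -> dict:
    """Load memory saved by a prior session, or start empty."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {}

def save_memory(entries: dict) -> None:
    """Merge new context (preferences, decisions) into the memory file."""
    memory = load_memory()
    memory.update(entries)
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

# Session 1: record decisions.  Session 2: they come back automatically.
save_memory({"style": "PEP 8", "debt": ["remove legacy auth module"]})
restored = load_memory()
```

The point of the structured file (rather than raw chat history) is that later sessions can retrieve only the relevant keys instead of replaying the whole conversation.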
7. Creative & Professional Output Quality
Anthropic describes the model as “more tasteful,” producing UI/UX prototypes, slide decks, and technical documentation that approach the quality of experienced designers or copywriters. This capability underlies the new product Claude Design (see Section 9).
Benchmark Deep‑Dive
Software & Coding
SWE‑bench Verified: 87.6 % (industry‑highest public score, Mythos Preview excluded).
SWE‑bench Pro: 64.3 %.
Internal 93‑item benchmark: +13 % vs. Opus 4.6.
Tool Calling & Agents
MCP‑Atlas (multi‑step tool calling): 77.3 % vs. GPT‑5.4 Pro's 68.1 %, a clear advantage.
BrowseComp (web research): 79.3 % vs. GPT‑5.4 Pro's 89.3 %; GPT‑5.4 still leads for heavy web‑retrieval workloads.
Finance Agent and GDPval‑AA: Opus 4.7 leads (no public competitors listed).
Multidisciplinary Reasoning
Humanity’s Last Exam (no tools): Opus 4.7 beats all models except Mythos (64.7 %).
Humanity’s Last Exam (with tools): GPT‑5.4 Pro 58.7 % slightly ahead of Opus 4.7’s 54.7 %.
Safety & Alignment
Overall alignment comparable to Opus 4.6 (“largely well‑aligned and trustworthy”).
Honesty: improved.
Prompt‑injection resistance: improved.
Hallucination rate: reduced.
Reward‑hacking: reduced.
Minor regression on controlled‑substance advice.
Capability radar (relative strengths)
              Code engineering
                    ↑
Safety alignment ←     → Tool calling
                    ↓
  Visual understanding    Research ability
Opus 4.7: code engineering ★★★★★, tool calling ★★★★★, visual understanding ★★★★★
GPT-5.4 Pro: research ability ★★★★★, web browsing ★★★★★
Gemini 3.1 Pro: multilingual ★★★★, coding ★★★★

Developer Tools & API New Features
Effort Levels
Anthropic added a fine‑grained “Effort” control that balances reasoning depth against latency and token cost.
Effort tiers (low to high)
┌──────────┬──────────┬──────────┬──────────┐
│   low    │   high   │  xhigh   │   max    │
│ fast     │ standard │ deep     │ maximal  │
│ response │ reasoning│ reasoning│ reasoning│
│ low cost │ balanced │ ★ new    │ top quality │
└──────────┴──────────┴──────────┴──────────┘
                           ↑
            "extra high" tier, new in Opus 4.7

The xhigh tier offers near‑max quality with significantly lower latency and token consumption than max. Anthropic recommends starting with high or xhigh for most coding and agent tasks and switching to max only when extreme performance is required.
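In request terms, selecting a tier might look like the sketch below. Note that the parameter name `effort` and the model ID string are assumptions inferred from the description above, not confirmed API fields:

```python
# Hypothetical request builder for the effort tiers described above.
# "effort" as a field name and "claude-opus-4-7" as a model ID are
# illustrative guesses, not documented Anthropic API values.

EFFORT_LEVELS = ("low", "high", "xhigh", "max")

def build_request(prompt: str, effort: str = "high") -> dict:
    """Assemble message-creation kwargs with an effort tier attached."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

# Per the guidance above, start demanding coding tasks at xhigh.
req = build_request("Refactor the auth module", effort="xhigh")
```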
Task Budgets (Public Beta)
Developers can set a total token ceiling for an entire agent loop (thinking, tool calls, returns, final output). The model monitors the remaining budget and, when the limit approaches, prioritises the most critical sub‑tasks and gracefully finishes the current step instead of being abruptly cut off.
Cost control – precise token‑spend limits.
SLA assurance – guarantees completion within budget.
Priority management – model auto‑weights tasks under pressure.
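The budget behavior described above, spend tracking plus reprioritization under pressure, can be mimicked client-side. This is an illustrative sketch of the concept, not the server-side mechanism:

```python
# Budget-aware task loop: track remaining tokens and, once spend nears
# the ceiling, run the highest-priority work first and stop gracefully
# rather than being cut off mid-task. Purely illustrative.

def run_with_budget(tasks, budget, reserve=0.2):
    """tasks: list of (name, token_cost, priority). When the remaining
    budget drops below the reserve fraction, critical tasks jump the queue."""
    done, remaining = [], budget
    queue = list(tasks)
    for _ in range(len(tasks)):
        if remaining < budget * reserve:      # budget pressure detected:
            queue.sort(key=lambda t: -t[2])   # reprioritize critical work
        name, cost, _ = queue.pop(0)
        if cost > remaining:                  # graceful finish, no hard cut
            break
        remaining -= cost
        done.append(name)
    return done, remaining

# 700-token ceiling: "docs" is skipped gracefully when budget runs short.
done, left = run_with_budget(
    [("lint", 100, 1), ("tests", 500, 3), ("docs", 400, 1)], budget=700)
```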
/ultrareview Command (Claude Code)
When invoked, Claude Code launches an isolated code‑review session, reads the full change set, simulates a meticulous reviewer, and outputs a structured report covering potential bugs, logic flaws, performance risks, security issues, and style problems. This turns manual line‑by‑line checks into an automated CI/CD step.
Tokenizer Changes
Token inflation factor: 1.0–1.35× relative to the previous tokenizer (Chinese text inflates less; code inflates more).
Higher effort levels generate additional tokens due to deeper reasoning.
Migration advice: test real‑traffic samples before switching from Opus 4.6 to Opus 4.7 to avoid unexpected budget overruns.
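A quick way to act on that advice is a back-of-envelope projection using the 1.0–1.35× range stated above; measuring real traffic remains the authoritative check:

```python
# Project Opus 4.7 token spend from measured Opus 4.6 usage, using the
# article's reported 1.0-1.35x inflation range. Back-of-envelope only.

def estimate_new_tokens(old_tokens: int, factor: float) -> int:
    """Scale a measured token count by an assumed inflation factor."""
    if not 1.0 <= factor <= 1.35:
        raise ValueError("factor outside the reported 1.0-1.35 range")
    return round(old_tokens * factor)

# Worst-case planning for a code-heavy workload on a 100k-token baseline:
worst_case = estimate_new_tokens(100_000, 1.35)
```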
API Breaking Changes
temperature, top_p, and top_k are no longer supported; requests that include them return HTTP 400.
Extended Thinking Budget has been renamed Adaptive Thinking.
Suggested migration path: remove sampling parameters, adopt Adaptive Thinking, and rely on prompt‑engineering for output style control.
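The first step of that migration path can be automated by stripping the removed fields from existing request payloads. The request shape below is illustrative:

```python
# Migration helper: drop sampling parameters that Opus 4.7 rejects with
# HTTP 400, leaving the rest of the request untouched. Sketch only.

REMOVED_PARAMS = ("temperature", "top_p", "top_k")

def migrate_request(params: dict) -> dict:
    """Return a copy of the request without the unsupported fields."""
    return {k: v for k, v in params.items() if k not in REMOVED_PARAMS}

old = {
    "model": "claude-opus-4-7",
    "temperature": 0.7,
    "top_k": 40,
    "messages": [{"role": "user", "content": "hello"}],
}
new = migrate_request(old)  # sampling fields stripped, payload otherwise intact
```

Output-style control that previously leaned on temperature then moves into the prompt itself, as the migration note suggests.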
Safety Architecture: New Frontiers in Cybersecurity
Differential Capability Reduction
During training Anthropic selectively attenuated the model’s ability to generate malicious code or exploit software vulnerabilities while preserving all other competencies. This “differential” approach attempts to draw a line between legitimate security research and harmful attack assistance.
Real‑Time Request Detection & Interception
Request handling flow (cybersecurity-related requests)
User request
    ↓
Real-time content-analysis layer
    ↓
┌──────────────────────────────────────────────────────────┐
│ Request risk classification                               │
│                                                           │
│ Legitimate security research ──► normal response          │
│ Ambiguous/borderline case    ──► cautious response + flag │
│ High-risk offensive request  ──► automatic refusal        │
└──────────────────────────────────────────────────────────┘
    ↓
Response output / interception log
    ↓
Anthropic security-team feedback loop

The system logs rejected high‑risk requests and feeds them back to the security team for continuous improvement.
Cyber Verification Program
Eligible participants: vulnerability researchers, red‑team members, security‑tool developers.
Application via Anthropic’s official channel with identity verification.
Approved users gain controlled access to the full security capabilities of Opus 4.7 for legitimate testing.
All activity is audited to prevent abuse.
Derived Product – Claude Design
Released alongside Opus 4.7, Claude Design is a research‑preview design tool that materialises the model’s upgraded visual and aesthetic abilities.
Automatic design‑system generation: scans a codebase and design assets to build a team‑specific style guide (colors, typography, component library, spacing, corner radius).
Multi‑source input: accepts text prompts, image uploads (screenshots, mockups), document uploads (DOCX, PPTX, XLSX), code‑repo references, and a web‑capture tool that pulls visual elements directly from live sites.
Collaboration & export: supports real‑time multi‑user editing, parallel management of multiple design systems, and export to common design formats.
Availability: requires Pro/Max/Team/Enterprise subscription; Enterprise defaults to disabled and must be enabled by an admin; rolled out gradually across platforms (Claude app, Mac client, Claude Code, Claude Cowork).
Enterprise Applications & User Feedback
FinTech platform : praised the model’s planning‑stage self‑correction and execution speed, noting a tangible increase in development delivery frequency.
Data‑analysis platform Hex : highlighted honest handling of missing data, resistance to “inconsistent data traps,” and a 13 % improvement on a 93‑item coding benchmark, solving tasks that prior models could not.
Life‑science company Solve Intelligence : reported accurate recognition of chemical structures and complex technical diagrams, enabling a next‑generation patent‑workflow tool.
AI‑agent platform Devin : observed sustained coherence over multi‑hour autonomous runs, unlocking deep‑investigation tasks previously unreliable.
Enterprise cloud content manager Box : noted that low‑effort Opus 4.7 matches the quality of mid‑effort Opus 4.6, effectively lowering token costs for comparable output.
Anthropic Research Ecosystem & Long‑Term Vision
Four core research pillars shape Opus 4.7:
Interpretability : recent papers on emotional concepts and self‑inspection inform the model’s reduced hallucination and increased self‑checking.
Alignment : studies on alignment “pseudo‑masking” guided the honesty and reward‑hacking mitigations.
Societal Impact : labor‑market impact analyses drove prioritisation of high‑value capabilities (coding, finance, long‑task execution).
Scientific Computing : research on long‑running Claude for scientific workloads motivated the high‑resolution visual upgrades.
These research directions directly translate into the concrete improvements described above.
Summary & Outlook
Claude Opus 4.7 is more than a version bump; it represents three synchronized advances:
Engineering intelligence maturity : autonomous verification loops and record SWE‑bench performance move the model from a coding aid to a reliable work partner.
Systematic safety deployment : Project Glasswing, differential capability reduction, real‑time interception, and the Cyber Verification Program constitute Anthropic’s first end‑to‑end safety framework, with Opus 4.7 as the inaugural real‑world test.
Product ecosystem extension : Claude Design demonstrates how foundational model upgrades can be productised into vertical tools, signalling a shift from generic chatbots to specialised AI assistants.
The system card explicitly states that Opus 4.7 is still outperformed by the unreleased Mythos Preview, underscoring Mythos as the ultimate target. Opus 4.7’s wide deployment will validate whether the safety mechanisms scale, paving the way for future public releases of Mythos‑level capabilities.