Claude Opus 4.6 Launches with a Record 68% ARC‑AGI Score
Anthropic’s Claude Opus 4.6 launches with a 68% ARC‑AGI score, a 1 million‑token context window, top rankings on Terminal‑Bench 2.0, Humanity’s Last Exam, and GDPval‑AA, unchanged pricing, enhanced safety, and new API features such as adaptive thinking and context compression.
Core Capability Improvements
Opus 4.6 shows clear progress in programming tasks: better task planning, more reliable operation in large codebases, improved code review and debugging. It is the first Opus model to support a 1 million‑token context window (beta).
For everyday office work, the model can run financial analysis, conduct research, and create or edit documents, spreadsheets, and presentations. In a Cowork environment it can autonomously apply these skills for the user.
Benchmark Performance
Beyond the standout ARC‑AGI 2 result (68% score), Opus 4.6 leads industry‑wide evaluations:
Highest score on Terminal‑Bench 2.0 agent‑coding assessment.
Top placement on Humanity’s Last Exam complex multidisciplinary reasoning test.
In the GDPval‑AA economic‑value task, it outperforms the runner‑up OpenAI GPT‑5.2 by roughly 144 Elo points.
Some netizens remark that “the ARC‑AGI 2 score is insane; the field will saturate in months,” while others question whether these benchmarks truly measure meaningful ability.
Real‑World Feedback
Early‑access partners responded positively. Notion called it “Anthropic’s strongest model to date,” GitHub praised its performance on “complex multi‑step coding work,” and Replit described the agent‑planning improvements as a “huge leap.”
Pricing Remains Unchanged
Despite the performance boost, pricing is unchanged at $5 per million input tokens and $25 per million output tokens, leaving some users disappointed who expected lower costs or faster throughput.
Safety Performance
Anthropic emphasizes that the intelligence gains do not come at the expense of safety. In automated behavior audits, Opus 4.6 exhibits lower rates of misaligned actions such as deception, flattery, encouraging user delusions, and collaborative abuse.
New Developer Features
The API adds several capabilities:
Adaptive thinking – the model decides when deep reasoning is needed.
Effort control – four adjustable intelligence‑level options.
Context compression – automatic summarization and replacement of older context.
Support for 128 k output tokens.
Claude Opus 4.6 is now available via claude.ai, the API, and major cloud platforms, making it a noteworthy upgrade for users requiring complex tasks and long‑running agent work.
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
