Artificial Intelligence 5 min read

Claude Opus 4.6 Launches with a Record 68% ARC‑AGI Score

Anthropic’s Claude Opus 4.6 launches with a 68% ARC‑AGI score, a 1 million‑token context window, top rankings on Terminal‑Bench 2.0, Humanity’s Last Exam, and GDPval‑AA, unchanged pricing, enhanced safety, and new API features such as adaptive thinking and context compression.

AI Engineering

Feb 5, 2026

Claude Opus 4.6 Launches with a Record 68% ARC‑AGI Score

Core Capability Improvements

Opus 4.6 shows clear progress in programming tasks: better task planning, more reliable operation in large codebases, improved code review and debugging. It is the first Opus model to support a 1 million‑token context window (beta).

For everyday office work, the model can run financial analysis, conduct research, and create or edit documents, spreadsheets, and presentations. In a Cowork environment it can autonomously apply these skills for the user.

Benchmark Performance

Beyond the standout ARC‑AGI 2 result (68% score), Opus 4.6 leads industry‑wide evaluations:

Highest score on Terminal‑Bench 2.0 agent‑coding assessment.

Top placement on Humanity’s Last Exam complex multidisciplinary reasoning test.

In the GDPval‑AA economic‑value task, it outperforms the runner‑up OpenAI GPT‑5.2 by roughly 144 Elo points.

Some netizens remark that “the ARC‑AGI 2 score is insane; the field will saturate in months,” while others question whether these benchmarks truly measure meaningful ability.

Real‑World Feedback

Early‑access partners responded positively. Notion called it “Anthropic’s strongest model to date,” GitHub praised its performance on “complex multi‑step coding work,” and Replit described the agent‑planning improvements as a “huge leap.”

Pricing Remains Unchanged

Despite the performance boost, pricing is unchanged at $5 per million input tokens and $25 per million output tokens, leaving some users disappointed who expected lower costs or faster throughput.

Safety Performance

Anthropic emphasizes that the intelligence gains do not come at the expense of safety. In automated behavior audits, Opus 4.6 exhibits lower rates of misaligned actions such as deception, flattery, encouraging user delusions, and collaborative abuse.

New Developer Features

The API adds several capabilities:

Adaptive thinking – the model decides when deep reasoning is needed.

Effort control – four adjustable intelligence‑level options.

Context compression – automatic summarization and replacement of older context.

Support for 128 k output tokens.

Claude Opus 4.6 is now available via claude.ai, the API, and major cloud platforms, making it a noteworthy upgrade for users requiring complex tasks and long‑running agent work.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

benchmark AI model safety pricing Anthropic Claude Opus ARC‑AGI

Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.