Claude Opus 4.8 Review: Why Programming Still Leads and How It Manages Hundreds of Sub‑Agents
Claude Opus 4.8 improves judgment, honesty about progress, and long‑running autonomy while keeping the same price, outperforms rivals on code, reasoning and knowledge‑work benchmarks, introduces a 2.5× faster “Fast mode” and a research‑preview dynamic workflow that can orchestrate hundreds of sub‑agents in parallel.
Anthropic has launched Claude Opus 4.8, building on 4.7 with stronger judgment, greater honesty about its own progress, and the ability to work independently for longer periods, all at the same price as the previous version.
Three key traits, especially "honesty"
The model’s improved judgment and extended autonomy are straightforward, but the middle trait—being more honest about its progress—is crucial for long‑running Agent tasks. In practice, earlier agents often claimed completion while leaving files unchanged or skipping tests, forcing users to monitor every step. By explicitly evaluating its own progress, Opus 4.8 can admit uncertainty, making it far more reliable for multi‑step workflows.
Benchmark snapshot: programming still leads, but one area lags
Official comparison tables pit Opus 4.8 against its predecessor, OpenAI’s GPT‑5.5 and Google’s Gemini 3.1 Pro. Highlights include:
Code ability (SWE‑Bench Pro) : 69.2% vs 64.3% (4.7), beating GPT‑5.5’s 58.6% and Gemini’s 54.2%.
Multidisciplinary reasoning (Humanity’s Last Exam) : 49.8% without tools, 57.9% with tools – highest among the four models.
OS operations (OSWorld‑Verified) : 83.4%, a slight edge over 4.7’s 82.8%.
Knowledge work (GDPval‑AA) : 1890 points, up from 1753.
Finance analysis (Finance Agent v2) : 53.9%, again the top score.
Terminal‑Bench 2.1 : GPT‑5.5 scores 78.2% while Opus 4.8 reaches 74.6%, the only category where it falls behind.
The author warns against assuming a blanket “total domination” verdict; Opus 4.8 excels in coding, reasoning and knowledge work but still trails in pure terminal command scenarios.
Fast mode: 2.5× speed, one‑third cost
A new Fast mode runs the same Opus 4.8 model but speeds token generation about 2.5× while reducing cost to roughly one‑third. Users can enable it in Claude Code with the /fast command; API access currently requires a request to an account manager or a wait‑list.
Claude Code feels like an unattended engineer
In Claude Code, Opus 4.8 behaves like an experienced engineer who can follow a long conversation without constant confirmation. It can take a whole feature or a bug‑fix cycle, keep the task on track, and let the user focus on other work.
Dynamic workflows: planning then dispatching hundreds of sub‑agents
The research‑preview dynamic workflow lets Claude first create a plan, then generate a script that launches dozens to hundreds of sub‑agents in parallel, each handling a piece of a complex engineering problem. After parallel execution, the system performs an independent verification before returning results.
Official use cases include cross‑service code‑base bug hunting and security audits, large‑scale migrations involving thousands of files, and critical changes that require multi‑round validation.
A concrete example is the Bun rewrite: Jarred Sumner migrated Bun from Zig to Rust, achieving 99.8% test‑suite pass rate on ~750 k lines of Rust code in 11 days. Anthropic cites this to demonstrate the scale the dynamic workflow can handle.
The main drawback is cost: dynamic workflows consume significantly more tokens because hundreds of sub‑agents run simultaneously, leading to higher bills. The feature is currently available in CLI, desktop, VS Code extension, and cloud platforms for Max, Team, and Enterprise tiers, but remains a research preview, not a universally stable offering.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
