
Can Opus + Sonnet Advisor Cut Costs While Raising AI Benchmark Scores?

Anthropic’s new advisor strategy lets the more capable, more expensive Opus model act as an on-demand consultant for the cheaper Sonnet or Haiku models. Reported results include SWE-bench Multilingual rising to 74.8% at a lower per-task cost and BrowseComp rising to 41.2%, with Haiku + Opus costing about 15% of a Sonnet-solo run. The trade-offs: the executor must recognize when to ask for advice, and the feature adds vendor lock-in.

AI Insight Log

Apr 11, 2026

Anthropic recently announced an advisor strategy that pairs a lower-cost executor model (Sonnet 4.6 or Haiku) with a more capable, more expensive advisor model (Opus 4.6). The advisor intervenes only when the executor is uncertain, like a military strategist who speaks only at critical moments.

In this architecture, the executor (Sonnet or Haiku) handles the full task—reading files, invoking tools, writing code—while Opus never performs work directly. When the executor reaches a decision point it cannot resolve, it sends a short request (≈400‑700 tokens) to Opus for guidance, then continues using the advice.

Because Sonnet often wastes tokens by trial‑and‑error on complex problems, the advisor’s guidance reduces unnecessary token consumption. The saved tokens frequently outweigh the cost of a single Opus call.
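The break-even reasoning can be made concrete with a small calculation. This is a minimal sketch, not Anthropic's pricing model: the function names are hypothetical, the per-million-token prices are illustrative placeholders rather than published rates, and the 700-token figure simply borrows the upper end of the ≈400-700 range mentioned earlier as a stand-in for what one consultation costs.

```python
def advisor_net_saving(tokens_saved, executor_price, advisor_tokens, advisor_price):
    """Net dollars saved by one advisor call; negative means the call cost more than it saved.

    Prices are dollars per million tokens.
    """
    saved = tokens_saved / 1_000_000 * executor_price
    spent = advisor_tokens / 1_000_000 * advisor_price
    return saved - spent


def break_even_tokens(advisor_tokens, executor_price, advisor_price):
    """Executor tokens that one advisor call must save to pay for itself."""
    return advisor_tokens * advisor_price / executor_price


# Illustrative placeholder prices (NOT published rates): executor $10/MTok, advisor $50/MTok.
EXECUTOR_PRICE = 10.0
ADVISOR_PRICE = 50.0

net = advisor_net_saving(
    tokens_saved=20_000,      # assumed trial-and-error tokens avoided
    executor_price=EXECUTOR_PRICE,
    advisor_tokens=700,       # assumed size of one consultation
    advisor_price=ADVISOR_PRICE,
)
threshold = break_even_tokens(700, EXECUTOR_PRICE, ADVISOR_PRICE)

print(f"net saving per call: ${net:.4f}")
print(f"break-even: {threshold:.0f} executor tokens saved")
```

Under these assumed numbers, a consultation pays for itself once it spares the executor a few thousand tokens of dead-end exploration; with real prices the threshold shifts, but the shape of the trade-off is the same.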

Empirical results show the benefit. On the SWE‑bench Multilingual benchmark, Sonnet 4.6 alone scored 72.1% with a per‑task cost of $1.09. Adding Opus as an advisor raised the score to 74.8% and lowered the cost to $0.96.

More strikingly, on the BrowseComp benchmark, Haiku alone achieved 19.7% while Haiku + Opus reached 41.2%—more than double the score. The combined cost is roughly 15% of a Sonnet‑solo run, making it one of the most cost‑effective configurations for workloads with many repetitive tasks and a few critical decision points.
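For readers who want the deltas rather than the raw figures, the arithmetic behind the two benchmarks is short. The snippet uses only the numbers quoted above; the variable names are just for illustration.

```python
# Figures quoted in the article.
sonnet_solo = {"score": 72.1, "cost": 1.09}     # SWE-bench Multilingual, Sonnet 4.6 alone
sonnet_advised = {"score": 74.8, "cost": 0.96}  # Sonnet 4.6 + Opus advisor
haiku_solo_score = 19.7                         # BrowseComp, Haiku alone
haiku_advised_score = 41.2                      # BrowseComp, Haiku + Opus advisor

# Absolute gain in percentage points.
score_gain = sonnet_advised["score"] - sonnet_solo["score"]

# Relative cost reduction per task, in percent.
cost_cut = (sonnet_solo["cost"] - sonnet_advised["cost"]) / sonnet_solo["cost"] * 100

# How many times higher the advised Haiku score is.
haiku_ratio = haiku_advised_score / haiku_solo_score

print(f"Sonnet: +{score_gain:.1f} points, {cost_cut:.1f}% cheaper per task")
print(f"Haiku: {haiku_ratio:.2f}x the solo score")
```

So the Sonnet pairing is a modest gain of 2.7 points at roughly 12% lower cost, while the Haiku pairing is where the score roughly doubles.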

Integration is straightforward: add a special tool declaration to the Messages API request. The Python SDK call below shows the declaration, including the max_uses parameter that caps how many times the advisor can be consulted per request.

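import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Beta feature: a beta opt-in may also be required; check Anthropic's documentation.
response = client.messages.create(
    model="claude-sonnet-4-6",  # executor: performs the task itself
    max_tokens=4096,  # required by the Messages API; this value is illustrative
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",  # advisor: consulted only on demand
            "max_uses": 3,  # cap on advisor consultations per request
        },
        # other tools go here
    ],
    messages=[...]
)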

The max_uses setting is crucial. It prevents the executor from over‑consulting Opus on trivial issues, which would raise costs, and serves as a budget‑control lever. Anthropic does not prescribe a fixed value; users must tune it based on task complexity.
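One way to treat max_uses as a budget lever is to derive it from a task tier and bound the worst-case advisor usage up front. This is a sketch under stated assumptions: the tier names and their values are hypothetical starting points, not Anthropic recommendations, the 700-token consultation size is an assumed stand-in, and the rejection of values below 1 is this sketch's own guard rather than a documented API rule. Only the dictionary fields come from the example above.

```python
# Hypothetical tiers; tune these against your own workloads.
MAX_USES_BY_TIER = {"routine": 1, "moderate": 3, "hard": 5}


def build_advisor_tool(max_uses: int) -> dict:
    """Build the advisor tool declaration with a chosen consultation cap."""
    if max_uses < 1:
        # Guard chosen for this sketch, not an API requirement.
        raise ValueError("max_uses must be at least 1")
    return {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "max_uses": max_uses,
    }


def worst_case_advisor_tokens(max_uses: int, tokens_per_consult: int = 700) -> int:
    """Upper bound on advisor tokens per request if every allowed call is used."""
    return max_uses * tokens_per_consult


tool = build_advisor_tool(MAX_USES_BY_TIER["moderate"])
budget = worst_case_advisor_tokens(tool["max_uses"])
print(tool["max_uses"], budget)
```

Computing the bound before sending the request makes it easy to compare tiers and to notice when a higher cap no longer improves results enough to justify its ceiling.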

Several limitations are noted:

The executor must reliably detect when a problem exceeds its capability; mis‑judgment either reduces the benefit or inflates cost.

For simple tasks where Sonnet or Haiku already succeeds, adding an advisor incurs pure overhead.

The feature is still in beta, so edge cases and stability may evolve.

Embedding the advisor_20260301 tool ties the application to Anthropic’s ecosystem, making future vendor switches more involved.
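Two of these limitations, pure overhead on simple tasks and tighter coupling to one vendor, can be softened by keeping the advisor declaration in a single place and attaching it only when wanted. The helper below is a hypothetical pattern, not part of any SDK; build_tools and ADVISOR_TOOL are names invented for this sketch, and the "search" tool is a made-up example entry.

```python
from typing import Optional

# The only vendor-specific piece, kept in one place.
ADVISOR_TOOL = {
    "type": "advisor_20260301",
    "name": "advisor",
    "model": "claude-opus-4-6",
    "max_uses": 3,
}


def build_tools(base_tools: list, advisor: Optional[dict] = None) -> list:
    """Return the tool list for a request.

    The advisor declaration is prepended only when supplied, so simple
    tasks can skip it and the rest of the code never references it.
    """
    tools = list(base_tools)  # copy so the caller's list is not mutated
    if advisor is not None:
        tools.insert(0, advisor)
    return tools


my_tools = [{"name": "search"}]  # made-up example tool entry
simple_task_tools = build_tools(my_tools)
hard_task_tools = build_tools(my_tools, advisor=ADVISOR_TOOL)
print(len(simple_task_tools), len(hard_task_tools))
```

With this shape, disabling the advisor or replacing it with a different escalation mechanism is a change to one constant and one call site rather than to every request.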

Overall, the advisor strategy is an engineering technique that lets developers avoid using an expensive model for work a cheaper model can handle, while still gaining higher accuracy on challenging steps. It reflects Anthropic’s broader push to build layered multi-model collaboration into the platform itself, which lowers the barrier to multi-model workflows but also abstracts away some of the underlying control.
