When and How to Use Multi‑Agent LLM Systems: Practical Insights from Anthropic

The article explains when multi‑agent LLM architectures outperform single‑agent setups—highlighting context pollution, parallelizable tasks, and specialization—while detailing the orchestrator‑subagent pattern, design trade‑offs, code examples, and verification strategies. It also provides practical signals for abandoning single‑agent designs, recommends context‑centric decomposition, and warns about token overhead and early‑victory verification pitfalls.

AI Tech Publishing

1. Why Consider Single‑Agent First

A well‑designed single agent equipped with appropriate tools often exceeds expectations. Adding agents introduces extra failure points, additional prompts to maintain, and the potential for unexpected behavior. In tests, multi‑agent implementations consumed 3 to 10 times more tokens than comparable single‑agent solutions because of duplicated context, coordination messages, and result summarization.

2. Decision Framework for Multi‑Agent Systems

Multi‑agent architectures add value only when they resolve constraints that single agents cannot overcome. The following patterns consistently deliver positive ROI.

2.1 Context Protection

LLM context windows are limited; irrelevant information that accumulates in a single agent’s context leads to context pollution. Isolating sub‑agents keeps each task’s context clean.

Single‑Agent (context stacking) example:

# Single agent stacks everything in the context
conversation_history = [
  {"role": "user", "content": "My order #12345 is not working"},
  {"role": "assistant", "content": "Let me look up the order..."},
  # Tool execution adds 2000+ tokens of order history
  {"role": "user", "content": "... (order details, purchase history, logistics) ..."},
  {"role": "assistant", "content": "Now I will diagnose the technical issue..."}
]

Multi‑Agent (context isolation) example:

from anthropic import Anthropic

client = Anthropic()

class OrderLookupAgent:
    def lookup_order(self, order_id: str) -> dict:
        # Independent agent with its own context
        messages = [{"role": "user", "content": f"Get core details for order {order_id}"}]
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=messages,
            tools=[get_order_details_tool]
        )
        return extract_summary(response)

class SupportAgent:
    def handle_issue(self, user_message: str):
        if needs_order_info(user_message):
            order_id = extract_order_id(user_message)
            order_summary = OrderLookupAgent().lookup_order(order_id)
            # Only a compact summary enters the support agent's context
            context = f"Order {order_id}: status {order_summary['status']}, date {order_summary['date']}"
            messages = [{"role": "user", "content": f"{context}\n\nUser issue: {user_message}"}]
            response = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=2048,
                messages=messages
            )
            return response

The main agent receives only the 50‑100 tokens it truly needs, keeping its context focused.
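The helpers needs_order_info, extract_order_id, and extract_summary are left undefined in the snippet. A minimal sketch, assuming the lookup sub‑agent is prompted to reply in a “status: …; date: …” format, might look like this:

import re

def extract_order_id(user_message: str) -> str | None:
    """Pull an order number like '#12345' out of the user's message."""
    match = re.search(r"#(\d+)", user_message)
    return match.group(1) if match else None

def needs_order_info(user_message: str) -> bool:
    """Cheap heuristic: delegate to the lookup agent only when an order is referenced."""
    return extract_order_id(user_message) is not None

def extract_summary(response) -> dict:
    """Reduce the lookup agent's reply to the few fields the support agent needs.
    Assumes the sub-agent replies in a 'status: ...; date: ...' format."""
    text = "".join(block.text for block in response.content if block.type == "text")
    fields = dict(part.split(": ", 1) for part in text.split("; ") if ": " in part)
    return {"status": fields.get("status", "unknown"), "date": fields.get("date", "unknown")}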

2.2 Parallelization

Running several agents concurrently expands the search space, which is especially valuable for research‑type queries. Anthropic’s “Research” feature decomposes a query into independent facets, launches sub‑agents for each facet, and then synthesizes the findings.

import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def research_topic(query: str) -> dict:
    # lead_agent is assumed to exist; a minimal sketch follows below
    facets = await lead_agent.decompose_query(query)
    tasks = [research_subagent(facet) for facet in facets]
    results = await asyncio.gather(*tasks)  # sub-agents run concurrently
    return await lead_agent.synthesize(results)

async def research_subagent(facet: str) -> dict:
    """Each sub‑agent has an independent context window"""
    messages = [{"role": "user", "content": f"Research: {facet}"}]
    response = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,
        messages=messages,
        tools=[web_search, read_document]
    )
    return extract_findings(response)

Parallelism can improve accuracy but typically raises token consumption to 3‑10× that of a single‑agent approach because each agent maintains its own context and must exchange coordination messages.
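The lead_agent used above is not defined in the snippet. A minimal sketch of one possible implementation, reusing the AsyncAnthropic client created earlier; the prompts and the JSON‑array convention are assumptions:

import json

class LeadAgent:
    """Hypothetical lead agent; method names match the calls in research_topic."""

    async def decompose_query(self, query: str) -> list[str]:
        # Ask the model to split the query into independent facets
        response = await client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=512,
            messages=[{"role": "user", "content": (
                "Split this research query into 3-5 independent facets. "
                f"Reply with only a JSON array of strings.\n\nQuery: {query}")}],
        )
        # Assumes the reply's first content block holds the JSON array
        return json.loads(response.content[0].text)

    async def synthesize(self, results: list[dict]) -> dict:
        # Merge the sub-agents' findings into one report
        response = await client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            messages=[{"role": "user", "content": (
                "Synthesize these findings into one coherent report:\n"
                + "\n".join(str(r) for r in results))}],
        )
        return {"report": response.content[0].text}

lead_agent = LeadAgent()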

2.3 Specialization

Different tasks often require distinct roles, constraints, or toolsets. Splitting a monolithic agent into specialized sub‑agents with tailored system prompts yields more consistent results.

2.3.1 System‑Prompt Specialization

Customer‑support agents need empathy, code‑review agents need precision, compliance agents need strict rule adherence, and brainstorming agents need creativity. Separate agents avoid conflicts between these behaviors.
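As an illustration, role‑specific system prompts might look like the following sketch; the prompt wording is an assumption, and client is the instance created in the earlier snippets:

# Hypothetical role-specific system prompts: one agent per behavioral profile
# avoids asking a single prompt to be empathetic, strict, and creative at once.
SUPPORT_SYSTEM = (
    "You are an empathetic support agent. Acknowledge the user's frustration "
    "before troubleshooting, and never assign blame."
)
REVIEW_SYSTEM = (
    "You are a precise code reviewer. Report only concrete, verifiable defects "
    "with file and line references; do not speculate."
)

def run_specialist(system_prompt: str, user_message: str):
    return client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=system_prompt,  # the only difference between specialists
        messages=[{"role": "user", "content": user_message}],
    )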

2.3.2 Domain‑Expert Specialization

Complex domains such as law or medicine benefit from agents that carry focused domain knowledge rather than a single generic model.

2.3.3 Tool‑Selection Specialization

from anthropic import Anthropic

client = Anthropic()

class CRMAgent:
    """Handles CRM operations"""
    system_prompt = """You are a CRM expert. Manage opportunities and account records. Verify records before updating and maintain data consistency."""
    tools = [crm_get_contacts, crm_create_opportunity]  # 8‑10 CRM‑specific tools

class MarketingAgent:
    """Handles marketing automation"""
    # 8‑10 marketing‑specific tools

class OrchestratorAgent:
    """Routes requests to the appropriate specialist"""
    def execute(self, user_request: str):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            system="""You coordinate platform integration.
- CRM: contacts, opportunities, accounts, pipeline
- Marketing: campaigns, lead nurturing, email sequences
- Messaging: notifications, alerts, team communication""",
            messages=[{"role": "user", "content": user_request}],
            tools=[delegate_to_crm, delegate_to_marketing, delegate_to_messaging]
        )
        return response

Specialization works best when domain boundaries are clear and routing decisions are unambiguous, though it adds prompt‑maintenance overhead.
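The delegate_to_* tools in the orchestrator are not defined in the snippet. One possible shape, assuming Anthropic‑style tool definitions and a run() method on the specialist classes that is not shown above:

# Hypothetical delegation tool: the orchestrator model calls it, and the
# dispatch loop forwards the sub-task to the matching specialist agent.
delegate_to_crm = {
    "name": "delegate_to_crm",
    "description": "Hand off a CRM sub-task (contacts, opportunities, pipeline).",
    "input_schema": {
        "type": "object",
        "properties": {"task": {"type": "string", "description": "The CRM sub-task"}},
        "required": ["task"],
    },
}

def dispatch_tool_call(tool_name: str, tool_input: dict):
    # Route the orchestrator's tool calls to specialist agents;
    # .run() is an assumed method on the specialist classes
    if tool_name == "delegate_to_crm":
        return CRMAgent().run(tool_input["task"])
    if tool_name == "delegate_to_marketing":
        return MarketingAgent().run(tool_input["task"])
    raise ValueError(f"Unknown tool: {tool_name}")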

3. Signals That a Single‑Agent Architecture Is Failing

Approaching context limits: Frequent large‑context usage with degrading performance.

Managing too many tools: More than 15‑20 tools cause the model to spend excessive context and attention on option selection; consider a tool‑search utility first (see the sketch after this list).

Task‑focus difficulty: Missed instructions or step confusion in complex workflows suggests splitting into narrower‑responsibility sub‑agents.
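A tool‑search utility registers one meta‑tool instead of the full catalog and returns only the relevant tool definitions on demand. A minimal sketch, assuming Anthropic‑style tool dicts and naive keyword matching:

# Hypothetical tool-search meta-tool: the model asks for relevant tools
# instead of carrying 20+ definitions in its context at all times.
search_tools = {
    "name": "search_tools",
    "description": "Find tools relevant to the current task before using them.",
    "input_schema": {
        "type": "object",
        "properties": {"keywords": {"type": "string"}},
        "required": ["keywords"],
    },
}

def handle_search_tools(keywords: str, catalog: list[dict]) -> list[dict]:
    # Naive keyword match over names and descriptions; a production
    # system might use embeddings instead
    terms = keywords.lower().split()
    return [t for t in catalog
            if any(term in (t["name"] + " " + t["description"]).lower() for term in terms)]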

4. Context‑Centric Decomposition

The key design decision is how to allocate work among agents. Teams often err by using a problem‑centric split (e.g., separate agents for implementation, testing, review), which creates continual hand‑off loss. Instead, adopt a context‑centric view: an agent that owns a piece of context should also handle any work that relies on that context.

Independent research paths: Parallel investigation of “Asian market trends” vs. “European market trends” without shared context.

Components with clear interfaces: Front‑end and back‑end modules can run in parallel when APIs are well defined.

Black‑box validators: A validator that only runs tests and reports results needs no implementation context.

5. Verification Sub‑Agent Pattern

A verification sub‑agent is a dedicated agent whose sole responsibility is to test or verify the main agent’s output. Strong orchestrator models (e.g., Claude Opus 4) can self‑evaluate, but with weaker orchestrators, or when explicit checkpoints are required, a verification sub‑agent remains valuable. Verification sub‑agents avoid the “telephone game” problem by operating on minimal context: the requirement, the output, and the verification rules.

from anthropic import Anthropic

client = Anthropic()

class CodingAgent:
    """Main agent implements a feature"""
    def implement_feature(self, requirements: str) -> dict:
        messages = [{"role": "user", "content": f"Implement feature: {requirements}"}]
        response = client.messages.create(model="claude-sonnet-4-5", max_tokens=4096, messages=messages)
        return {"code": response.content, "files_changed": extract_files(response)}

class VerificationAgent:
    """Independent agent validates the implementation"""
    def verify_implementation(self, requirements: str, files_changed: list) -> dict:
        messages = [{"role": "user", "content": f"""
Requirements: {requirements}
Modified files: {files_changed}
Run the full test suite and verify:
1. All existing tests pass
2. New feature works as expected
3. No obvious errors or security issues
Do NOT mark as passed after only a few tests.
Run command: pytest --verbose
"""}]
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            messages=messages,
            tools=[run_tests_tool]
        )
        return {"passed": extract_pass_fail(response), "issues": extract_issues(response)}

def implement_with_verification(requirements: str, max_attempts=3):
    for attempt in range(max_attempts):
        result = CodingAgent().implement_feature(requirements)
        verification = VerificationAgent().verify_implementation(requirements, result['files_changed'])
        if verification['passed']:
            return result
        requirements += f"

Previous attempt failed: {verification['issues']}"
    raise Exception(f"Verification failed after {max_attempts} attempts")

5.1 The “Early Victory” Problem

The most common failure is marking output as successful after only a few superficial tests. Mitigation strategies include:

Specific standards: Require “run the full test suite and report all failures”.

Comprehensive checks: Test multiple scenarios and edge cases.

Negative testing: Force the verifier to feed failing inputs and confirm they indeed fail, as in the sketch below.
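A hedged sketch of the negative‑testing idea, extending the VerificationAgent above; the prompt wording is an assumption:

class NegativeTestingAgent(VerificationAgent):
    """Hypothetical extension: the verifier must also prove its tests can fail."""
    def verify_implementation(self, requirements: str, files_changed: list) -> dict:
        messages = [{"role": "user", "content": f"""
Requirements: {requirements}
Modified files: {files_changed}
First run the full suite: pytest --verbose
Then prove the tests are meaningful:
1. Pick one core behavior and feed it a deliberately invalid input.
2. Confirm the feature (or its tests) actually rejects that input.
3. If nothing fails on invalid input, report the test suite as inadequate.
Do NOT mark as passed unless both steps succeed."""}]
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            messages=messages,
            tools=[run_tests_tool],
        )
        return {"passed": extract_pass_fail(response), "issues": extract_issues(response)}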

6. Summary and Outlook

Multi‑agent systems are powerful but not universally applicable. Before adding coordination complexity, ensure there are genuine constraints such as context limits, parallelization opportunities, or specialization needs. Adopt a context‑centric decomposition, define clear verification points, and start with the simplest effective approach, only increasing complexity when evidence supports it.
