
Sub-Agent vs Agent Team: Designing Multi-Agent Architectures Around Context Boundaries

The article explains how to choose between Sub‑Agent and Agent Team structures for multi‑agent systems by evaluating whether sub‑tasks share context, need isolation, compression, parallelism, or continuous collaboration, and provides practical guidelines, pitfalls, and a decision framework to avoid over‑engineering.

Architect

Apr 27, 2026

TL;DR

In multi‑agent architectures, the first question is not "how many agents to split into" but whether the sub‑tasks share the same context. Use Sub‑Agent for isolated, compressible, parallel tasks; use Agent Team when state must be shared.

Many teams start with the assumption that a complex task automatically requires multiple agents, which leads to designs that split by human‑like roles (planner, developer, tester, reviewer). When such designs are run with LLM‑based agents, information thins at each hand‑off because agents lack shared memory and cannot infer missing context.

Why Role‑Based Splits Fail

Each hand‑off loses context. The planner knows why a code change is needed, but the developer never receives that reasoning. The developer’s temporary decisions go unrecorded. The tester receives only a cleaned‑up code snapshot without the rationale, and the reviewer, at the end of the chain, has too little context to judge the change with confidence. The root cause is an organizational mindset that mirrors human team structures; it does not map to LLM agents, which operate solely on the context they receive.

Design around context boundaries, not roles.

Sub‑Agent Pattern

A Sub‑Agent is a child agent that receives a well‑defined task description, runs in its own isolated context, and returns only the final conclusion—not the intermediate reasoning. This provides three benefits:

Isolation: The sub‑task runs without contaminating the parent’s context, preserving the model’s limited context window.

Compression: Only the distilled result is returned, turning noisy exploration into a clean signal.

Parallelism: Because Sub‑Agents do not communicate with each other, they have no mutual dependencies and can run concurrently, e.g., separate agents for security review, performance analysis, and test‑coverage checks.

Key hard constraints of Sub‑Agents:

Sub‑Agents cannot communicate with each other.

Sub‑Agents cannot spawn new agents.

All traffic must pass through the parent agent.

Only the final output is returned; intermediate thoughts are omitted.

These constraints ensure controllability while still allowing scalable parallel execution.
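The three benefits and the hard constraints can be sketched in plain Python without any LLM call. This is a minimal illustration under stated assumptions: `run_sub_agent`, `parent`, and `SubAgentResult` are hypothetical names, and the `asyncio.sleep(0)` line stands in for real model and tool calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubAgentResult:
    name: str
    conclusion: str  # the only thing the parent ever sees


async def run_sub_agent(name: str, task: str) -> SubAgentResult:
    """Hypothetical stand-in for one sub-agent run (no real LLM call)."""
    # Isolation: scratch notes live in this local list and nowhere else.
    scratch = [f"{name}: received task '{task}'"]
    await asyncio.sleep(0)  # placeholder for model and tool calls
    scratch.append(f"{name}: explored files, discarded dead ends")
    # Compression: return one distilled line and drop the scratch notes.
    return SubAgentResult(name, f"{name} finished after {len(scratch)} steps")


async def parent(task: str) -> list[str]:
    """The parent fans out and aggregates; sub-agents never talk to each other."""
    names = ["security-review", "performance-analysis", "test-coverage"]
    # Parallelism: the runs share no state, so they can be awaited together.
    results = await asyncio.gather(*(run_sub_agent(n, task) for n in names))
    return [r.conclusion for r in results]


if __name__ == "__main__":
    for line in asyncio.run(parent("review the authentication module")):
        print(line)
```

All traffic flows through `parent`, and the `scratch` list never leaves the function that created it, which is the property the constraints are meant to guarantee.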

import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

async def main():
    # Stream messages from the parent agent as it works on the prompt.
    async for message in query(
        prompt="Review the authentication module for issues",
        options=ClaudeAgentOptions(
            # "Agent" is listed so the parent can delegate to the sub-agents below.
            allowed_tools=["Read", "Grep", "Glob", "Agent"],
            agents={
                # Each sub-agent gets its own prompt and a read-only tool set.
                "security-reviewer": AgentDefinition(
                    description="Find vulnerabilities and security risks",
                    prompt="You are a security expert.",
                    tools=["Read", "Grep", "Glob"],
                    model="sonnet",
                ),
                "performance-optimizer": AgentDefinition(
                    description="Identify performance bottlenecks",
                    prompt="You are a performance engineer.",
                    tools=["Read", "Grep", "Glob"],
                    model="sonnet",
                ),
            },
        ),
    ):
        print(message)

# Run the async entry point.
asyncio.run(main())

In this example, the description field acts as a routing signal that tells the parent which Sub‑Agent should handle the request. A clear description leads to precise routing.

Agent Team Pattern

An Agent Team resembles a small, continuously collaborating group. It has a shared context, direct inter‑agent dialogue, and a state layer that tracks progress, dependencies, and blockers. This pattern fits tasks where the output of one step immediately influences the next, such as:

Frontend changes that must instantly inform backend services.

Test failures that require immediate developer attention.

Product requirement updates that need the whole pipeline to roll back.

Because information is shared, the team can react to intermediate results without waiting for a parent orchestrator to relay messages, but this comes with higher costs:

Requires a shared state layer that handles conflict resolution, visibility, and versioning.

Needs a communication protocol between agents.

Demands a Lead Agent to arbitrate disputes and drive progress.

Debugging becomes more complex as failures may arise from inter‑agent coordination rather than a single node.
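To make the shared state layer concrete, here is a minimal sketch of versioned writes with conflict detection. It assumes a simple optimistic scheme in which every write must name the version it was based on; `SharedState`, `VersionConflict`, and `demo` are hypothetical names, not part of any framework.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when an agent writes against a stale snapshot."""


@dataclass
class SharedState:
    version: int = 0
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # visibility: who changed what

    def read(self) -> tuple[int, dict]:
        # Return the current version together with a copy of the data.
        return self.version, dict(self.data)

    def write(self, agent: str, base_version: int, key: str, value: str) -> int:
        # Conflict detection: reject writes based on an outdated version.
        if base_version != self.version:
            raise VersionConflict(
                f"{agent} wrote against v{base_version}, current is v{self.version}"
            )
        self.data[key] = value
        self.version += 1
        self.log.append((self.version, agent, key))
        return self.version


def demo() -> SharedState:
    state = SharedState()
    v, _ = state.read()
    state.write("frontend", v, "api_contract", "POST /login returns a token")
    try:
        # The backend still holds the old version, so this write is rejected.
        state.write("backend", v, "db_schema", "add tokens table")
    except VersionConflict:
        v, _ = state.read()  # re-read, see the frontend change, then retry
        state.write("backend", v, "db_schema", "add tokens table")
    return state
```

Even this toy version shows where the extra cost comes from: every agent has to handle rejected writes, and a failure may originate in the coordination logic rather than in any single agent.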

Tasks do not depend on each other → don’t use a Team; tasks depend on each other → don’t use Sub‑Agents.

Common Orchestration Primitives

Production‑grade multi‑agent systems rarely invent new primitives; they reuse a handful of well‑known patterns:

Prompt Chaining: Sequentially pass the output of A to B to C (e.g., extract → translate → polish).

Routing: Dispatch a task to the most suitable agent based on its characteristics (common in customer‑service bots).

Parallelization: Run independent tasks concurrently and aggregate results (useful for multi‑dimensional code reviews).

Orchestrator‑Worker: An orchestrator splits work, assigns it to workers, and collects results; this is essentially the Sub‑Agent form.

Evaluator‑Optimizer: Generate, evaluate, then iterate; ideal for high‑quality outputs like reports or code completions.

These primitives are not new; they are simply repurposed for agent orchestration.
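Three of these primitives reduce to very small functions once the model calls are abstracted away. The sketch below assumes each agent step is just a callable from string to string; `chain`, `route`, and `evaluate_and_optimize` are hypothetical helper names, and the toy lambdas in the usage example stand in for real agents.

```python
from typing import Callable

Step = Callable[[str], str]


def chain(steps: list[Step], text: str) -> str:
    """Prompt chaining: feed each step's output into the next."""
    for step in steps:
        text = step(text)
    return text


def route(handlers: dict[str, Step], classify: Callable[[str], str], text: str) -> str:
    """Routing: pick exactly one handler based on a classifier's label."""
    return handlers[classify(text)](text)


def evaluate_and_optimize(
    generate: Step,
    score: Callable[[str], float],
    draft: str,
    threshold: float,
    max_rounds: int = 3,
) -> str:
    """Evaluator-optimizer: revise until the score clears the threshold."""
    for _ in range(max_rounds):
        if score(draft) >= threshold:
            break
        draft = generate(draft)
    return draft
```

A usage example with toy steps:

```python
cleaned = chain([str.strip, str.lower, lambda s: s + "."], "  Hello World ")

handlers = {"billing": lambda s: "billing: " + s, "tech": lambda s: "tech: " + s}
label = lambda s: "billing" if "invoice" in s else "tech"
answer = route(handlers, label, "invoice missing")

final = evaluate_and_optimize(lambda d: d + "!", lambda d: d.count("!"), "ok", threshold=2)
```

The `max_rounds` cap is a design choice added here so that an evaluator that never passes a draft cannot loop forever.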

Practical Decision Framework

Can a single agent complete the task? If yes, start with a single agent.

Do sub‑tasks need to see each other’s intermediate process? If no → Sub‑Agent; if yes → Agent Team.

Do sub‑tasks affect each other’s next steps? If no → parallel Sub‑Agent; if yes → Team.

Is the motivation merely to look “more advanced”? If yes → revert to a single agent and clarify the task model.

Must the workflow follow strict business rules without model freedom? If yes → add a deterministic middle layer instead of a Team.

The core principle is: first clarify the task structure, then decide the agent structure. Avoid the opposite order.
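The questions above can be encoded as a small function, which is a useful check that the framework is unambiguous. This is a sketch under assumptions: `TaskProfile` and `choose_architecture` are hypothetical names, the order in which the checks run is one possible reading of the list, and the "looks more advanced" question is left out because it is a matter of judgment rather than a property of the task.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    single_agent_sufficient: bool
    needs_strict_business_rules: bool
    subtasks_see_intermediate_work: bool
    subtasks_affect_each_other: bool


def choose_architecture(p: TaskProfile) -> str:
    # Start simple: a single agent wins whenever it can do the job.
    if p.single_agent_sufficient:
        return "single agent"
    # Assumption: strict rules are checked before considering a Team.
    if p.needs_strict_business_rules:
        return "deterministic middle layer"
    # Shared intermediate state or mutual influence calls for a Team.
    if p.subtasks_see_intermediate_work or p.subtasks_affect_each_other:
        return "agent team"
    # Otherwise the sub-tasks are independent and can run in parallel.
    return "parallel sub-agents"


if __name__ == "__main__":
    review = TaskProfile(False, False, False, False)
    print(choose_architecture(review))
```

Filling in a `TaskProfile` forces the task structure to be stated before any agent structure is chosen, which is the order the framework asks for.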

When Not to Use Multiple Agents

If a single agent can handle the job without noticeable degradation, adding more agents introduces hidden costs: orchestration code, contract versioning, longer debugging paths, context‑synchronization bugs, and doubled governance overhead. When tasks are highly dependent and coordination cost outweighs benefits, a single agent is the most stable choice.

Design around context boundaries, not roles; start simple and add complexity only when truly needed.

In practice, treat Sub‑Agent and Agent Team as tools rather than competing product categories. Choose the one that matches the isolation, compression, parallelism, or continuous collaboration needs of your specific task, and combine them with the basic primitives listed above to cover most production scenarios.

