Mastering the Coordinator Pattern: Control‑Plane/Data‑Plane Separation for Scalable Multi‑Agent Orchestration
This article dissects Claude Code's Coordinator pattern: how separating the control plane from the data plane eliminates the serial bottlenecks, context overflow, and fault propagation of single-Agent setups. It also details the dual back-end design, the message protocol, engineering insights, technical debt, and practical adoption guidelines.
Why a single‑Agent architecture breaks down
When a naïve "master Agent" calls each Tool sequentially, three problems arise: (1) serial waiting multiplies latency, (2) task logs and tool calls fill the main Agent’s context window causing token‑limit errors, and (3) a crash in any sub‑Agent leaves the whole session hanging.
Core idea of the Coordinator pattern
The pattern separates responsibilities: the Coordinator (control plane) only schedules, decides, and aggregates results, while Teammates (data plane) only execute assigned work. Communication follows a strict message protocol, mirroring classic control‑plane/data‑plane separation.
Source layout
The main implementation lives in src/coordinator/coordinatorMode.ts (~369 lines). Supporting files include src/utils/udsMessaging.ts (Unix Domain Socket bus), src/tasks/inProcessRunner.ts (in‑process executor), and the src/TeammateTool/ utilities.
Two interchangeable back‑ends
Coordinator can run Teammates either in‑process (lightweight, <10 ms startup, suitable for CI/CD) or in separate pane processes (tmux/iTerm2, visible UI, higher startup cost). The trade‑offs are:
In‑process : runs in the same memory space, minimal overhead, but shares memory and offers weak fault isolation.
Pane : each Agent runs in its own terminal pane, giving strong isolation and visual debugging, but incurs fork and pane creation overhead.
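The article later notes that both back-ends hide behind a TeammateExecutor interface. A minimal sketch of that abstraction might look like the following; only the name TeammateExecutor comes from the article, every other identifier here is illustrative:

```typescript
// Sketch of a backend abstraction. Only TeammateExecutor is named in the
// article; the other identifiers are hypothetical.
interface SpawnConfig {
  name: string;
  teamName: string;
  prompt: string;
}

interface SpawnResult {
  success: boolean;
  agentId: string;
}

interface TeammateExecutor {
  spawn(config: SpawnConfig): Promise<SpawnResult>;
  shutdown(agentId: string): Promise<void>;
}

// In-process variant: the Teammate loop runs inside the Coordinator's own
// event loop, so startup is nearly free but fault isolation is weak.
class InProcessExecutor implements TeammateExecutor {
  private running = new Set<string>();

  async spawn(config: SpawnConfig): Promise<SpawnResult> {
    const agentId = `${config.teamName}/${config.name}`;
    this.running.add(agentId);
    // A real implementation would start the runner's poll loop here.
    return { success: true, agentId };
  }

  async shutdown(agentId: string): Promise<void> {
    // A real implementation would deliver a shutdown_request message first.
    this.running.delete(agentId);
  }
}
```

A pane-backed class would implement the same two methods by creating a tmux/iTerm2 pane instead, which is what makes the back-ends interchangeable.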
In‑process runner details
The runner’s heart is a poll loop that repeatedly:
Executes a turn by invoking the LLM and tools, updating UI state.
Notifies the Coordinator that the turn is idle.
Polls the mailbox (500 ms interval) for new messages, shutdown requests, or task‑list prompts, handling each case accordingly.
Key design points:
500 ms is a “magic number” balancing latency (<1 s) and CPU cost; an event‑driven UDS push would be more efficient.
Shutdown requests are wrapped as LLM messages, allowing the Agent to perform graceful cleanup instead of a raw SIGTERM.
When no Coordinator message arrives, the Agent pulls the next task from a shared task list, supporting both push and pull assignment models.
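Putting those design points together, the loop can be sketched roughly as follows. This is a hypothetical reconstruction, not the project's actual code (the real runner lives in src/tasks/inProcessRunner.ts), and all names here are illustrative:

```typescript
// Hypothetical sketch of the in-process runner's poll loop.
type Mail =
  | { kind: 'message'; text: string }
  | { kind: 'shutdown_request' }
  | { kind: 'task_prompt'; task: string };

const POLL_INTERVAL_MS = 500;

async function runnerLoop(
  pollMailbox: () => Mail | undefined,
  executeTurn: (input: string) => Promise<void>,
  pullNextTask: () => string | undefined,
): Promise<void> {
  for (;;) {
    const mail = pollMailbox();
    if (mail === undefined) {
      // Pull model: no Coordinator message, so take work from the shared list.
      const task = pullNextTask();
      if (task !== undefined) {
        await executeTurn(task);
        continue;
      }
      // Nothing to do: sleep one poll interval before checking again.
      await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
      continue;
    }
    if (mail.kind === 'shutdown_request') {
      // Wrapped as an LLM message so the Agent can clean up gracefully.
      await executeTurn('Shutdown requested: please finish up and report state.');
      return;
    }
    // Push model: execute whatever the Coordinator sent.
    await executeTurn(mail.kind === 'message' ? mail.text : mail.task);
  }
}
```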
Message protocol
Coordinator → Teammate commands are XML‑wrapped (easier for LLMs to parse structured instructions), while Teammate → Coordinator responses use JSON (fast machine parsing). This dual‑encoding leverages the strengths of each format.
// Example of a Coordinator → Teammate XML command wrapper
function wrapTeamLeadMessage(fromId: string, text: string, color?: string, summary?: string): string {
  const colorAttr = color ? ` color="${color}"` : '';
  const summaryAttr = summary ? ` summary="${summary}"` : '';
  return `<team_lead teammate_id="${fromId}"${colorAttr}${summaryAttr}>${text}</team_lead>`;
}
// Example of a Teammate → Coordinator JSON shutdown request
interface ShutdownRequest {
  type: 'shutdown_request';
  requestId: string;
  from: string; // sender name
  reason: string;
}
Pane back‑end implementation
Each Pane‑based Teammate spawns a new tmux/iTerm2 pane, injects environment variables such as CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 to signal Team mode, and passes identity flags (--agent-id, --team-name, etc.) to the child process. After pane creation, the initial prompt is sent via the UDS mailbox.
async spawn(config: SpawnConfig): Promise<SpawnResult> {
  const agentId = buildAgentId(config.name, config.teamName);
  const color = assignColor(agentId);
  // Create the terminal pane that will host this Teammate
  const { paneId, isFirstTeammate } = await this.backend.createTeammatePaneInSwarmView(config.name, color);
  // Identity flags passed to the child Claude Code process
  const agentFlags = [
    `--agent-id ${quote([agentId])}`,
    `--agent-name ${quote([config.name])}`,
    `--team-name ${quote([config.teamName])}`,
    `--agent-color ${quote([color])}`,
    `--parent-session-id ${quote([parentSessionId])}`
  ].join(' ');
  // Environment variables that switch the child into Team mode
  const envVars = `CLAUDECODE=1 CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`;
  const command = `cd ${quote([config.cwd])} && env ${envVars} ${claudeExec} ${agentFlags} ${permissionFlags}`;
  await this.backend.sendCommandToPane(paneId, command);
  // The initial prompt travels over the UDS mailbox, not the pane itself
  await sendToMailbox(config.name, { from: 'team-lead', text: config.prompt, timestamp: new Date().toISOString() }, config.teamName);
  return { success: true, agentId, paneId };
}
Design insights (transferable engineering lessons)
Separating control and data planes is essential for scalable multi‑Agent systems.
Abstracting the execution backend behind a TeammateExecutor interface lets tests substitute a cheap in-process or mock executor, drastically cutting testing cost.
Supporting both push‑based and pull‑based task assignment increases flexibility and load‑balancing options.
Graceful shutdown via LLM‑driven messages yields cleaner termination at the cost of an extra LLM call.
Critical perspective: technical debt
Fixed 500 ms polling creates unnecessary wake‑ups; an event‑driven design would eliminate the “timer bomb”.
Message formats lack versioning; schema changes could silently break parsing.
Team configuration is stored in a plain JSON file without atomic writes, risking corruption on crashes.
In‑process mode shares memory, so a malicious or buggy Agent could affect its peers; pane mode provides process isolation.
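The configuration-corruption risk has a standard fix: write the JSON to a temporary file, then rename it into place. The sketch below illustrates the idea and is not the project's actual code; the function name is hypothetical:

```typescript
import { writeFileSync, renameSync } from 'node:fs';

// Write the JSON to a temporary sibling file, then rename it into place.
// On POSIX filesystems, rename within the same directory is atomic, so a
// crash mid-write can never leave a half-written config behind.
function writeTeamConfigAtomically(path: string, config: unknown): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(config, null, 2), 'utf8');
  renameSync(tmp, path);
}
```

Readers either see the old complete file or the new complete file, never a torn intermediate state.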
Practical adoption guide
When re‑using the pattern, define two minimal interfaces:
interface AgentCoordinator {
  decomposeTask(task: string): TaskGraph;
  assignTask(agentId: string, task: Task): void;
  aggregateResult(results: AgentResult[]): string;
}

interface WorkerAgent {
  execute(task: Task): AsyncGenerator<Message>;
  reportResult(result: AgentResult): void;
}
Key decisions:
Choose pane for local debugging with visual feedback.
Choose in‑process for CI/CD pipelines where UI is unnecessary.
For production, prefer independent processes with UDS or WebSocket/gRPC communication.
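A production UDS channel also needs message framing on top of the byte stream. A common approach is newline-delimited JSON; the sketch below is illustrative and not the project's actual wire format:

```typescript
import * as net from 'node:net';

// Newline-delimited JSON framing: split a stream buffer into complete
// frames and keep the trailing partial line for the next data chunk.
function splitFrames(buffer: string): { frames: string[]; rest: string } {
  const parts = buffer.split('\n');
  const rest = parts.pop() ?? '';
  return { frames: parts.filter((f) => f.length > 0), rest };
}

// Sketch of a Coordinator-side Unix Domain Socket server using that framing.
function startMailboxServer(socketPath: string, onMessage: (msg: unknown) => void): net.Server {
  const server = net.createServer((conn) => {
    let pending = '';
    conn.on('data', (chunk) => {
      const { frames, rest } = splitFrames(pending + chunk.toString('utf8'));
      pending = rest;
      for (const frame of frames) onMessage(JSON.parse(frame));
    });
  });
  server.listen(socketPath);
  return server;
}
```

Keeping the framing function pure makes it trivial to test without opening a real socket.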
Additional pitfalls to avoid:
Never let the Coordinator keep direct references to Worker memory; rely solely on the message protocol.
Implement per‑Task deadlines and check them in the poll loop.
Persist team state atomically so a crash can recover the configuration.
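The deadline check from the list above could be sketched as a pure helper the Coordinator runs once per poll iteration; all names and shapes here are hypothetical:

```typescript
// Hypothetical sketch of per-Task deadline tracking for the poll loop.
interface TrackedTask {
  id: string;
  agentId: string;
  deadlineMs: number; // absolute epoch milliseconds
}

// Partition tasks into expired and still-live; the caller can then
// reassign, cancel, or escalate the expired ones.
function checkDeadlines(tasks: TrackedTask[], nowMs: number): { expired: TrackedTask[]; live: TrackedTask[] } {
  const expired = tasks.filter((t) => nowMs > t.deadlineMs);
  const live = tasks.filter((t) => nowMs <= t.deadlineMs);
  return { expired, live };
}
```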
Conclusion
The Coordinator pattern’s strength lies in cleanly separating the control plane (task decomposition, assignment, aggregation) from the data plane (execution), using a dual XML/JSON protocol, and offering interchangeable back‑ends. While it dramatically improves concurrency, context management, and fault isolation, developers should address the identified technical debt—especially the polling mechanism, message versioning, and state persistence—to ensure robust production deployments.
James' Growth Diary
I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools, helping you build a solid foundation in the AI era.