How Claude Code’s Agent Swarms Use Unix Domain Sockets to Run 10 AIs Concurrently
This article takes a deep dive into Claude Code’s Agent Swarms. It explains why Unix Domain Sockets replace HTTP for inter‑process communication, and how three‑stage address parsing, filesystem‑based mailbox queues, multiple spawn modes, AgentId design, graceful shutdown, and plan‑mode approval together enable reliable, low‑latency coordination of multiple LLM agents. It closes with common pitfalls.
Why Not HTTP?
Claude Code runs a single‑machine swarm where each Teammate is a separate Node.js process. HTTP incurs TCP handshakes and millisecond‑level latency, and each process must be assigned a unique port, creating scheduling complexity. Unix Domain Sockets (UDS) communicate via kernel buffers with microsecond‑level latency and use a unique file‑path address, eliminating port conflicts. For same‑machine, low‑latency scenarios, UDS is the optimal choice.
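To make the comparison concrete, here is a minimal sketch of point‑to‑point messaging over a socket file, using Node's standard node:net module. The newline‑delimited JSON framing and the helper names (encodeFrame, decodeFrames, listenOnSocket, sendOverSocket) are assumptions for illustration; the article does not show Claude Code's actual wire format.

```typescript
import * as net from "node:net";

// A stream socket delivers bytes, not messages, so this sketch terminates
// each JSON message with "\n" (an assumed framing, not a documented one).
function encodeFrame(message: unknown): string {
  return JSON.stringify(message) + "\n";
}

// Splits buffered text into complete messages plus the unfinished tail.
function decodeFrames(buffered: string): { messages: unknown[]; rest: string } {
  const parts = buffered.split("\n");
  const rest = parts.pop() ?? "";
  const messages = parts
    .filter((part) => part.length > 0)
    .map((part) => JSON.parse(part) as unknown);
  return { messages, rest };
}

// Hypothetical receiver: listens on a socket file instead of a TCP port.
// Listening fails if a file already exists at socketPath.
function listenOnSocket(
  socketPath: string,
  onMessage: (message: unknown) => void
): net.Server {
  const server = net.createServer((conn) => {
    let buffered = "";
    conn.setEncoding("utf8");
    conn.on("data", (chunk: Buffer | string) => {
      const decoded = decodeFrames(buffered + String(chunk));
      buffered = decoded.rest;
      decoded.messages.forEach(onMessage);
    });
  });
  server.listen(socketPath);
  return server;
}

// Hypothetical sender: connects to the same path, writes one frame, closes.
function sendOverSocket(socketPath: string, message: unknown): void {
  const client = net.createConnection(socketPath, () => {
    client.end(encodeFrame(message));
  });
}
```

No port number appears anywhere: the file path is the address, which is why two agents cannot collide as long as their paths differ.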
Three‑Stage Address Parsing
The framework parses a target string to select the transport layer. The function parseMessageAddress implements three prefix rules:
function parseMessageAddress(target: string) {
  // Explicit "uds:" prefix: strip it and keep the socket path.
  if (target.startsWith("uds:"))
    return { scheme: "uds", target: target.slice(4) };
  // Explicit "bridge:" prefix: strip it and keep the bridge target.
  if (target.startsWith("bridge:"))
    return { scheme: "bridge", target: target.slice(7) };
  // A bare absolute path is treated as a socket path.
  if (target.startsWith("/"))
    return { scheme: "uds", target };
  return { scheme: "other", target };
}
uds: – direct point‑to‑point UDS socket.
bridge: – forwards through an IDE bridge (VS Code/JetBrains) for GUI visibility.
Plain path – falls back to a filesystem‑based mailbox.
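The following sketch restates the parser with explicit types and runs it on a few sample addresses. The Scheme and ParsedAddress type names and the sample targets are made up for illustration; only the prefix rules come from the snippet above.

```typescript
type Scheme = "uds" | "bridge" | "other";

interface ParsedAddress {
  scheme: Scheme;
  target: string;
}

// Same three prefix rules as above, with an explicit return type.
function parseMessageAddress(target: string): ParsedAddress {
  if (target.startsWith("uds:")) return { scheme: "uds", target: target.slice(4) };
  if (target.startsWith("bridge:")) return { scheme: "bridge", target: target.slice(7) };
  if (target.startsWith("/")) return { scheme: "uds", target };
  return { scheme: "other", target };
}

// Sample addresses (hypothetical values) covering each branch.
const sampleAddresses = [
  "uds:/tmp/researcher.sock",
  "bridge:session-1",
  "/tmp/researcher.sock",
  "researcher",
];

for (const address of sampleAddresses) {
  const parsed = parseMessageAddress(address);
  console.log(`${address} -> ${parsed.scheme} (${parsed.target})`);
}
```

Because the scheme is a closed union, a caller that switches on it gets a compile‑time reminder whenever a new transport is added.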
Mailbox: File‑system Asynchronous Queue
Most inter‑Agent communication uses a mailbox stored as JSONL files under ~/.claude/teams/<teamName>/mailbox/<agentName>.jsonl. Each line is a JSON message.
async function attachTeammateMailbox(context) {
  // Assumption: the agent's identity is carried on the context object;
  // the original snippet does not show where these two names come from.
  const { agentName, teamName } = context;
  const messages = await readMailbox(agentName, teamName);
  if (messages.length === 0) return [];
  // Wrap all pending messages as one meta user message for the next turn.
  return [{
    type: "user",
    message: { content: formatMessages(messages) },
    isMeta: true
  }];
}
Messages are written by the sender and read by the receiver during its next LLM polling cycle. The attachment injects the formatted messages into the LLM tool‑use context, avoiding a dedicated receiver thread and ensuring messages are consumed in the Agent’s reasoning flow.
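A minimal sketch of the file side of such a mailbox, assuming append‑on‑send and read‑then‑truncate on receive. The helper names (mailboxPath, appendToMailbox, drainMailbox), the message fields, and the temporary root directory are hypothetical. The sketch is not safe against a sender appending between the read and the truncate; a real implementation would need locking or an atomic rename, which the article does not describe.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical message shape; the article only says each line is JSON.
interface MailboxMessage {
  from: string;
  text: string;
}

// Mirrors the <teamName>/mailbox/<agentName>.jsonl layout under a given root.
function mailboxPath(root: string, teamName: string, agentName: string): string {
  return path.join(root, teamName, "mailbox", `${agentName}.jsonl`);
}

// Sender side: append one JSON object per line.
function appendToMailbox(
  root: string,
  teamName: string,
  recipient: string,
  message: MailboxMessage
): void {
  const file = mailboxPath(root, teamName, recipient);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(message) + "\n", "utf8");
}

// Receiver side: read every pending line, then empty the file.
function drainMailbox(root: string, teamName: string, agentName: string): MailboxMessage[] {
  const file = mailboxPath(root, teamName, agentName);
  if (!fs.existsSync(file)) return [];
  const raw = fs.readFileSync(file, "utf8");
  fs.truncateSync(file, 0);
  return raw
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as MailboxMessage);
}

// Demo in a throwaway directory rather than the real ~/.claude tree.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "mailbox-demo-"));
appendToMailbox(demoRoot, "my-team", "researcher", { from: "team-lead", text: "start task 1" });
const firstRead = drainMailbox(demoRoot, "my-team", "researcher");
const secondRead = drainMailbox(demoRoot, "my-team", "researcher");
console.log(firstRead.length, secondRead.length); // prints: 1 0
```

The second read returns nothing because the first one truncated the file, which is what keeps a message from being consumed twice.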
Spawn Modes
Claude Code can launch Teammate agents in three ways. The routing logic is:
async function handleSpawn(params, context) {
  // headless / CI mode – in‑process runner
  if (isHeadless()) return spawnInProcess(params, context);
  try {
    await getPaneBackend();
  } catch (err) {
    // auto fallback to in‑process when pane backend unavailable
    if (getTeammateMode() === "auto") return spawnInProcess(params, context);
    throw err;
  }
  // split‑pane preferred for macOS iTerm2
  if (params.use_splitpane !== false) return spawnWithSplitPane(params, context);
  // default to tmux window
  return spawnWithTmux(params, context);
}
tmux window – each Teammate runs in an independent tmux pane; visible terminal, good for debugging and long‑running tasks; startup cost ~500 ms; provides crash isolation.
iTerm2 split pane – macOS visual split; also independent processes; startup cost ~800 ms.
in‑process – async runner inside the Lead process; startup cost <10 ms; no process isolation, so a crash can affect the Lead.
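The routing decision can be isolated as a pure function, which makes each branch easy to check without starting a terminal. This is a sketch under assumptions: SpawnEnvironment, chooseSpawnMode, and the error message are hypothetical names, and the only teammate‑mode value taken from the article is "auto".

```typescript
type SpawnMode = "in-process" | "split-pane" | "tmux";

// Hypothetical snapshot of the facts handleSpawn queries at runtime.
interface SpawnEnvironment {
  headless: boolean;
  paneBackendAvailable: boolean;
  teammateMode: string; // only "auto" is confirmed by the article
  useSplitPane?: boolean;
}

// Mirrors the branch order of handleSpawn without side effects.
function chooseSpawnMode(env: SpawnEnvironment): SpawnMode {
  if (env.headless) return "in-process";
  if (!env.paneBackendAvailable) {
    if (env.teammateMode === "auto") return "in-process";
    throw new Error("pane backend unavailable");
  }
  // Split pane is used unless the caller explicitly opts out.
  if (env.useSplitPane !== false) return "split-pane";
  return "tmux";
}

console.log(chooseSpawnMode({ headless: true, paneBackendAvailable: false, teammateMode: "auto" }));
console.log(chooseSpawnMode({ headless: false, paneBackendAvailable: true, teammateMode: "auto" }));
console.log(
  chooseSpawnMode({ headless: false, paneBackendAvailable: true, teammateMode: "auto", useSplitPane: false })
);
```

Note that the tmux branch is reached only when split pane is explicitly disabled, since an undefined flag still selects the split pane.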
AgentId Design
Agents are identified by the string agentName@teamName (e.g., researcher@my-team). The function makeAgentId creates this identifier, and deduplicateAgentName appends -2, -3, etc., when a name collision occurs within the same team.
Human‑readable: logs show researcher@my-team instead of opaque UUIDs.
Globally unique via team scope.
Derives the socket file path directly, eliminating separate service‑discovery.
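One possible shape for these helpers is sketched below. The function names makeAgentId and deduplicateAgentName come from the article, but their bodies here are guesses that only reproduce the described behaviour; parseAgentId is a hypothetical addition.

```typescript
// Joins name and team into the agentName@teamName form.
function makeAgentId(agentName: string, teamName: string): string {
  return `${agentName}@${teamName}`;
}

// Hypothetical inverse: splits on the last "@" so the team scope is recovered.
function parseAgentId(agentId: string): { agentName: string; teamName: string } {
  const at = agentId.lastIndexOf("@");
  if (at <= 0 || at === agentId.length - 1) {
    throw new Error(`invalid AgentId: ${agentId}`);
  }
  return { agentName: agentId.slice(0, at), teamName: agentId.slice(at + 1) };
}

// Appends -2, -3, ... until the name is unused within the team.
function deduplicateAgentName(name: string, taken: ReadonlySet<string>): string {
  if (!taken.has(name)) return name;
  let suffix = 2;
  while (taken.has(`${name}-${suffix}`)) suffix += 1;
  return `${name}-${suffix}`;
}

const takenNames = new Set(["researcher", "researcher-2"]);
const uniqueName = deduplicateAgentName("researcher", takenNames);
console.log(makeAgentId(uniqueName, "my-team")); // prints: researcher-3@my-team
```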
Team Lifecycle and Graceful Shutdown
The swarm workflow is a state machine:
TeamCreate → spawn Teammates → distribute tasks → SendMessage × N → await completion → TeamDelete
Shutdown uses a two‑way handshake:
Lead → shutdown_request → Teammate
Teammate → shutdown_response(approve) → Lead
Lead confirms → TeamDeleteTool → clean mailbox files
Message types shutdown_request, shutdown_response, and plan_approval_response are structured to support orderly termination.
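On the Lead side, the handshake amounts to tracking which Teammates have not yet approved. The sketch below is one way to model that; the ShutdownTracker class and the from field on the response are assumptions, while the shutdown_response type and the approve flag follow the article.

```typescript
// Assumed response shape; "from" is a hypothetical field name.
interface ShutdownResponse {
  type: "shutdown_response";
  from: string;
  approve: boolean;
}

// Tracks outstanding approvals so the team is deleted only when all agree.
class ShutdownTracker {
  private readonly pending: Set<string>;

  constructor(teammates: string[]) {
    this.pending = new Set(teammates);
  }

  // Records one response and reports whether every teammate has approved.
  record(response: ShutdownResponse): boolean {
    if (response.approve) this.pending.delete(response.from);
    return this.pending.size === 0;
  }

  // Names of teammates that have not approved yet, sorted for stable logs.
  waitingOn(): string[] {
    return Array.from(this.pending).sort();
  }
}

const tracker = new ShutdownTracker(["researcher", "writer"]);
const afterFirst = tracker.record({ type: "shutdown_response", from: "researcher", approve: true });
const afterRefusal = tracker.record({ type: "shutdown_response", from: "writer", approve: false });
const afterSecond = tracker.record({ type: "shutdown_response", from: "writer", approve: true });
console.log(afterFirst, afterRefusal, afterSecond); // prints: false false true
```

A refusal leaves the Teammate in the pending set, so the Lead keeps waiting instead of deleting a team that still has work in flight.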
Plan Mode Approval
When a Teammate runs in plan mode, it must obtain Lead approval before executing its plan.
// Teammate side
if (isTeammate() && planModeRequired()) {
  const request = {
    type: "plan_approval_request",
    from: agentName,
    planContent,
    requestId: generateRequestId("plan_approval", agentId)
  };
  await writeToMailbox("team-lead", { text: JSON.stringify(request) }, teamName);
  return { awaitingLeaderApproval: true, requestId: request.requestId };
}

// Lead side (SendMessage tool)
await SendMessage({
  to: "researcher",
  message: {
    type: "plan_approval_response",
    request_id: "plan_approval_xxx",
    approve: true,
    permissionMode: "default"
  }
});
This implements a distributed workflow gate‑control useful for high‑risk AI operations.
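The gate works only if each response is matched to the request that produced it. A minimal sketch of that bookkeeping follows; PlanGate, its methods, and this body of generateRequestId (including the ID format) are hypothetical, while the request_id and approve fields follow the Lead‑side message above.

```typescript
interface PlanApprovalResponse {
  type: "plan_approval_response";
  request_id: string;
  approve: boolean;
}

type GateOutcome = "approved" | "rejected" | "unknown";

let requestCounter = 0;

// Hypothetical stand-in for generateRequestId; the real ID format is not shown.
function generateRequestId(kind: string, agentId: string): string {
  requestCounter += 1;
  return `${kind}_${agentId}_${requestCounter}`;
}

// Holds submitted plans until the Lead answers with a matching request_id.
class PlanGate {
  private readonly pending = new Map<string, string>();

  // Registers a plan and returns the ID the Lead must echo back.
  submit(agentId: string, planContent: string): string {
    const requestId = generateRequestId("plan_approval", agentId);
    this.pending.set(requestId, planContent);
    return requestId;
  }

  // Resolves a request once; repeated or unmatched responses are "unknown".
  resolve(response: PlanApprovalResponse): GateOutcome {
    if (!this.pending.has(response.request_id)) return "unknown";
    this.pending.delete(response.request_id);
    return response.approve ? "approved" : "rejected";
  }
}

const gate = new PlanGate();
const planRequestId = gate.submit("researcher@my-team", "1. read docs 2. edit config");
const firstOutcome = gate.resolve({ type: "plan_approval_response", request_id: planRequestId, approve: true });
const replayOutcome = gate.resolve({ type: "plan_approval_response", request_id: planRequestId, approve: true });
console.log(firstOutcome, replayOutcome); // prints: approved unknown
```

Deleting the entry on first resolution means a duplicated or replayed approval cannot trigger the same plan twice.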
Common Pitfalls and Edge Cases
Teammate start‑up race – messages may arrive before a Teammate is ready; mailbox buffering ensures they are consumed on the next poll.
In‑process crash propagation – AbortControllers protect the runner but do not catch unhandled rejections; tmux mode is recommended for production.
Stale mailbox files – after TeamDelete, old mailbox files must be removed; otherwise a newly spawned Teammate may read stale messages.
Broadcast linear cost – SendMessage(to: "*") writes one file per teammate; avoid in high‑frequency scenarios such as heartbeats.
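For the stale‑mailbox pitfall, cleanup can be as small as removing the team's mailbox directory when the team is deleted. The sketch below assumes a <root>/<teamName>/mailbox layout and a hypothetical helper name, and runs against a temporary directory rather than the real ~/.claude tree.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical cleanup step to run as part of team deletion.
function removeTeamMailboxes(root: string, teamName: string): void {
  const dir = path.join(root, teamName, "mailbox");
  // force: true makes this a no-op when the directory is already gone.
  fs.rmSync(dir, { recursive: true, force: true });
}

// Demo: create a stale mailbox file, then remove the whole mailbox directory.
const cleanupRoot = fs.mkdtempSync(path.join(os.tmpdir(), "team-cleanup-"));
const staleDir = path.join(cleanupRoot, "my-team", "mailbox");
fs.mkdirSync(staleDir, { recursive: true });
fs.writeFileSync(path.join(staleDir, "researcher.jsonl"), '{"text":"old task"}\n');
removeTeamMailboxes(cleanupRoot, "my-team");
console.log(fs.existsSync(staleDir)); // prints: false
```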
Portable Engineering Patterns
Replace external message queues with a filesystem mailbox for same‑machine agents; read‑and‑truncate semantics prevent duplicate consumption.
Use protocol‑prefix address routing (for example uds:, bridge:, or http:) to make the transport layer interchangeable without changing call sites.
Adopt name@scope AgentIds for readable, unique identifiers that double as routing keys.
Design Insights Summary
UDS provides an order‑of‑magnitude latency advantage over HTTP for intra‑machine IPC.
Mailbox decouples sender and receiver, fitting LLM‑based agents where the receiver may be blocked.
Three spawn modes address debugging (tmux), CI/CD (in‑process), and macOS development (iTerm2 split pane).
name@scope AgentIds improve log readability and enable built‑in routing.
Two‑way shutdown handshake prevents premature termination.
Plan‑mode approval offers a reusable gate‑control pattern for safe AI execution.
James' Growth Diary
I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.
