Taming Context Explosion: Multi‑Agent Compression Engineering in Claude Code

The article dissects Claude Code’s three‑layer compression system—microCompact, autoCompact, and sessionMemoryCompact—explaining how each layer mitigates the multiplicative token growth of multi‑agent workflows, the compact_boundary bookmark for resume support, cache‑friendly designs, and practical pitfalls.

James' Growth Diary

Jun 15, 2026

Background: Context multiplication in multi‑agent workflows

When one AI instructs another, context consumption stops growing additively and starts growing multiplicatively. For example, five concurrent child agents each returning ~10 K tokens add ~50 K tokens to the parent agent’s context per round, so a 200 K token window can be exhausted after only about four such rounds.
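The arithmetic can be checked with a small helper. This is only a back-of-the-envelope sketch: the function and its parameter names are made up for illustration and do not come from Claude Code.

```typescript
// Hypothetical helper: estimates how many fan-out rounds fit in a context window.
// All names here are illustrative, not taken from Claude Code.
function roundsUntilOverflow(
  contextWindow: number,  // total window in tokens, e.g. 200_000
  baselineTokens: number, // tokens already used by the system prompt and history
  childAgents: number,    // concurrent child agents per round
  tokensPerChild: number, // tokens each child returns to the parent
): number {
  const perRound = childAgents * tokensPerChild;
  if (perRound <= 0) return Infinity;
  const available = Math.max(0, contextWindow - baselineTokens);
  // Number of full rounds that fit before the window is exhausted.
  return Math.floor(available / perRound);
}

console.log(roundsUntilOverflow(200_000, 0, 5, 10_000));      // 4
console.log(roundsUntilOverflow(200_000, 30_000, 5, 10_000)); // 3
```

With an empty baseline, four rounds fill the window; any existing history shortens that further.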

Industry approaches to context overflow

LangChain: ConversationBufferWindowMemory discards messages beyond a fixed window – simple but can lose critical change logs.

LangGraph: adds a summarize node that triggers an LLM summary when a token threshold is crossed; operates at a single granularity – either no compression or a full‑history summary.

OpenAI Assistants API: performs server‑side thread management and automatic truncation, but the process is a black box, making debugging difficult.

Claude Code: avoids a single‑trigger approach and implements three distinct compression layers that cooperate.

Design conclusion: three‑layer compression + compact_boundary bookmark

The three layers each have a dedicated role, working together to keep context size under control while preserving essential information.

Layer 1: Micro‑compression (microCompact) – Reduce while running

The core logic lives in src/services/compact/microCompact.ts. Only tool results listed in the COMPACTABLE_TOOLS whitelist are eligible for compression because they are large, data‑heavy blocks whose relevance drops after a few rounds. The whitelist includes file reads, shell commands, grep, glob, web search/fetch, and file edit/write tools. Instruction‑type tools such as TodoWrite and MemoryWrite are excluded.

Images receive a fixed token budget (IMAGE_MAX_TOKEN_SIZE = 2000) and are not truncated; during autoCompact they are replaced by the placeholder [image].

Time‑based micro‑compression clears stale tool results when the user has been idle long enough for Anthropic’s prompt cache to expire (default 5 minutes), reducing unnecessary token writes.

const COMPACTABLE_TOOLS = new Set<string>([
  FILE_READ_TOOL_NAME, // Read
  ...SHELL_TOOL_NAMES, // Bash/Shell
  GREP_TOOL_NAME,      // Grep
  GLOB_TOOL_NAME,      // Glob
  WEB_SEARCH_TOOL_NAME,// WebSearch
  WEB_FETCH_TOOL_NAME, // WebFetch
  FILE_EDIT_TOOL_NAME, // Edit
  FILE_WRITE_TOOL_NAME // Write
]);
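To make the idea concrete, here is a minimal sketch of how whitelist-based clearing and the idle check might fit together. It is not the code from microCompact.ts: the ToolResult shape, the literal tool names, the placeholder text, and the keepRecent parameter are all assumptions made for this example.

```typescript
// Illustrative type; the real message shapes in Claude Code are richer.
interface ToolResult {
  toolName: string;
  content: string;
}

// Assumed tool names and placeholder text, chosen for this example only.
const COMPACTABLE = new Set<string>([
  "Read", "Bash", "Grep", "Glob", "WebSearch", "WebFetch", "Edit", "Write",
]);
const CLEARED_PLACEHOLDER = "[old tool result cleared]";

// Clears whitelisted tool results except the `keepRecent` most recent entries.
function microCompactSketch(results: ToolResult[], keepRecent: number): ToolResult[] {
  const cutoff = results.length - keepRecent;
  return results.map((r, i) =>
    i < cutoff && COMPACTABLE.has(r.toolName)
      ? { ...r, content: CLEARED_PLACEHOLDER }
      : r,
  );
}

// Time-based variant: clearing is cheap once the prompt cache has expired anyway.
const CACHE_TTL_MS = 5 * 60 * 1000; // the 5-minute default mentioned above
function cacheLikelyExpired(lastRequestAt: number, now: number): boolean {
  return now - lastRequestAt >= CACHE_TTL_MS;
}

const toolHistory: ToolResult[] = [
  { toolName: "Read", content: "2,000 lines of source" },
  { toolName: "TodoWrite", content: "todo list updated" },
  { toolName: "Grep", content: "search matches" },
];
console.log(microCompactSketch(toolHistory, 1).map((r) => r.content));
```

In this run the old Read result is cleared, the TodoWrite result survives because it is not on the whitelist, and the latest Grep result survives because it is still recent.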

Layer 2: Auto‑compression (autoCompact) – Threshold‑triggered LLM summary

When micro‑compression can no longer keep the context under the token window, autoCompact triggers. The threshold is calculated as contextWindow – MAX_OUTPUT_TOKENS_FOR_SUMMARY – AUTOCOMPACT_BUFFER_TOKENS. For Claude 3.7 Sonnet (200 K context) the threshold is 200 K – 20 K – 13 K ≈ 167 K tokens.

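// Output tokens reserved for the summary itself.
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000;
// Safety margin kept free below the effective window.
export const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
export function getAutoCompactThreshold(model: string): number {
  // Per the formula above, the effective window is the model's context window
  // minus MAX_OUTPUT_TOKENS_FOR_SUMMARY.
  const effectiveContextWindow = getEffectiveContextWindowSize(model);
  return effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS;
}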
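The 167 K figure can be reproduced with a self-contained version of the calculation. This sketch takes the raw context window as a number instead of a model name, and it assumes that the effective window is simply the raw window minus the summary reserve; effectiveContextWindow and shouldAutoCompact are illustrative names.

```typescript
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000;
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;

// Assumption: effective window = raw window minus the summary output reserve.
function effectiveContextWindow(contextWindow: number): number {
  return contextWindow - MAX_OUTPUT_TOKENS_FOR_SUMMARY;
}

function autoCompactThreshold(contextWindow: number): number {
  return effectiveContextWindow(contextWindow) - AUTOCOMPACT_BUFFER_TOKENS;
}

// Hypothetical trigger check: compact once usage reaches the threshold.
function shouldAutoCompact(currentTokens: number, contextWindow: number): boolean {
  return currentTokens >= autoCompactThreshold(contextWindow);
}

console.log(autoCompactThreshold(200_000));       // 167000
console.log(shouldAutoCompact(150_000, 200_000)); // false
console.log(shouldAutoCompact(170_000, 200_000)); // true
```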

During autoCompact the system strips images, builds a summary prompt, and calls the LLM via a forked agent that shares the main thread’s prompt‑cache prefix, preserving cache hits and saving up to 98 % of cache‑creation token cost.
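The image-stripping step might look roughly like the following. The ContentBlock union is a simplified stand-in for the real message content types, and stripImages is a name invented for this sketch.

```typescript
// Simplified content blocks; real API blocks carry more fields.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

// Replaces every image block with a short text placeholder before summarizing.
function stripImages(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((b): ContentBlock =>
    b.type === "image" ? { type: "text", text: "[image]" } : b,
  );
}

const beforeSummary: ContentBlock[] = [
  { type: "text", text: "Here is the screenshot:" },
  { type: "image", data: "base64-encoded bytes" },
];
console.log(stripImages(beforeSummary));
```

Swapping images for a fixed placeholder keeps the summary request small while still telling the summarizer that an image was present at that point.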

Layer 3: Session memory compression (sessionMemoryCompact) – Write summary to long‑term memory

This layer requires no additional LLM calls. A background extractor continuously writes important information to MEMORY.md. When autoCompact is considered, the system first attempts sessionMemoryCompact; if successful it returns the compacted result without invoking the LLM.

const DEFAULT_SM_COMPACT_CONFIG = {
  minTokens: 10_000,          // keep at least 10 K tokens
  minTextBlockMessages: 5,    // keep at least 5 text messages
  maxTokens: 40_000           // never exceed 40 K tokens
};

The algorithm finds the last summarized message, then expands backwards until both minTokens and minTextBlockMessages are satisfied, without exceeding maxTokens. It also adjusts the start index to keep tool_use / tool_result pairs intact.
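The backward expansion can be sketched as follows. The Msg shape and the findKeepStart name are hypothetical, token counts are assumed to be precomputed per message, and the tool_use / tool_result adjustment is left out to keep the example short.

```typescript
interface Msg {
  tokens: number;   // precomputed token count for this message
  hasText: boolean; // true when the message contains a text block
}

interface SmCompactConfig {
  minTokens: number;
  minTextBlockMessages: number;
  maxTokens: number;
}

// Returns the index of the first message to keep verbatim.
// `lastSummarizedIndex` is the last message already covered by the summary.
function findKeepStart(messages: Msg[], lastSummarizedIndex: number, cfg: SmCompactConfig): number {
  let start = Math.min(lastSummarizedIndex + 1, messages.length);
  let tokens = 0;
  let textMessages = 0;
  for (const m of messages.slice(start)) {
    tokens += m.tokens;
    if (m.hasText) textMessages++;
  }
  // Expand backwards until both minimums hold, stopping before maxTokens is exceeded.
  while (start > 0 && (tokens < cfg.minTokens || textMessages < cfg.minTextBlockMessages)) {
    const candidate = messages[start - 1];
    if (candidate === undefined || tokens + candidate.tokens > cfg.maxTokens) break;
    start--;
    tokens += candidate.tokens;
    if (candidate.hasText) textMessages++;
  }
  return start;
}

const sampleMessages: Msg[] = [3000, 4000, 2000, 5000, 1000, 500].map((tokens) => ({
  tokens,
  hasText: true,
}));
const defaults: SmCompactConfig = { minTokens: 10_000, minTextBlockMessages: 5, maxTokens: 40_000 };
console.log(findKeepStart(sampleMessages, 4, defaults)); // 1
```

Here only the last message (500 tokens) is unsummarized, so the window grows backwards to index 1, where it holds 12,500 tokens across five text messages.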

Engineering details: circuit breaker, cache friendliness, and resume

Circuit breaker: autoCompact failures are counted; after three consecutive failures the system stops retrying to avoid endless loops that waste API calls.

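const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;
// Circuit breaker: stop trying once the failure streak reaches the limit.
// The `?? 0` keeps the comparison valid when no tracking state exists yet.
if ((tracking?.consecutiveFailures ?? 0) >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
  return { wasCompacted: false };
}
try {
  const result = await compactConversation(...);
  // Success resets the streak.
  return { wasCompacted: true, consecutiveFailures: 0 };
} catch (e) {
  // Failure extends the streak; the caller carries the count into the next attempt.
  const next = (tracking?.consecutiveFailures ?? 0) + 1;
  return { wasCompacted: false, consecutiveFailures: next };
}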
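To see the breaker in action without any API calls, the state transitions can be isolated into two pure functions. BreakerState, isBreakerOpen, and recordAttempt are names invented for this sketch; the simulated compaction below simply fails every time.

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

interface BreakerState {
  consecutiveFailures: number;
}

// The breaker is open (no more attempts) once the streak reaches the limit.
function isBreakerOpen(s: BreakerState): boolean {
  return s.consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
}

// A success resets the streak; a failure extends it by one.
function recordAttempt(s: BreakerState, succeeded: boolean): BreakerState {
  return { consecutiveFailures: succeeded ? 0 : s.consecutiveFailures + 1 };
}

// Simulate ten rounds in which compaction always fails.
let breaker: BreakerState = { consecutiveFailures: 0 };
let attempts = 0;
for (let round = 0; round < 10; round++) {
  if (isBreakerOpen(breaker)) continue; // skip the call entirely
  attempts++;
  breaker = recordAttempt(breaker, false);
}
console.log(attempts); // 3
```

Ten rounds produce only three attempts, which is the point of the breaker: a persistently failing summarization stops costing API calls after the third try.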

Cache‑friendly design: autoCompact runs in a forked agent with the same system prompt and tool list, so the prefix cache remains hit even after the history is replaced by a summary.

const result = await runForkedAgent({
  promptMessages: [summaryRequest],
  cacheSafeParams,
  canUseTool: createCompactCanUseTool(),
  querySource: 'compact',
  maxTurns: 1,
  skipCacheWrite: true,
});

compact_boundary bookmark: records the UUID of the last message before compression. When a user resumes a session (using --resume), the system loads messages starting after the latest bookmark, restoring the compressed context without re‑executing prior work.

// src/utils/messages.ts (simplified)
export function createCompactBoundaryMessage(
  trigger: 'auto' | 'manual',
  preCompactTokenCount: number,
  lastMessageUuid?: UUID,
): SystemCompactBoundaryMessage {
  return {
    type: 'system',
    subtype: 'compact_boundary',
    compactMetadata: { trigger, preCompactTokenCount, lastPreCompactMessageUuid: lastMessageUuid },
  };
}
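On the resume side, loading "after the latest bookmark" could be sketched like this. The SessionMessage union is a simplified stand-in for the stored transcript format, and messagesAfterLatestBoundary is a hypothetical name, not the actual loader.

```typescript
// Simplified transcript entries; the stored format has more fields.
type SessionMessage =
  | { type: "user" | "assistant"; uuid: string; text: string }
  | { type: "system"; subtype: "compact_boundary"; uuid: string };

// Returns only the messages that follow the most recent compact_boundary.
function messagesAfterLatestBoundary(log: SessionMessage[]): SessionMessage[] {
  for (let i = log.length - 1; i >= 0; i--) {
    const m = log[i];
    if (m !== undefined && m.type === "system" && m.subtype === "compact_boundary") {
      return log.slice(i + 1);
    }
  }
  return log.slice(); // no boundary yet: the whole log is live context
}

const transcript: SessionMessage[] = [
  { type: "user", uuid: "u1", text: "refactor the parser" },
  { type: "assistant", uuid: "a1", text: "done" },
  { type: "system", subtype: "compact_boundary", uuid: "b1" },
  { type: "user", uuid: "u2", text: "summary of earlier work" },
  { type: "assistant", uuid: "a2", text: "continuing" },
];
console.log(messagesAfterLatestBoundary(transcript).map((m) => m.uuid)); // [ 'u2', 'a2' ]
```

Scanning from the end means that a session compacted several times resumes from its newest summary instead of replaying older ones.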

Common pitfalls

Pitfall 1: Child agents must not trigger autoCompact; the code guards against recursion by checking querySource (skip when it is 'session_memory' or 'compact').
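A guard of this kind might look like the sketch below. Only the two skipped source values come from the text above; the function name, its parameters, and the "repl_main_thread" value used in the example call are assumptions.

```typescript
// Query sources that must never start another compaction (from the text above).
const NON_COMPACTING_SOURCES = new Set<string>(["session_memory", "compact"]);

// Hypothetical guard: compaction agents are skipped before the threshold is even checked.
function mayAutoCompact(querySource: string, currentTokens: number, threshold: number): boolean {
  if (NON_COMPACTING_SOURCES.has(querySource)) return false;
  return currentTokens >= threshold;
}

console.log(mayAutoCompact("compact", 190_000, 167_000));          // false
console.log(mayAutoCompact("repl_main_thread", 190_000, 167_000)); // true
```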

Pitfall 2: After compression the file cache (context.readFileState) is cleared. Only the most recent five files (≤ 5 K tokens each) are re‑injected; older files must be read again manually.
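The selection rule can be sketched as follows. Several things here are assumptions: the FileState shape, the rough four-characters-per-token estimate, and the choice to truncate oversized files to the budget (the real code may skip them or count tokens differently).

```typescript
interface FileState {
  path: string;
  content: string;
  lastReadAt: number; // timestamp of the most recent read
}

const MAX_FILES_TO_RESTORE = 5;
const MAX_TOKENS_PER_FILE = 5_000;
const CHARS_PER_TOKEN = 4; // crude estimate; a real tokenizer would differ

// Picks the most recently read files and caps each at the per-file budget.
function selectFilesToRestore(files: FileState[]): FileState[] {
  const maxChars = MAX_TOKENS_PER_FILE * CHARS_PER_TOKEN;
  return [...files]
    .sort((a, b) => b.lastReadAt - a.lastReadAt)
    .slice(0, MAX_FILES_TO_RESTORE)
    .map((f) => ({ ...f, content: f.content.slice(0, maxChars) }));
}

const cachedFiles: FileState[] = ["a", "b", "c", "d", "e", "f", "g"].map((n, i) => ({
  path: `src/${n}.ts`,
  content: "x".repeat(n === "g" ? 30_000 : 100),
  lastReadAt: i,
}));
const restored = selectFilesToRestore(cachedFiles);
console.log(restored.map((f) => f.path).join(", "));
// src/g.ts, src/f.ts, src/e.ts, src/d.ts, src/c.ts
```

Files a.ts and b.ts fall outside the top five, which matches the pitfall: anything older has to be read again before the agent can rely on its contents.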

Pitfall 3: Compression must preserve tool_use / tool_result pairs; the helper adjustIndexToPreserveAPIInvariants shifts the start index forward when needed.
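One way to enforce the invariant in the direction described above is shown below. This is a sketch, not the real adjustIndexToPreserveAPIInvariants: the ApiMessage shape and the adjustStartForward name are hypothetical, and the actual helper may resolve split pairs differently.

```typescript
interface ApiMessage {
  toolUseIds: string[];    // ids of tool_use blocks in this message
  toolResultIds: string[]; // tool_use ids answered by this message's tool_result blocks
}

// Moves `start` forward until no kept message holds a tool_result
// whose matching tool_use was cut off.
function adjustStartForward(messages: ApiMessage[], start: number): number {
  let adjusted = start;
  while (adjusted < messages.length) {
    const kept = messages.slice(adjusted);
    const knownUses = new Set<string>();
    for (const m of kept) {
      for (const id of m.toolUseIds) knownUses.add(id);
    }
    const orphanIndex = kept.findIndex((m) => m.toolResultIds.some((id) => !knownUses.has(id)));
    if (orphanIndex === -1) break;
    adjusted += orphanIndex + 1; // drop everything up to and including the orphan
  }
  return adjusted;
}

const apiMessages: ApiMessage[] = [
  { toolUseIds: ["a"], toolResultIds: [] }, // 0: assistant calls tool a
  { toolUseIds: [], toolResultIds: ["a"] }, // 1: result for a
  { toolUseIds: ["b"], toolResultIds: [] }, // 2: assistant calls tool b
  { toolUseIds: [], toolResultIds: ["b"] }, // 3: result for b
  { toolUseIds: [], toolResultIds: [] },    // 4: plain text
];
console.log(adjustStartForward(apiMessages, 1)); // 2
console.log(adjustStartForward(apiMessages, 2)); // 2
```

Starting at index 1 would keep the result for tool a without its call, so the start moves to index 2; starting at index 2 already splits no pair and is left alone.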

Pitfall 4: Environment variables can override thresholds for testing (e.g., CLAUDE_CODE_AUTO_COMPACT_WINDOW, CLAUDE_AUTOCOMPACT_PCT_OVERRIDE, DISABLE_AUTO_COMPACT).

Summary of the three layers

Micro‑compression (microCompact) – runs every round, very low cost (in‑memory), truncates large tool results and delays autoCompact.

Auto‑compression (autoCompact) – threshold‑based, medium cost (single LLM call), replaces history with a summary.

Session‑memory compression (sessionMemoryCompact) – threshold‑based (preferred over autoCompact), low cost (reuses existing summary), no extra LLM cost.

The three‑layer approach, combined with the compact_boundary bookmark for resume support, enables Claude Code to manage context efficiently in long‑running, multi‑agent conversations.


