Inside Claude Code: Three‑Tier Compression Enabling Unlimited‑Length AI Tasks

The article dissects Claude Code's three‑level progressive compression system—MicroCompact, SessionMemoryCompact, and Full Compact—showing how it edits cached prompts, maintains background memory files, and generates a structured nine‑section summary to keep AI agents operating over arbitrarily long conversations within a limited context window.

Shuge Unlimited

Why Long Dialogues Matter for AI Agents

Even the strongest language models have a finite context window, so extended conversations quickly exceed the token limit. Traditional solutions either truncate old messages or use a sliding window, both of which lose crucial causal links between messages.

Three‑Tier Progressive Compression Strategy

Claude Code introduces a three‑level approach that upgrades from low‑cost to higher‑cost methods only when needed, preserving Prompt Cache hits whenever possible.

Level 1 – MicroCompact (Silent Compression)

MicroCompact has two sub‑strategies:

Cached MicroCompact

Uses Anthropic's cache_edits API to delete outdated tool results directly in the server‑side prompt cache without resending the whole prompt, resulting in zero Prompt Cache misses and no extra token consumption.

// Register tool ID → queue cache_edits block → API layer applies automatically
// Source: src/services/compact/apiMicrocompact.ts (line 153)
// Core idea: tell the server "delete the X‑th tool result from the cache"
// instead of sending a new prompt without that result
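The flow those comments describe might look like the following sketch. The `CacheEdit` shape, class name, and method names are illustrative assumptions invented here, not Anthropic's documented cache_edits payload:

```typescript
// Hypothetical sketch: outdated tool results are queued as cache-edit
// operations rather than resent, so the server-side prompt cache is edited
// in place and no cache miss occurs.
interface CacheEdit {
  op: "delete_tool_result";
  toolUseId: string; // ID of the cached tool result to drop server-side
}

class MicroCompactQueue {
  private pending: CacheEdit[] = [];

  // Mark an outdated tool result for removal from the server-side cache.
  queueDeletion(toolUseId: string): void {
    this.pending.push({ op: "delete_tool_result", toolUseId });
  }

  // Drain the queue; the API layer would attach these edits to the next request.
  flush(): CacheEdit[] {
    const edits = this.pending;
    this.pending = [];
    return edits;
  }
}
```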

Time‑Based MicroCompact

Triggers when more than 60 minutes (the approximate TTL of Anthropic's Prompt Cache) have passed since the last assistant message. Old tool results are replaced with the placeholder [Old tool result content cleared]. Only a predefined list of compressible tools (e.g., Read, Bash, Grep / Glob, WebSearch / WebFetch, Edit / Write) is cleared, preserving user intent and reasoning chains.

// Source: src/services/compact/timeBasedMCConfig.ts (line 43)
// Config: 60‑minute threshold, keep the most recent N compressible tool results
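A minimal sketch of this time-based clearing, assuming a simplified message shape; the message type, function name, and `keepRecent` default are invented for illustration, while the tool list and placeholder string follow the article:

```typescript
// Tools whose results may be cleared, per the article's list.
const COMPRESSIBLE_TOOLS = new Set([
  "Read", "Bash", "Grep", "Glob", "WebSearch", "WebFetch", "Edit", "Write",
]);
const PLACEHOLDER = "[Old tool result content cleared]";
const TTL_MS = 60 * 60 * 1000; // ~60-minute prompt-cache TTL

interface ToolResultMsg { tool: string; content: string; timestamp: number; }

// Walk from newest to oldest: keep the most recent N compressible results,
// and replace older, stale ones with the placeholder.
function clearStaleToolResults(
  msgs: ToolResultMsg[], now: number, keepRecent = 3,
): ToolResultMsg[] {
  let kept = 0;
  const out: ToolResultMsg[] = [];
  for (let i = msgs.length - 1; i >= 0; i--) {
    const m = msgs[i];
    const stale = now - m.timestamp > TTL_MS;
    if (COMPRESSIBLE_TOOLS.has(m.tool) && stale && kept >= keepRecent) {
      out.unshift({ ...m, content: PLACEHOLDER });
    } else {
      if (COMPRESSIBLE_TOOLS.has(m.tool)) kept++;
      out.unshift(m);
    }
  }
  return out;
}
```

Non-compressible messages (user turns, assistant reasoning) never match the tool list, so user intent and reasoning chains pass through untouched.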

Level 2 – SessionMemoryCompact (Stealing Time with Memory Files)

When conversations become very long, MicroCompact alone is insufficient. SessionMemoryCompact continuously extracts key information into a session-memory file after each turn (sessionMemory.ts and extractMemories.ts), so that when compression is needed the system can replace trimmed messages with this pre-generated summary without an extra API call.

// Core logic (src/services/compact/sessionMemoryCompact.ts, line 630)
// 1. Compute retained window (≥10K tokens, 5 text messages)
// 2. Replace trimmed old messages with the background session memory file
// 3. Keep tool_use/tool_result pairs aligned
// 4. Preserve the thinking block integrity

The design insists that the cut point must keep tool_use and tool_result paired, and the entire thinking block must remain intact to avoid breaking the model's reasoning chain.
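The pairing rule can be illustrated with a toy cut-point search. The message-kind model and `safeCutIndex` helper are simplifications invented here, assuming each tool_result immediately follows its tool_use and that a thinking block must not be separated from the message after it:

```typescript
type Kind = "user" | "assistant" | "tool_use" | "tool_result" | "thinking";

// Given a desired cut index (first message to keep), walk backwards until the
// kept window neither starts with an orphaned tool_result nor splits off a
// thinking block from the message it precedes.
function safeCutIndex(kinds: Kind[], desired: number): number {
  let i = desired;
  while (i > 0 && (kinds[i] === "tool_result" || kinds[i - 1] === "thinking")) {
    i--;
  }
  return i;
}
```

The real sessionMemoryCompact.ts additionally enforces the token and message-count floor (≥10K tokens, 5 text messages) described above.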

Level 3 – Full Compact (The "Nuclear" Nine‑Section Summary)

If the first two levels cannot fit the context, Full Compact calls the API to generate a structured nine-section summary (defined in prompt.ts, line 374). The nine sections cover the primary request, key technical concepts, files and code sections, errors and fixes, problem-solving steps, all user messages, pending tasks, current work, and the next step with required citations.

// Source: src/services/compact/prompt.ts
// NO_TOOLS_PREAMBLE added to force text‑only response:
// "CRITICAL: Respond with TEXT ONLY. Do NOT use any tools."
// This prevents tool‑call failures that would otherwise abort the summary.
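Assembling such a request might look like the sketch below. The preamble string is quoted from the article and the section titles follow its list, but the constant names and the assembly function are hypothetical:

```typescript
// Quoted from the article; forces a text-only response so a stray tool call
// cannot abort the summary.
const NO_TOOLS_PREAMBLE =
  "CRITICAL: Respond with TEXT ONLY. Do NOT use any tools.";

// The nine summary sections, per the article's description.
const SUMMARY_SECTIONS = [
  "Primary request", "Key technical concepts", "Files and code sections",
  "Errors and fixes", "Problem-solving steps", "All user messages",
  "Pending tasks", "Current work", "Next step (with citations)",
];

function buildSummaryPrompt(): string {
  const outline = SUMMARY_SECTIONS.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return `${NO_TOOLS_PREAMBLE}\nSummarize the conversation using these sections:\n${outline}`;
}
```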

Full Compact also supports two compression directions: 'from' (keep the early part of the conversation, compress the later part, preserving cache prefix) and 'up_to' (compress the early part, keep the latest context). The system marks the boundary with SYSTEM_PROMPT_DYNAMIC_BOUNDARY to keep the static prompt prefix untouched.
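The two directions amount to choosing which side of a boundary gets summarized. A minimal sketch, assuming a flat message array and an externally chosen boundary index:

```typescript
interface Msg { role: string; text: string; }

// 'from': keep the early prefix (cache-friendly), summarize the tail.
// 'up_to': summarize the early part, keep the freshest context.
function splitForCompact(
  msgs: Msg[], boundary: number, direction: "from" | "up_to",
): { keep: Msg[]; compress: Msg[] } {
  if (direction === "from") {
    return { keep: msgs.slice(0, boundary), compress: msgs.slice(boundary) };
  }
  return { keep: msgs.slice(boundary), compress: msgs.slice(0, boundary) };
}
```

In the real system the boundary additionally respects the SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker so the static prompt prefix stays byte-identical for the cache.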

PTL Retry – Handling "Prompt Too Long" with Recursive Truncation

When the old messages themselves exceed the context window, Full Compact groups messages by API round (groupMessagesByApiRound()) and iteratively truncates the oldest group, retrying up to three times until the payload fits.

// Source: src/services/compact/grouping.ts (line 63)
// Group by round → truncate oldest group → retry compression (max 3 attempts)
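The loop those comments describe can be sketched as follows, assuming messages tagged with an API-round number and a pluggable fit check; everything except the `groupMessagesByApiRound` name is invented for illustration:

```typescript
interface RoundMsg { apiRound: number; text: string; }

// Bucket messages by the API round that produced them, preserving order.
function groupMessagesByApiRound(msgs: RoundMsg[]): RoundMsg[][] {
  const groups = new Map<number, RoundMsg[]>();
  for (const m of msgs) {
    const g = groups.get(m.apiRound) ?? [];
    g.push(m);
    groups.set(m.apiRound, g);
  }
  return Array.from(groups.values());
}

// Drop the oldest round and retry until the payload fits or attempts run out.
function compactWithRetry(
  msgs: RoundMsg[],
  fits: (msgs: RoundMsg[]) => boolean,
  maxAttempts = 3,
): RoundMsg[] | null {
  let groups = groupMessagesByApiRound(msgs);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const flat = ([] as RoundMsg[]).concat(...groups);
    if (fits(flat)) return flat;
    groups = groups.slice(1); // truncate the oldest API round
  }
  return null; // still too long after the allowed attempts
}
```

Truncating whole rounds (rather than arbitrary messages) keeps each surviving request/response exchange internally consistent.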

Automatic Triggering and Cost Awareness

The autoCompact.ts module monitors token usage and automatically selects the cheapest viable compression level, in order: MicroCompact → SessionMemoryCompact → Full Compact. It also emits a warning before compression (surfaced alongside the /compact command), allowing users to intervene.
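The cheapest-first selection can be sketched as a simple threshold ladder; the ratios below are invented for illustration, not Claude Code's actual trigger points:

```typescript
type Level = "none" | "micro" | "sessionMemory" | "full";

// Pick the least expensive level that can plausibly reclaim enough context.
function pickCompactLevel(usedTokens: number, limit: number): Level {
  const ratio = usedTokens / limit;
  if (ratio < 0.7) return "none";           // plenty of headroom
  if (ratio < 0.85) return "micro";          // cheap cache edits suffice
  if (ratio < 0.95) return "sessionMemory";  // swap in the background summary
  return "full";                             // last resort: API-generated summary
}
```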

Memory System – Long‑Term Companion

Claude Code stores a persistent MEMORY.md index (managed by memdir.ts) limited to 200 lines or 25 KB. It records four categories: user, feedback, project, and reference, while deliberately excluding code, architecture, and Git history to avoid bloat.
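A guard for those limits might look like the following sketch; the function name is an assumption, while the 200-line / 25 KB caps follow the article:

```typescript
// Check whether a MEMORY.md candidate stays within the stated size caps.
function withinMemoryLimits(content: string): boolean {
  const lines = content.split("\n").length;
  const bytes = new TextEncoder().encode(content).length; // UTF-8 byte count
  return lines <= 200 && bytes <= 25 * 1024;
}
```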

The autoDream.ts process periodically consolidates and deduplicates memory entries when three conditions are met: more than 24 hours have passed since the last run, more than five sessions have accumulated, and the file lock that guards against concurrent merges is free.
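The three-condition trigger can be sketched directly; the state-field names here are assumptions:

```typescript
interface DreamState {
  lastRunMs: number;       // epoch ms of the last consolidation run
  sessionsSinceRun: number;
  lockHeld: boolean;       // file lock guarding against concurrent merges
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Consolidate only when all three conditions from the article hold.
function shouldDream(state: DreamState, now: number): boolean {
  return (
    now - state.lastRunMs > DAY_MS && // >24h since the last run
    state.sessionsSinceRun > 5 &&     // enough sessions accumulated
    !state.lockHeld                   // no concurrent merge in progress
  );
}
```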

Complementary Loop of Compact and Memory

Compression discards fine‑grained details but keeps the immediate context; the memory system preserves high‑level facts for future sessions. Together they form an information lifecycle: high‑frequency data stays in the context, low‑frequency data moves to memory, and irrelevant data is dropped.

Conclusion

The eight examined source files reveal a design philosophy of progressive degradation, cache‑friendliness, structured summarisation, and information tiering. For developers building AI agents, the key takeaway is to proactively trigger compression, align it with cache behaviour, and complement it with a robust memory subsystem.

[Figures: Three‑Tier Compression Overview · MicroCompact Sub‑Strategy Comparison · Full Compact Nine‑Section Structure · Auto Compact Trigger Flow · Compact + Memory Complementary Loop]
Republication Notice

This article has been distilled and summarized from source material and republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
