How AI Agents Manage Context: Compression Strategies from Manus, Claude Code, and Gemini CLI

This article examines the context explosion problem in AI agents and compares three distinct compression approaches—Manus's never‑lose philosophy, Claude Code's aggressive 92% threshold with eight‑section summaries, and Gemini CLI's balanced 70% trigger with curated history—highlighting their trade‑offs in performance, cost, and reliability.


1. Manus

Manus treats the file system as the "ultimate context", never deleting information but externalizing it. When an agent accesses a webpage, only the URL and a brief description are kept in the prompt; the full HTML is re‑fetched on demand, similar to noting a reference instead of redrawing a chart.

Documents are handled the same way: a PDF’s path, page number, and last‑accessed position are stored, and the content is loaded only when needed.

Key implementation points:

Retain minimal necessary information: store only metadata that allows reconstruction (e.g., URLs, file paths).

Smart reload timing: reload resources only when the context requires detailed content.

Cache mechanism: keep recently accessed resources in a local cache that does not consume token quota.
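These three points can be sketched as a small resource registry. This is a hypothetical illustration of the pattern, not Manus's actual implementation; all names here are invented:

```javascript
// Sketch (hypothetical names): externalize full content, keep only metadata in context.
class ResourceContext {
  constructor(fetcher) {
    this.fetcher = fetcher;   // re-downloads a URL or re-reads a file on demand
    this.entries = new Map(); // id -> { url, description }: metadata only
    this.cache = new Map();   // local cache; does not consume token quota
  }
  note(id, url, description) {
    this.entries.set(id, { url, description }); // what actually stays in the prompt
  }
  async load(id) {
    if (this.cache.has(id)) return this.cache.get(id); // cache hit: no re-fetch
    const { url } = this.entries.get(id);
    const content = await this.fetcher(url);           // smart reload on demand
    this.cache.set(id, content);
    return content;
  }
  promptView() {
    // Only the lightweight metadata is serialized into the agent's context window.
    return [...this.entries.values()]
      .map(e => `[ref] ${e.url} - ${e.description}`)
      .join('\n');
  }
}
```

The key property is that `promptView()` never contains full page or file content, while `load()` can always reconstruct it.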

2. Claude Code

2.1 92% Threshold

Compression triggers when token usage reaches 92% of the window, leaving an 8% buffer for safety and possible fallback strategies.

const COMPRESSION_CONFIG = {
  threshold: 0.92, // 92% trigger
  triggerVariable: "h11", // h11 = 0.92
  compressionModel: "J7()", // dedicated compression model
  preserveStructure: true // keep 8‑section structure
};

2.2 Eight‑Section Structured Summary

The summary is divided into eight clearly prioritized sections, ensuring that essential intents, technical concepts, files, errors, problem‑solving steps, all user messages, pending tasks, and current work are preserved.

const COMPRESSION_SECTIONS = [
  "1. Primary Request and Intent",
  "2. Key Technical Concepts",
  "3. Files and Code Sections",
  "4. Errors and fixes",
  "5. Problem Solving",
  "6. All user messages",
  "7. Pending Tasks",
  "8. Current Work"
];

2.3 Dedicated Compression Model J7

J7 is a specialized model that generates high‑quality structured abstracts from long dialogues.

async function contextCompression(currentContext) {
  // Check compression condition
  if (currentContext.tokenRatio < h11) { // h11 = 0.92 compression threshold
    return currentContext; // No compression needed
  }
  const compressionPrompt = await AU2.generatePrompt(currentContext);
  const compressedSummary = await J7(compressionPrompt);
  const newContext = {
    summary: compressedSummary,
    recentMessages: currentContext.recent(5), // keep last 5 messages
    currentTask: currentContext.activeTask
  };
  return newContext;
}

2.4 Context Lifecycle Management

Context is treated as a dynamic entity that evolves; each new input triggers token‑usage checks and, if necessary, an automated compression step.

class ContextManager {
  constructor() {
    this.compressionThreshold = 0.92; // h11 = 0.92
    this.compressionModel = "J7"; // dedicated model
  }
  async manageContext(currentContext, newInput) {
    const updatedContext = this.appendToContext(currentContext, newInput);
    const tokenUsage = await this.calculateTokenUsage(updatedContext);
    if (tokenUsage.ratio >= this.compressionThreshold) {
      const compressionPrompt = await AU2.generateCompressionPrompt(updatedContext);
      const compressedSummary = await this.compressionModel.generate(compressionPrompt);
      return this.buildCompressedContext(compressedSummary, updatedContext);
    }
    return updatedContext;
  }
}

2.5 Graceful Degradation

If compression fails, the system falls back through Plan B and Plan C, ensuring stability even under adverse conditions.

Adaptive re‑compression: retry with adjusted parameters when quality is insufficient.

Hybrid mode retention: aggressively compress old content while fully preserving recent interactions.

Conservative truncation: guarantee minimal functionality in worst‑case scenarios.
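The fallback ladder can be sketched as a single function. This is an illustrative reconstruction, not Claude Code's actual source; `compress` and `qualityOk` are stand-ins for the real compression call and quality check:

```javascript
// Sketch of the degradation ladder; all names are illustrative.
async function compressWithFallback(context, compress, qualityOk) {
  try {
    // Plan A: normal structured compression.
    let summary = await compress(context, { detail: 'high' });
    if (qualityOk(summary)) return { summary, recent: context.messages.slice(-5) };

    // Plan B: adaptive re-compression with adjusted parameters.
    summary = await compress(context, { detail: 'low' });
    if (qualityOk(summary)) return { summary, recent: context.messages.slice(-5) };

    // Plan C: hybrid mode, compress only the older half, keep recent turns verbatim.
    const cut = Math.floor(context.messages.length / 2);
    summary = await compress({ messages: context.messages.slice(0, cut) }, { detail: 'low' });
    return { summary, recent: context.messages.slice(cut) };
  } catch (err) {
    // Worst case: conservative truncation guarantees minimal functionality.
    return { summary: null, recent: context.messages.slice(-5) };
  }
}
```

Even if every compression attempt throws, the agent still gets back its five most recent messages.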

2.6 Information Recovery

Although lossy, the eight‑section summary retains enough structured context for the agent to continue reasoning, especially the "All user messages" section which safeguards user intent.

3. Gemini CLI

3.1 70% Trigger, 30% Retention

Compression starts at 70% token usage, preserving the latest 30% of the conversation to avoid abrupt context loss.
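For a concrete sense of scale, a quick sketch of the arithmetic (the two constants come from the article; the helper function is illustrative). With a 1,048,576-token window, compression triggers at roughly 734,000 tokens, and the oldest 70% of the history is summarized:

```javascript
// Arithmetic behind the 70% trigger / 30% retention.
const COMPRESSION_TOKEN_THRESHOLD = 0.7;   // trigger fraction
const COMPRESSION_PRESERVE_THRESHOLD = 0.3; // fraction of history kept verbatim

function compressionPlan(tokenLimit, usedTokens) {
  const triggerAt = COMPRESSION_TOKEN_THRESHOLD * tokenLimit;
  return {
    shouldCompress: usedTokens >= triggerAt,
    triggerAt,
    // The oldest 70% of the history is summarized; the newest 30% is kept.
    compressFraction: 1 - COMPRESSION_PRESERVE_THRESHOLD,
  };
}
```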

3.2 Curated History Extraction

Only valuable history is kept; user messages are always retained, while model messages are kept only if the entire block passes validity checks.

function extractCuratedHistory(comprehensiveHistory) {
  if (!comprehensiveHistory || comprehensiveHistory.length === 0) {
    return [];
  }
  const curatedHistory = [];
  let i = 0;
  while (i < comprehensiveHistory.length) {
    if (comprehensiveHistory[i].role === 'user') {
      curatedHistory.push(comprehensiveHistory[i]);
      i++;
    } else {
      const modelOutput = [];
      let isValid = true;
      while (i < comprehensiveHistory.length && comprehensiveHistory[i].role === 'model') {
        modelOutput.push(comprehensiveHistory[i]);
        if (isValid && !isValidContent(comprehensiveHistory[i])) {
          isValid = false;
        }
        i++;
      }
      if (isValid) {
        curatedHistory.push(...modelOutput);
      }
    }
  }
  return curatedHistory;
}

3.3 Content Validity Check

function isValidContent(content) {
  if (!content.parts || content.parts.length === 0) return false;
  for (const part of content.parts) {
    if (!part || Object.keys(part).length === 0) return false;
    if (!part.thought && part.text !== undefined && part.text === '') return false;
  }
  return true;
}

3.4 Five‑Section Structured Summary

1. overall_goal - user's main goal
2. key_knowledge - important technical decisions
3. file_system_state - current file system status
4. recent_actions - recent important operations
5. current_plan - current execution plan
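Since the summary is emitted as a `<state_snapshot>` block built from these five sections, a consumer can cheaply verify that none of them is missing. The section names below match the article; the validator itself is a hypothetical sketch:

```javascript
// Sketch: check that a compression summary covers all five required sections.
const REQUIRED_SECTIONS = [
  'overall_goal',
  'key_knowledge',
  'file_system_state',
  'recent_actions',
  'current_plan',
];

// Returns the names of any sections absent from the snapshot text.
function missingSections(snapshotText) {
  return REQUIRED_SECTIONS.filter(s => !snapshotText.includes(`<${s}>`));
}
```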

3.5 Token‑Based Smart Compression

async function tryCompressChat(prompt_id, force = false) {
  const curatedHistory = this.getChat().getHistory(true);
  if (curatedHistory.length === 0) return null;
  const model = this.config.getModel();
  const { totalTokens: originalTokenCount } = await this.getContentGenerator().countTokens({
    model,
    contents: curatedHistory,
  });
  const contextPercentageThreshold = this.config.getChatCompression()?.contextPercentageThreshold;
  if (!force) {
    const threshold = contextPercentageThreshold ?? COMPRESSION_TOKEN_THRESHOLD; // default 0.7
    if (originalTokenCount < threshold * tokenLimit(model)) return null;
  }
  let compressBeforeIndex = findIndexAfterFraction(
    curatedHistory,
    1 - COMPRESSION_PRESERVE_THRESHOLD // 0.3
  );
  while (
    compressBeforeIndex < curatedHistory.length &&
    (curatedHistory[compressBeforeIndex]?.role === 'model' || isFunctionResponse(curatedHistory[compressBeforeIndex]))
  ) {
    compressBeforeIndex++;
  }
  const historyToCompress = curatedHistory.slice(0, compressBeforeIndex);
  const historyToKeep = curatedHistory.slice(compressBeforeIndex);
  this.getChat().setHistory(historyToCompress);
  const { text: summary } = await this.getChat().sendMessage(
    {
      message: { text: 'First, reason in your scratchpad. Then, generate the <state_snapshot>.' },
      config: { systemInstruction: { text: getCompressionPrompt() } }
    },
    prompt_id
  );
  this.chat = await this.startChat([
    { role: 'user', parts: [{ text: summary }] },
    { role: 'model', parts: [{ text: 'Got it. Thanks for the additional context!' }] },
    ...historyToKeep
  ]);
}
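The code above calls `findIndexAfterFraction`, which is not shown. A plausible reconstruction, assuming it measures each history item by its serialized length and returns the first index past the given fraction of the total (the real gemini-cli helper may differ in detail):

```javascript
// Plausible sketch of findIndexAfterFraction: returns the index of the first
// history item that begins at or after `fraction` of the total serialized length.
function findIndexAfterFraction(history, fraction) {
  const lengths = history.map(item => JSON.stringify(item).length);
  const total = lengths.reduce((a, b) => a + b, 0);
  const target = total * fraction;
  let cumulative = 0;
  for (let i = 0; i < history.length; i++) {
    if (cumulative >= target) return i;
    cumulative += lengths[i];
  }
  return history.length;
}
```

The `while` loop that follows the call in `tryCompressChat` then nudges this index forward so the split never lands in the middle of a model turn or a function response.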

3.6 Multi‑Layer Compression

Layer 1 filters invalid content; Layer 2 merges adjacent similar parts; Layer 3 triggers LLM‑generated structured summaries when token limits are exceeded; Layer 4 protects critical recent information.
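The four layers can be sketched as one pipeline function. This is an illustrative composition of the strategies described above, not gemini-cli's actual code; the callbacks are stand-ins for the real validity check, merge step, and summarizer:

```javascript
// Sketch of the four-layer pipeline (names and shape are illustrative).
function compressPipeline(history, { isValid, merge, summarize, overLimit, preserveCount }) {
  // Layer 1: filter invalid content.
  let h = history.filter(isValid);
  // Layer 2: merge adjacent similar parts.
  h = merge(h);
  // Layers 3 & 4: summarize old content only when over the token limit,
  // always protecting the most recent messages.
  if (overLimit(h)) {
    const keep = h.slice(-preserveCount);
    const summary = summarize(h.slice(0, -preserveCount));
    h = [summary, ...keep];
  }
  return h;
}
```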

3.7 Model‑Specific Token Limits

export function tokenLimit(model) {
  switch (model) {
    case 'gemini-1.5-pro':
      return 2_097_152;
    case 'gemini-1.5-flash':
    case 'gemini-2.5-pro':
    case 'gemini-2.5-flash':
    case 'gemini-2.0-flash':
      return 1_048_576;
    case 'gemini-2.0-flash-preview-image-generation':
      return 32_000;
    default:
      return DEFAULT_TOKEN_LIMIT; // 1_048_576
  }
}

3.8 User‑Experience‑Focused Design

Invisible compression: the 70% trigger compresses before users notice any slowdown.

Continuity preservation: the latest 30% of history remains untouched to keep the conversation coherent.

Transparent feedback: token counts before and after compression are logged for visibility.

4. Conclusion

Choosing a compression strategy is fundamentally an answer to the question "what is important?" Manus opts for never losing data, Claude Code favors structured abstracts, and Gemini CLI prioritizes user experience. No single approach is universally best; developers must understand each philosophy and select the one that fits their scenario.

Tags: AI, LLM, Agent design, Token management, Context Compression
Written by Architecture and Beyond

Focused on AIGC SaaS technical architecture and tech team management, sharing insights on architecture, development efficiency, team leadership, startup technology choices, large‑scale website design, and high‑performance, highly‑available, scalable solutions.
