Managing LLM Agent Context: Insights from OpenManus, Manus, Claude Code & Gemini-cli

This article examines why context management is critical for LLM agents, compares the strategies of OpenManus, Manus, Claude Code, and Gemini-cli, and extracts practical lessons on token limits, compression techniques, and engineering trade‑offs for building efficient, cost‑effective AI systems.

For developers working with large language models (LLMs), context management is a core problem that determines both the intelligence of the AI and the system's performance and cost.

Simple strategies that continuously accumulate dialogue history quickly hit token limits and raise API costs, so technical leaders building AI agents must balance performance and expense.

OpenManus Context Management

OpenManus uses a straightforward approach:

Lightweight message list mechanism

Fixed‑length list (default 100 messages) stored in memory

FIFO truncation when the limit is exceeded

No intelligent compression or summarisation

Token limit handling

Hard token check; exceeding the limit throws an exception

Lacks graceful degradation or adaptive window cropping

Prone to hitting limits in long or tool‑heavy conversations

Though simple, OpenManus does provide custom handling for specific scenarios (such as injecting browser state into the context), but it remains a prototype and is not suited for production without further refinement.
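The mechanism described above can be sketched in a few lines of Python. This is illustrative, not OpenManus's actual code; the class name, default limits, and the crude character-based token estimate are all assumptions:

```python
class Memory:
    """Sketch of a fixed-length message list with FIFO truncation
    and a hard token check that simply raises on overflow."""

    def __init__(self, max_messages: int = 100, max_tokens: int = 8192):
        self.max_messages = max_messages
        self.max_tokens = max_tokens
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # FIFO truncation: silently drop the oldest messages past the cap.
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def check_tokens(self) -> None:
        # Crude estimate (~4 chars per token); a real system would
        # use the model's tokenizer.
        used = sum(len(m["content"]) // 4 for m in self.messages)
        if used > self.max_tokens:
            # Hard failure with no graceful degradation, mirroring
            # the behaviour described above.
            raise RuntimeError(f"token limit exceeded: {used} > {self.max_tokens}")
```

Note how the FIFO policy can evict a message that is still relevant, and how a single long tool observation is enough to trip the hard token check.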

Manus Context Management

Manus treats the file system as the ultimate context store instead of relying on in‑memory management.

Unlimited capacity : the file system size is not constrained

Native persistence : data is automatically saved and never lost

Direct manipulation : agents can read/write files actively

Structured memory : provides an external, structured memory system

Rather than storing full observations, Manus keeps only references (e.g., Document X, File Y) and can restore the full information from the file system when needed, achieving recoverable information compression.

Implementation details include removing web content from context and keeping only URLs, omitting document bodies and retaining file paths, and ensuring no permanent loss of information.
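A minimal sketch of this reference-based compression, assuming a plain file store; `externalize` and `restore` are hypothetical names for illustration, not Manus APIs:

```python
from pathlib import Path


def externalize(observation: str, store: Path, key: str) -> dict:
    """Write a bulky observation to the file system and return only a
    lightweight reference for the model's context window."""
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{key}.txt"
    path.write_text(observation, encoding="utf-8")
    # The context keeps only the path (or URL) plus a short preview;
    # the full body never enters the prompt.
    return {"type": "file_ref", "path": str(path), "preview": observation[:80]}


def restore(ref: dict) -> str:
    """Recover the full information from the reference on demand,
    so the compression is lossless in practice."""
    return Path(ref["path"]).read_text(encoding="utf-8")
```

The same pattern covers web pages (keep the URL, drop the body) and documents (keep the file path, drop the contents).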

Claude Code Context Management

Claude Code is not open source, but reverse‑engineered analysis reveals several clever mechanisms:

TodoWrite Tool

Introduces a self‑maintained To‑Do list, replacing traditional multi‑agent division.

Focus: prompts repeatedly remind the model to consult the To‑Do list.

Flexibility: an "interleaved thinking" mechanism allows dynamic addition/removal of tasks.

Transparency: users can view plans and progress in real time.
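Since Claude Code is closed source, the exact tool shape is unknown; the following is only a plausible reconstruction of what a TodoWrite-style tool could look like:

```python
from dataclasses import dataclass, field


@dataclass
class TodoList:
    """Hypothetical sketch of a self-maintained to-do list tool."""
    items: list[dict] = field(default_factory=list)

    def write(self, tasks: list[str]) -> None:
        # The agent rewrites its whole plan in one call, which is what
        # lets it add or drop tasks dynamically during interleaved thinking.
        self.items = [{"task": t, "done": False} for t in tasks]

    def complete(self, task: str) -> None:
        for item in self.items:
            if item["task"] == task:
                item["done"] = True

    def render(self) -> str:
        # Rendered back into the prompt so the model stays focused on the
        # list, and shown to the user for real-time transparency.
        return "\n".join(
            f"[{'x' if i['done'] else ' '}] {i['task']}" for i in self.items
        )
```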

Reverse Token Traversal

Token usage is read from the most recent assistant reply by scanning the message list backwards, turning a potential O(n) forward scan into an O(k) reverse scan and dramatically improving performance in high-frequency calls.

92% Threshold

An 8% buffer ensures compression has time to finish and provides a fallback window if compression quality is insufficient.
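The two mechanisms (reverse traversal and the 92% trigger) fit together naturally. The window size and the message shape below are assumptions for illustration:

```python
CONTEXT_LIMIT = 200_000  # assumed model window size
COMPRESS_AT = 0.92       # trigger threshold from the analysis above


def latest_usage(messages: list[dict]) -> int:
    """Scan from the newest message backwards and return the token count
    reported by the most recent assistant reply: O(k), not O(n)."""
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and "total_tokens" in msg:
            return msg["total_tokens"]
    return 0


def should_compress(messages: list[dict]) -> bool:
    # Fires at 92%, leaving an 8% buffer for compression to complete.
    return latest_usage(messages) / CONTEXT_LIMIT >= COMPRESS_AT
```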

8‑Section Structured Summary

1. Primary Request and Intent
2. Key Technical Concepts
3. Files and Code Sections
4. Errors and Fixes
5. Problem Solving
6. All User Messages
7. Pending Tasks
8. Current Work

Graceful Degradation

If compression fails, Claude Code employs a hierarchy of fallback plans (Plan B, Plan C) that re‑compress, mix retention, or conservatively truncate, preserving user experience.
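This kind of fallback hierarchy is straightforward to express as an ordered chain of strategies; the sketch below is a generic pattern, not Claude Code's actual implementation:

```python
def compress_with_fallbacks(messages: list, strategies: list) -> list:
    """Try each compression strategy in order (Plan A, Plan B, Plan C...).
    A strategy signals failure by raising or returning None; if every
    plan fails, degrade to conservative truncation rather than crashing."""
    for strategy in strategies:
        try:
            result = strategy(messages)
            if result is not None:
                return result
        except Exception:
            continue  # degrade to the next plan
    # Last resort: keep only the most recent messages so the session
    # survives, preserving user experience at the cost of older context.
    return messages[-10:]
```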

Vectorised Search

A long‑term memory layer uses vector search to recall similar past queries, enabling cross‑session knowledge transfer.
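The recall step reduces to nearest-neighbour search over stored embeddings. A dependency-free sketch using cosine similarity (the embeddings and memory format are invented for illustration; a real system would use an embedding model and a vector store):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(query_vec: list[float], memory: list[tuple], top_k: int = 2) -> list[str]:
    """Rank stored (vector, text) pairs by similarity to the query,
    enabling cross-session recall of similar past queries."""
    scored = sorted(memory, key=lambda m: cosine(query_vec, m[0]), reverse=True)
    return [text for _, text in scored[:top_k]]
```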

Gemini‑cli Context Management

Gemini‑cli follows a similar but lighter philosophy, treating the file system as a natural database.

Three‑Layer Hybrid Storage

Layer 1: In‑Memory Workspace

Stores current session chat history, tool call state, loop detection state

Zero‑latency access, no I/O

Cleared when the session ends

Layer 2: Smart Compression Layer

Trigger threshold: 70% (more conservative than Claude Code’s 92%)

Retention policy: keep the latest 30% of dialogue

Compression output: a 5‑section structured summary

1. overall_goal - the user's main objective
2. key_knowledge - important technical knowledge and decisions
3. file_system_state - current state of the file system
4. recent_actions - important actions recently executed
5. current_plan - the current execution plan
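The 70/30 policy can be sketched compactly; the message shape and the summarizer callback are assumptions, and the real Gemini-cli logic will differ in detail:

```python
TRIGGER = 0.70     # start compressing at 70% of the window
KEEP_RATIO = 0.30  # keep the latest 30% of the dialogue verbatim


def compress(messages: list[dict], used: int, limit: int, summarize) -> list[dict]:
    """Summarize the older ~70% of the dialogue into one structured
    summary message and keep the newest ~30% untouched."""
    if used / limit < TRIGGER:
        return messages  # under threshold: do nothing
    keep = max(1, round(len(messages) * KEEP_RATIO))
    cut = len(messages) - keep
    # summarize() would produce the 5-section structured summary above.
    summary = summarize(messages[:cut])
    return [{"role": "system", "content": summary}] + messages[cut:]
```

Triggering earlier (70% versus 92%) trades some context capacity for a much larger safety margin, consistent with the "good enough" philosophy described below.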

Layer 3: File‑System Persistence

Global memory: ~/.gemini/GEMINI.md

Project memory: searched recursively upward to the project root

Sub‑directory context: scan downwards respecting ignore rules
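The upward search for project memory amounts to walking parent directories and collecting context files along the way. A sketch of that traversal (the function name and ordering are assumptions, not Gemini-cli's code):

```python
from pathlib import Path


def find_context_files(cwd: Path, project_root: Path,
                       name: str = "GEMINI.md") -> list[Path]:
    """Walk upward from the working directory to the project root,
    collecting any context file found at each level. Returned with the
    broadest file first, so more specific instructions can override it."""
    found = []
    current = cwd.resolve()
    root = project_root.resolve()
    while True:
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
        if current == root or current.parent == current:
            break  # reached the project root (or filesystem root)
        current = current.parent
    return list(reversed(found))  # root-level file first
```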

Ignore Rules

The .geminiignore mechanism works independently of .gitignore, operates even outside Git repositories, and each tool has its own toggle. Changes take effect only after a session restart, a deliberate choice that avoids runtime state inconsistencies.
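A minimal sketch of such an ignore mechanism using simple glob matching (the parsing rules are assumed to resemble .gitignore's basics; the real .geminiignore syntax may differ):

```python
import fnmatch


def load_ignore_patterns(text: str) -> list[str]:
    """Parse a .geminiignore-style file: one glob pattern per line,
    blank lines and '#' comments skipped. Loaded once at session start,
    which is why edits only take effect after a restart."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if the path matches any ignore pattern."""
    return any(fnmatch.fnmatch(path, p) for p in patterns)
```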

Design Philosophy

Gemini‑cli embraces "good enough": it does not chase theoretical optimal compression ratios or complex vector retrieval, but solves ~80% of problems with a simple, maintainable solution, reducing bugs and easing onboarding.

Conclusion

Context is the boundary of intelligence; compression is the art of performance. Smart systems remember what matters instead of everything. Claude Code’s three‑layer memory, TodoWrite tool, token‑reverse traversal, 92% threshold, 8‑section summary, and graceful degradation illustrate a robust context ecosystem. Gemini‑cli’s pragmatic 70/30 strategy, 5‑section summary, and file‑system‑as‑DB approach demonstrate that simplicity often wins in engineering practice.

Written by

Architecture and Beyond

Focused on AIGC SaaS technical architecture and tech team management, sharing insights on architecture, development efficiency, team leadership, startup technology choices, large‑scale website design, and high‑performance, highly‑available, scalable solutions.
