Managing LLM Agent Context: Insights from OpenManus, Manus, Claude Code & Gemini-cli
This article examines why context management is critical for LLM agents, compares the strategies of OpenManus, Manus, Claude Code, and Gemini-cli, and extracts practical lessons on token limits, compression techniques, and engineering trade‑offs for building efficient, cost‑effective AI systems.
For developers working with large language models (LLMs), context management is a core problem that determines both the intelligence of the AI and the system's performance and cost.
Simple strategies that continuously accumulate dialogue history quickly hit token limits and raise API costs, so technical leaders building AI agents must balance performance and expense.
OpenManus Context Management
OpenManus uses a straightforward approach:
Lightweight message list mechanism
Fixed‑length list (default 100 messages) stored in memory
FIFO truncation when the limit is exceeded
No intelligent compression or summarisation
Token limit handling
Hard token check; exceeding the limit throws an exception
Lacks graceful degradation or adaptive window cropping
Prone to hitting limits in long or tool‑heavy conversations
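The mechanism above can be sketched in a few lines. This is a minimal illustration, not OpenManus source code: the class name, the 4-characters-per-token heuristic, and the default limits are all assumptions.

```python
from collections import deque


class MessageMemory:
    """Fixed-length message list with FIFO truncation and a hard token check.

    Illustrative sketch only; names and the token heuristic are not
    OpenManus internals.
    """

    def __init__(self, max_messages: int = 100, max_tokens: int = 8192):
        # deque with maxlen drops the oldest entry automatically (FIFO truncation)
        self.messages = deque(maxlen=max_messages)
        self.max_tokens = max_tokens

    def token_count(self) -> int:
        # Crude approximation: roughly 4 characters per token
        return sum(len(m["content"]) // 4 for m in self.messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Hard check: exceeding the limit raises instead of degrading gracefully
        if self.token_count() > self.max_tokens:
            raise ValueError(
                f"Token limit exceeded: {self.token_count()} > {self.max_tokens}"
            )
```

Note how the exception on overflow, rather than adaptive cropping, is exactly the failure mode that makes long, tool-heavy conversations brittle.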
While simple, OpenManus offers custom handling for specific scenarios (e.g., injecting browser state), but it is a prototype and not suited for production without further refinement.
Manus Context Management
Manus treats the file system as the ultimate context store instead of relying on in‑memory management.
Unlimited capacity: file-system size is effectively unconstrained
Native persistence: data is automatically saved and never lost
Direct manipulation: agents can actively read and write files
Structured memory: provides an external, structured memory system
Rather than storing full observations, Manus keeps only references (e.g., Document X, File Y) and can restore the full information from the file system when needed, achieving recoverable information compression.
Implementation details include removing web content from context and keeping only URLs, omitting document bodies and retaining file paths, and ensuring no permanent loss of information.
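The reference-instead-of-content idea can be sketched as follows. This is a hypothetical illustration of recoverable compression, assuming a simple local file store; none of the names come from Manus itself.

```python
import tempfile
from pathlib import Path


class FileBackedContext:
    """Keep only short references in the model context; full content lives on disk.

    Hypothetical sketch of recoverable information compression.
    """

    def __init__(self, root: str):
        self.root = Path(root)

    def store(self, name: str, content: str) -> str:
        """Write the full body to a file and return a short reference."""
        path = self.root / name
        path.write_text(content, encoding="utf-8")
        # Only this reference enters the model context, not the full body
        return f"[file: {path}]"

    def restore(self, reference: str) -> str:
        """Recover the full content from a reference when the agent needs it."""
        path = Path(reference.removeprefix("[file: ").removesuffix("]"))
        return path.read_text(encoding="utf-8")
```

Because the body can always be re-read from disk, dropping it from the context window loses nothing permanently; the compression is lossless in effect even though the context itself shrinks.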
Claude Code Context Management
Claude Code is not open source, but reverse‑engineered analysis reveals several clever mechanisms:
TodoWrite Tool
Introduces a self-maintained to-do list, replacing the traditional multi-agent division of labour.
Focus: prompts repeatedly remind the model to consult the To‑Do list.
Flexibility: an "interleaved thinking" mechanism allows dynamic addition/removal of tasks.
Transparency: users can view plans and progress in real time.
Reverse Token Traversal
Token statistics are read from the most recent assistant reply rather than re-scanned across the whole history, turning a potential O(n) scan into O(k) and dramatically improving performance in high-frequency calls.
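The reverse-traversal idea can be sketched like this. It assumes, as many LLM APIs do, that each assistant reply carries a cumulative usage record; the exact `usage` field layout here is an assumption, not Claude Code's actual data model.

```python
def tokens_used(messages: list) -> int:
    """Walk backwards to the most recent assistant message and read its usage.

    Only the k trailing messages after the last assistant reply are scanned,
    instead of summing tokens over all n messages: O(k) instead of O(n).
    """
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and "usage" in msg:
            return msg["usage"]["total_tokens"]
    return 0  # no assistant reply yet: nothing has been consumed
```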
92% Threshold
An 8% buffer ensures the compression pass has time to finish and leaves room for a fallback if the summary quality is insufficient.
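The trigger itself is a one-liner; the 0.92 value comes from the analysis above, while the function name and parameters are illustrative.

```python
def should_compress(tokens_used: int, context_limit: int, threshold: float = 0.92) -> bool:
    """Trigger compression at 92% of the context window, leaving an 8% buffer
    so the summarisation call itself still fits before the hard limit."""
    return tokens_used >= context_limit * threshold
```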
8‑Section Structured Summary
1. Primary Request and Intent
2. Key Technical Concepts
3. Files and Code Sections
4. Errors and Fixes
5. Problem Solving
6. All User Messages
7. Pending Tasks
8. Current Work
Graceful Degradation
If compression fails, Claude Code employs a hierarchy of fallback plans (Plan B, Plan C) that re‑compress, mix retention, or conservatively truncate, preserving user experience.
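A fallback hierarchy like this can be sketched as a simple strategy chain. The Plan A / Plan B / Plan C structure is taken from the description above, but the strategy functions themselves are assumptions, not Claude Code internals.

```python
def compact_history(messages: list, strategies: list) -> list:
    """Try each compression strategy in order; degrade to the next on failure.

    Hypothetical sketch of a graceful-degradation chain.
    """
    for strategy in strategies:
        try:
            return strategy(messages)
        except Exception:
            continue  # this plan failed; fall through to the next one
    # Last resort: conservative truncation keeps only the most recent messages
    return messages[-10:]
```

The point is that the user never sees a hard failure: even if every summariser errors out, the conversation continues on a truncated tail.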
Vectorised Search
A long‑term memory layer uses vector search to recall similar past queries, enabling cross‑session knowledge transfer.
Gemini‑cli Context Management
Gemini‑cli follows a similar but lighter philosophy, treating the file system as a natural database.
Three‑Layer Hybrid Storage
Layer 1: In‑Memory Workspace
Stores current session chat history, tool call state, loop detection state
Zero‑latency access, no I/O
Cleared when the session ends
Layer 2: Smart Compression Layer
Trigger threshold: 70% (more conservative than Claude Code’s 92%)
Retention policy: keep the latest 30% of dialogue
Compression output: a 5‑section structured summary
1. overall_goal - the user's primary goal
2. key_knowledge - important technical knowledge and decisions
3. file_system_state - current state of the file system
4. recent_actions - significant operations performed recently
5. current_plan - the current execution plan
Layer 3: File-System Persistence
Global memory: ~/.gemini/GEMINI.md
Project memory: recursively search upwards to the project root
Sub‑directory context: scan downwards respecting ignore rules
Ignore Rules
The .geminiignore mechanism works independently of .gitignore, operates even outside Git repositories, and can be toggled per tool. Changes take effect only after a session restart, a deliberate design choice that avoids runtime state churn.
Design Philosophy
Gemini‑cli embraces "good enough": it does not chase theoretical optimal compression ratios or complex vector retrieval, but solves ~80% of problems with a simple, maintainable solution, reducing bugs and easing onboarding.
Conclusion
Context is the boundary of intelligence; compression is the art of performance. Smart systems remember what matters instead of everything. Claude Code’s three‑layer memory, TodoWrite tool, token‑reverse traversal, 92% threshold, 8‑section summary, and graceful degradation illustrate a robust context ecosystem. Gemini‑cli’s pragmatic 70/30 strategy, 5‑section summary, and file‑system‑as‑DB approach demonstrate that simplicity often wins in engineering practice.
Architecture and Beyond
Focused on AIGC SaaS technical architecture and tech team management, sharing insights on architecture, development efficiency, team leadership, startup technology choices, large‑scale website design, and high‑performance, highly‑available, scalable solutions.