AI Agent Context Management: Comparing Six Major Companies' Approaches

The article analyzes how six leading AI‑agent providers—Manus, Cursor, Anthropic, OpenAI, Google, and LangChain—tackle the fundamental problem of when and how a large language model should see information, detailing each solution, a cross‑company comparison matrix, consensus points, controversies, and open research questions.

Problem Background

All six companies face the same constraint: the context window is finite while the tokens an agent accumulates grow rapidly with every step. A typical task involves about 50 tool calls, each appending observations to the context, quickly filling the window and degrading performance, a phenomenon known as "context rot". The framings differ (Anthropic's "attention budget", LangChain's "context window = RAM"), but the conclusion is consistent: smarter context management matters more than a larger window.

Manus: Six Production Principles

Background

Manus serves millions of users; a typical task averages 50 tool calls with an input‑to‑output token ratio of 100:1.

Six Principles

KV‑Cache is sacred. With Claude Sonnet, cached input tokens cost $0.30/MTok while uncached tokens cost $3/MTok, a tenfold difference. Keep the prompt prefix stable and append‑only; even reordering JSON keys invalidates the cache.
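A minimal sketch of this discipline in Python, assuming a JSON event log; PromptLog and serialize_step are illustrative names, not Manus's actual code. Deterministic serialization (sorted keys, fixed separators) keeps the byte prefix identical across steps, so the KV‑cache stays warm:

```python
import json

def serialize_step(record: dict) -> str:
    # Sorted keys + fixed separators give byte-identical output for the
    # same data, so appending a step never perturbs the cached prefix.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))

class PromptLog:
    """Append-only prompt: a stable system prefix plus an ordered event log."""

    def __init__(self, system_prefix: str):
        self._parts = [system_prefix]  # never mutated after init

    def append(self, record: dict) -> None:
        self._parts.append(serialize_step(record))

    def render(self) -> str:
        return "\n".join(self._parts)

log = PromptLog("You are an agent. Tools are listed once and never reordered.")
log.append({"tool": "search", "args": {"q": "context rot"}})
print(log.render())
```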

Use logit masking instead of removing tools. All tools stay loaded permanently; availability is controlled by constraining output token probabilities during decoding, keeping the context stable.
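A toy illustration of the idea rather than Manus's implementation: tool definitions never leave the prompt, and a decode‑time mask drives the logits of currently unavailable tool names to negative infinity:

```python
import math

def mask_tool_logits(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    # All tools stay defined in the (cached) prompt; availability is
    # enforced at decode time by banning disallowed tool-name tokens.
    return {tok: (lp if tok in allowed else -math.inf) for tok, lp in logits.items()}

# Toy vocabulary of tool-name tokens and their raw logits.
logits = {"browser_open": 1.2, "shell_exec": 0.7, "file_write": 0.3}
print(mask_tool_logits(logits, allowed={"file_write"}))
# {'browser_open': -inf, 'shell_exec': -inf, 'file_write': 0.3}
```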

Filesystem as extended memory. Large observations are written to files; only lightweight references stay in the context. Compression is acceptable as long as it is reversible.
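A minimal sketch, assuming plain files as the store; the stub format and the observations/ directory are invented for illustration. The key property is reversibility: the preview can always be expanded by re‑reading the file:

```python
from pathlib import Path

WORKSPACE = Path("observations")  # illustrative location

def externalize(obs_id: str, payload: str, preview_chars: int = 200) -> str:
    """Write a large observation to disk; return a compact, reversible stub."""
    WORKSPACE.mkdir(exist_ok=True)
    path = WORKSPACE / f"{obs_id}.txt"
    path.write_text(payload)
    # Only this stub enters the context; the full text stays on disk.
    return f"[observation {obs_id}: {payload[:preview_chars]}... full text at {path}]"
```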

Recite to focus attention. A "live" to‑do list is updated each step and reread, placing the current goal in the high‑attention region (the end of the context).
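A hedged sketch of recitation: the plan is re‑rendered and appended on every step, so it always occupies the end of the context, where recency effects are strongest:

```python
def recite(context: list[str], todo: list[tuple[str, bool]]) -> list[str]:
    """Re-append the live plan so it lands at the end of the context."""
    plan = "TODO:\n" + "\n".join(
        f"[{'x' if done else ' '}] {item}" for item, done in todo
    )
    return context + [plan]

steps = [("fetch dataset", True), ("clean columns", False), ("write report", False)]
print(recite(["...earlier turns..."], steps)[-1])
```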

Retain errors instead of cleaning them. Failed operations remain in the context for implicit belief updates, reducing repeated mistakes.

Introduce structured variation to avoid ruts. Different iterations use different serialization templates and phrasing to keep the model from falling into rigid, repetitive patterns.

Cursor: Dynamic Context Discovery

Background

Cursor’s January 2026 research blog describes five techniques and shows that, as models improve, providing fewer details and letting the agent pull context itself yields better results, supported by A/B test data.

Five Techniques

File as tool output interface. Large JSON responses are written to files; the agent incrementally reads them with tail / grep without unnecessary summarization.

Chat‑history file for lossless compression. Full history is saved to a file before summarization, allowing the agent to recover any lost detail—turning lossy compression into lossless.
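A minimal sketch of the pattern; summarize stands in for a separate model call, and the archive path and stub wording are assumptions:

```python
from pathlib import Path

def compress_history(history: list[str], archive: Path, summarize) -> list[str]:
    """Archive the full transcript before summarizing, so the agent can
    later grep the file to recover any detail the summary dropped."""
    archive.write_text("\n".join(history))
    summary = summarize(history)  # e.g. a separate model call
    return [f"(full transcript archived at {archive})", summary]
```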

Skills as discoverable files. Domain capabilities are stored as files and discovered via search instead of being pre‑loaded into the system prompt.

Lazy‑load MCP tools. Only tool names are pre‑loaded; full definitions are fetched on demand, reducing token usage by 46.9% in A/B tests.
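A sketch of the lazy‑loading idea as a generic registry; fetch_definition stands in for whatever MCP list/describe call the client actually makes:

```python
class LazyToolRegistry:
    """Keep only tool names in the prompt; fetch full schemas on demand."""

    def __init__(self, fetch_definition):
        self._fetch = fetch_definition       # e.g. an MCP describe call
        self._cache: dict[str, dict] = {}
        self.names: list[str] = []           # all the prompt carries up front

    def register(self, name: str) -> None:
        self.names.append(name)

    def definition(self, name: str) -> dict:
        # The full JSON schema enters the context only when requested.
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
        return self._cache[name]
```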

Terminal session as file. Shell history becomes a searchable file; the agent greps needed content.

The core assumption is that the model is now strong enough to know which context it needs.

Anthropic: Attention‑Budget Framework

Background

Anthropic released a foundational context‑engineering framework in September 2025, followed by deeper explorations of long‑running agents (January 2026) and MCP‑based code execution (November 2025), all built on Claude Code.

Core Strategy

System prompts in the Goldilocks zone. Two failure modes were identified: over‑engineered prompts (2K+ words of brittle if‑else logic that breaks on edge cases) and vague prompts like "be helpful" that leave the model directionless. The solution is to organize prompts into clear sections (XML tags or markdown headings) with representative examples, letting the model handle edge cases instead of hard‑coding them.
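A hedged sketch of the sectioned style; the tags and content are illustrative, not Anthropic's recommended schema:

```python
SYSTEM_PROMPT = """\
<role>
You are a code-review agent for Python services.
</role>

<instructions>
Review the diff; flag correctness issues first, style second.
If a case is ambiguous, explain the trade-off instead of guessing.
</instructions>

<examples>
Input: a diff that silently swallows exceptions.
Output: "Catching bare Exception hides failures; catch the specific error."
</examples>
"""
```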

Just‑in‑time retrieval. Agents retrieve context at runtime based on actual need, shifting from pre‑inference RAG to retrieval inside the agent loop.

Non‑overlapping concise tools. Tools must be self‑contained and unambiguous; if a human engineer cannot decide which tool to use, the model cannot either.

Compaction at 95% capacity. Claude Code automatically summarizes the conversation when the window reaches 95% capacity. For long‑running agents, an initialization agent writes a comprehensive requirements file (200+ features) that persists across windows.
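A minimal sketch of threshold‑triggered compaction; count_tokens and summarize are caller‑supplied stand‑ins, and the 200K default window is an assumption for illustration:

```python
def maybe_compact(messages: list[str], count_tokens, summarize,
                  window: int = 200_000, threshold: float = 0.95) -> list[str]:
    """Summarize the transcript once usage crosses the threshold."""
    used = sum(count_tokens(m) for m in messages)
    if used < threshold * window:
        return messages
    return [summarize(messages)]  # a summary replaces the raw history
```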

Code execution over direct tool calls. For multi‑server MCP, the agent writes code that calls tools, keeping definitions in the filesystem.
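A hypothetical example of what such agent‑written code might look like; mcp_call is an invented transport helper. The point is that intermediate rows never pass through the model's context:

```python
# The model emits a script like this instead of a chain of tool-call
# messages; each call forwards to an MCP server.
def run_generated_script(mcp_call):
    rows = mcp_call("sheets", "read_range", {"range": "A1:C100"})
    filtered = [r for r in rows if r[2] > 0]          # processed locally
    mcp_call("sheets", "write_range", {"range": "E1", "values": filtered})
    return f"{len(filtered)} rows written"            # only this reaches the model
```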

OpenAI: Session Memory as Infrastructure

Background

OpenAI’s approach is documented in their Agents SDK and two detailed cookbooks (short‑term session memory, September 2025; long‑term personalized context, December 2025). The contribution is framework‑oriented, offering patterns developers can adopt directly.

Three Modes

Truncation. Delete older rounds, keep the last N. Simple, deterministic, zero latency, but early constraints are forgotten.
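A minimal sketch, keeping the system message plus the last N turns:

```python
from collections import deque

def truncate(history: list[dict], keep_last: int = 10) -> list[dict]:
    """Drop older turns; retain the system message and the newest N."""
    system, turns = history[0], history[1:]
    return [system, *deque(turns, maxlen=keep_last)]
```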

Compression. Summarize earlier history with a separate model call; the summary acts as a "clean room" that can correct past errors. Risk: summary drift.

State‑based long‑term memory. Structured state objects (profile + notes) persist across sessions. Each run extracts memories, merges notes, and injects state with priority order: latest input → session → global defaults. OpenAI contrasts retrieval‑based memory (document search) with state‑based memory (structured fields), noting that state‑based memory supports belief updates and is more reliable.
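A minimal sketch of the merge order, with invented field names; later layers override earlier ones:

```python
def build_context(latest_input: dict, session_state: dict, global_defaults: dict) -> dict:
    """Merge structured memory: latest input > session state > global defaults."""
    merged = dict(global_defaults)
    merged.update(session_state)   # session overrides defaults
    merged.update(latest_input)    # latest input overrides everything
    return merged

print(build_context(
    latest_input={"tone": "terse"},
    session_state={"tone": "friendly", "project": "billing-api"},
    global_defaults={"tone": "neutral", "language": "en"},
))
# {'tone': 'terse', 'language': 'en', 'project': 'billing-api'}
```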

Google: Long‑Context Bet

Background

Google’s solution differs by betting on abundant context: Gemini models provide up to 2 M tokens (tests up to 10 M). Their ReadAgent paper (2024) offers a complementary view on memory compression.

Solution

"Put everything in. Default is to fill the context window; RAG and summarization are workarounds for limited‑context models. Evidence: Gemini learned to translate Kalamang (fewer than 200 speakers) using only context material.

Context cache. A caching API reduces cost by up to 75%, similar to Manus’s KV‑cache optimization.

Progressive truncation. Compress earlier context while preserving logical threads.

ReadAgent – Gist Memory (research). Compress interactions into a "gist" memory; original text is retrieved on demand, increasing effective context by 20× and mimicking human reading of long documents.
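A hedged sketch of the gist‑then‑lookup loop; make_gist and relevant stand in for the LLM calls the paper uses:

```python
def gist_read(pages: list[str], make_gist, relevant, lookup_budget: int = 3) -> list[str]:
    """Keep one-line gists in context; re-open only pages whose gist
    looks relevant to the current question."""
    gists = [make_gist(p) for p in pages]                  # compressed memory
    hits = [i for i, g in enumerate(gists) if relevant(g)]
    return [pages[i] for i in hits[:lookup_budget]]        # originals on demand
```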

Many‑shot in‑context learning. Leverage huge context windows by placing hundreds or thousands of examples in‑context, achieving performance comparable to fine‑tuning.
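A minimal sketch of many‑shot prompting; the Input/Output template is an assumption:

```python
def many_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Pack hundreds of labeled examples into one long-context prompt."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"
```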

Long context does not eliminate context engineering; it reshapes it. Studies still show a 15‑47% performance drop as context length grows.

LangChain: Framework Taxonomy

Background

LangChain contributes a taxonomy, organizing others’ approaches into a coherent framework based on their LangGraph implementation and "Deep Agents" analysis.

Four Operations

Write – store context outside the window. Draft notebooks, persistent state objects, filesystem storage. Example: Anthropic’s multi‑agent researcher stores plans in memory because contexts over 200 K tokens are truncated.

Select – fetch relevant context into the window. RAG, semantic search, filesystem traversal (grep/glob). The challenge is retrieving the right context at the right time, not just the most semantically similar.

Compress – keep only necessary tokens. Dialogue summarization, tool‑output compression. LangChain measured end‑to‑end token reduction from 115 K to 60 K.

Isolate – split context across agents. In multi‑agent architectures, sub‑agents have separate windows, preventing "context pollution" from unrelated details.

No‑op tools as context engineering. Their "Deep Agents" analysis found Claude Code’s to‑do‑list tool does nothing functionally but forces the agent to articulate its plan, keeping the trajectory on track in long runs.
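An illustrative re‑creation of such a tool (not Claude Code's actual implementation) using LangChain's @tool decorator; the effect comes entirely from the articulated plan re‑entering the context as a tool result:

```python
from langchain_core.tools import tool

@tool
def write_todos(todos: list[str]) -> str:
    """Record the current plan as a checklist before continuing."""
    # Functionally a no-op: nothing is executed or stored externally.
    # Forcing the agent to articulate its plan keeps long trajectories
    # anchored, because the plan text re-enters the context below.
    return "Remembered plan:\n" + "\n".join(f"- {t}" for t in todos)
```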

Comparison Matrix

Legend: [C] = core differentiating feature; [Y] = used/recommended; [--] = publicly undisclosed; [alt] = alternative solution to the same problem.

Context‑Window Management

(matrix image omitted for brevity)

Information Retrieval

(matrix image omitted for brevity)

Planning & Consistency

(matrix image omitted for brevity)

Multi‑Agent & Isolation

(matrix image omitted for brevity)

Memory & Robustness

(matrix image omitted for brevity)

Summary

Consensus

Filesystem as extended memory.

Dynamic retrieval outperforms static pre‑loading.

Persistent plan files are used for long tasks.

Errors are retained rather than scrubbed.

Controversies

Tool overload handling: Manus’s logit masking vs. Cursor’s lazy loading—opposite strategies, both effective.

Long‑context vs. concise context: Google vs. everyone else.

Framework vs. raw primitives: Ongoing debate.

Unresolved

Session memory: No two companies share the same approach.

Evaluating context engineering: No standard benchmark; Cursor’s 46.9% token reduction is one of the few public data points.

When to isolate sub‑agent context vs. share results: Still purely empirical.

Worth Watching

Teams building the best agents keep simplifying. Manus rewrote its stack five times, each time removing more. If an agent harness becomes more complex while the model improves, there may be a problem.

Open Questions

Long context vs. intelligent compression—who wins at scale?

Should sub‑agents share context or only pass results?

How to evaluate the quality of context engineering?

References

Context Engineering for AI Agents: Lessons from Building Manus – https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus

Dynamic Context Discovery – https://cursor.com/blog/dynamic-context-discovery

Effective Context Engineering for AI Agents – https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

Effective Harnesses for Long‑Running Agents – https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents

Code Execution with MCP – https://www.anthropic.com/engineering/code-execution-with-mcp

Context Engineering – Short‑Term Memory Management with Sessions – https://cookbook.openai.com/examples/agents_sdk/session_memory

Context Engineering for Personalization – https://cookbook.openai.com/examples/agents_sdk/context_personalization

A Human‑Inspired Reading Agent with Gist Memory – https://deepmind.google/research/publications/74917/

Long Context Documentation – https://ai.google.dev/gemini-api/docs/long-context

Context Engineering for Agents – https://blog.langchain.com/context-engineering-for-agents/

The Rise of Context Engineering – https://blog.langchain.com/the-rise-of-context-engineering/

How Agents Can Use Filesystems for Context Engineering – https://blog.langchain.com/how-agents-can-use-filesystems-for-context-engineering/

Deep Agents – https://blog.langchain.com/deep-agents/

Context Engineering for AI Agents: Part 2 – https://www.philschmid.de/context-engineering-part-2

Context Engineering in Manus – https://rlancemartin.github.io/2025/10/15/manus/
