How OpenClaw Manages Context: Multi‑Layer Compression, Memory Persistence, and Overflow Recovery

This article explains OpenClaw's sophisticated context‑management system, detailing its three‑layer approach to pruning old turns, trimming tool results, and handling oversized outputs, while preserving critical state through memory flushing, structured compaction, and a robust overflow‑recovery pipeline.


Overview

OpenClaw faces two core challenges: security boundaries (whether it can safely accept real permissions) and context cost (whether it can finish long tasks). Unlike naive "window is full, so summarize" approaches, OpenClaw treats context compression as a controllable execution chain.

Context stability ≈ window pre‑check + history hygiene + pair repair + compression retry + timeout snapshot + overflow recovery

TL;DR (9 points)

OpenClaw handles three simultaneous problems: accumulated old rounds, bloated tool results, and single massive outputs.

First layer is preventive trimming: limit history rounds, progressively trim old tool results, and cap each tool result to 30% of the context.

Second layer (Compaction) is a full pipeline with pre‑trimming, staged summarization, failure fallback, and structured supplemental info, preceded by a silent memory flush.

Third layer treats overflow as a normal fault path, with explicit retry, persistent truncation, and finally a user‑visible reset.

Key invariants protected: recent dialogue, tool_use‑tool_result pairing, file read/write traces, tool failure logs, and critical rules from AGENTS.md.

Transcript Hygiene automatically fixes pairing issues per provider (Anthropic, Google, OpenAI) before compression.

Memory is persisted as memory/YYYY‑MM‑DD.md and is searchable via a hybrid index (BM25 + vector).

Overflow recovery follows compact → truncate → readable error with snapshot fallback.

Engineering judgments focus on progressive degradation, protecting invariants, aligning with provider cache TTL, and cancelling on bad summaries.

Key Concepts

Context includes system prompts, conversation history, tool calls and results, attachments, and any compressed summaries. It is distinct from persistent memory, which lives on disk.

Compaction writes a summarized version back to the session’s JSONL record.

Session Pruning temporarily trims old tool results in memory before each call; it never rewrites the stored JSONL history.

Transcript Hygiene cleans up provider‑specific formatting issues (tool IDs, pairing, ordering) without altering the disk record.

First Layer – Preventive Trimming

1. History Turn Limit

Only the most recent N user turns (and their linked assistant/tool messages) are kept. Truncation always occurs at a complete user → assistant → tool_result boundary to avoid breaking session structure.
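A minimal sketch of this rule under the stated boundary constraint; the names (Message, keep_recent_turns) are illustrative, not OpenClaw's actual API:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str   # "user", "assistant", or "tool_result"
    text: str

def keep_recent_turns(history: list[Message], max_user_turns: int) -> list[Message]:
    """Keep only the last `max_user_turns` user turns, cutting at a
    user-message boundary so linked assistant/tool messages stay intact."""
    seen = 0
    # Walk backwards counting user messages; everything from the
    # Nth-most-recent user message onward is kept.
    for i in range(len(history) - 1, -1, -1):
        if history[i].role == "user":
            seen += 1
            if seen == max_user_turns:
                return history[i:]
    return history  # fewer than max_user_turns user turns: keep all
```

Because the cut always lands on a user message, no assistant or tool_result message survives without the turn that produced it.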

2. Context Pruning

Two modes:

Soft Trim (triggered when estimated tokens exceed 30% of the window): for tool results longer than 4000 characters, keep the first 1500 and last 1500 characters and replace the middle with an ellipsis marker carrying a size annotation.

Hard Clear (triggered at 50%): replace the entire old tool result with [Old tool result content cleared].

Additional protection rules:

Tool results before the first user message are never trimmed (they contain identity files, workspace rules, etc.).

The most recent three assistant‑related tool results are kept untouched.

Image results are never trimmed.

Pruning uses a 5‑minute TTL aligned with Anthropic’s cache retention.
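The two modes and the recency protection can be sketched as follows. The thresholds (30%, 50%, 4000/1500 characters, last three results) come from the article; the function names and the wire format of the trim marker are assumptions:

```python
SOFT_TRIM_RATIO = 0.3
HARD_CLEAR_RATIO = 0.5
TRIM_THRESHOLD = 4000     # only trim tool results longer than this
KEEP_HEAD = KEEP_TAIL = 1500
PROTECTED_RECENT = 3      # most recent results are never touched

def soft_trim(result: str) -> str:
    """Keep head and tail of a long tool result, annotate the gap."""
    if len(result) <= TRIM_THRESHOLD:
        return result
    omitted = len(result) - KEEP_HEAD - KEEP_TAIL
    return (result[:KEEP_HEAD]
            + f"\n... [{omitted} characters trimmed] ...\n"
            + result[-KEEP_TAIL:])

def prune(tool_results: list[str], used_tokens: int, window: int) -> list[str]:
    ratio = used_tokens / window
    recent = tool_results[-PROTECTED_RECENT:]
    old = tool_results[:-PROTECTED_RECENT]
    if ratio >= HARD_CLEAR_RATIO:
        # Hard clear: old results replaced wholesale, recent ones kept.
        return ["[Old tool result content cleared]"] * len(old) + recent
    if ratio >= SOFT_TRIM_RATIO:
        return [soft_trim(r) for r in old] + recent
    return tool_results
```

Image results and pre-first-user-message results would be filtered out before this function runs, per the protection rules above.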

3. Single Tool Result Truncation

Any single tool result may occupy at most 30% of the context window, capped at 400,000 characters. Excess content is truncated with a notice prompting the model to request the remainder via offset or limit parameters.
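A hedged sketch of this cap. The 30% share and 400,000-character ceiling are from the article; the 4-characters-per-token estimate and the notice wording are assumptions:

```python
MAX_SHARE = 0.30          # at most 30% of the context window
MAX_CHARS = 400_000       # absolute character ceiling
CHARS_PER_TOKEN = 4       # rough heuristic (assumption)

def truncate_tool_result(result: str, window_tokens: int) -> str:
    limit = min(int(window_tokens * MAX_SHARE * CHARS_PER_TOKEN), MAX_CHARS)
    if len(result) <= limit:
        return result
    notice = (f"\n[truncated {len(result) - limit} characters; "
              "re-run the tool with offset/limit to read more]")
    return result[:limit] + notice
```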

Second Layer – Compaction Pipeline

0. Memory Flush (pre‑compression)

Before compression, a silent memoryFlush round writes the current persistent state to memory/YYYY‑MM‑DD.md, ensuring critical information isn’t lost during summarization.
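A minimal sketch of such a flush, assuming notes are appended to a dated Markdown file; the directory layout matches the article, everything else is illustrative:

```python
from datetime import date
from pathlib import Path

def memory_flush(notes: list[str], root: str = "memory") -> Path:
    """Append the state worth keeping to memory/YYYY-MM-DD.md
    before summarization runs, so compaction cannot lose it."""
    path = Path(root) / f"{date.today():%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        for note in notes:
            f.write(f"- {note}\n")
    return path
```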

1. Fact Collection

Compaction first gathers essential facts for later steps: read files, modified files, recent tool failures, and workspace rules.

2. Pre‑pruning History

If the content to be compressed is still too large, pruneHistoryForContextShare splits old messages into chunks, discards the oldest chunk, and creates a droppedSummary for it, which is fed into the main compression flow.
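The drop-oldest-chunk step could look roughly like this. pruneHistoryForContextShare is named in the article; this Python rendering of it, and the one-line droppedSummary format, are illustrative reconstructions:

```python
def prune_history_for_context_share(messages: list[str], chunk_size: int):
    """Drop the oldest chunk of messages and return a stand-in
    summary line for it, or None if nothing was dropped."""
    if len(messages) <= chunk_size:
        return messages, None
    dropped, kept = messages[:chunk_size], messages[chunk_size:]
    dropped_summary = f"[{len(dropped)} early messages dropped before compaction]"
    return kept, dropped_summary
```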

3. Staged Summarization

Large histories are summarized in stages:

Split messages into token‑based chunks.

Summarize each chunk independently.

Combine partial summaries into a final summary.

This reduces the risk of exceeding the model’s token limit and allows graceful retries.
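The three stages above can be sketched end to end. The `summarize` stub stands in for a model call; the 4-characters-per-token estimate is an assumption:

```python
def summarize(text: str) -> str:
    return text[:40]  # placeholder for an LLM summarization call

def staged_summary(messages: list[str], chunk_tokens: int) -> str:
    # 1. Split messages into token-bounded chunks.
    chunks, current, used = [], [], 0
    for m in messages:
        cost = len(m) // 4 + 1
        if current and used + cost > chunk_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(m)
        used += cost
    if current:
        chunks.append(current)
    # 2. Summarize each chunk independently.
    partials = [summarize("\n".join(c)) for c in chunks]
    # 3. Combine partial summaries into a final summary.
    return summarize("\n".join(partials))
```

Each model call sees only one chunk (or the short partials), so a single oversized call can be retried without redoing the whole history.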

4. Adaptive Chunk Size

Chunk size is dynamic: default max 40% of the context window, shrinking to ~15% for very large messages, with a safety factor of 1.2.
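The sizing rule as stated, sketched with the figures from the article (40% default share, ~15% for very large inputs, 1.2 safety factor); the exact switch-over condition is an assumption:

```python
def adaptive_chunk_tokens(window: int, input_tokens: int) -> int:
    """Default 40% of the window; shrink to 15% when the input
    exceeds the window; divide by a 1.2 safety factor."""
    share = 0.40 if input_tokens <= window else 0.15
    return int(window * share / 1.2)
```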

5. Multi‑Level Fallback

If full summarization fails, OpenClaw falls back:

Attempt normal full‑size summary.

If that fails, omit the oversized message, summarize the rest, and note the omission.

If still failing, return a readable error indicating the context is too large.
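The three fallback levels above as a sketch; `summarize` stands in for a model call that raises when its input is too large, and the exception type and messages are illustrative:

```python
class ContextTooLarge(Exception):
    pass

def summarize_with_fallback(messages: list[str], summarize) -> str:
    try:
        return summarize(messages)                      # 1. full summary
    except ContextTooLarge:
        pass
    # 2. Omit the single largest message, summarize the rest, note it.
    biggest = max(range(len(messages)), key=lambda i: len(messages[i]))
    rest = messages[:biggest] + messages[biggest + 1:]
    try:
        return summarize(rest) + " [one oversized message omitted]"
    except ContextTooLarge:
        # 3. Give up with a readable error instead of a bad summary.
        return "Error: context too large to summarize; consider /reset."
```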

6. Structured Patch Output

The final summary includes structured sections such as Tool Failures, <read-files>, <modified-files>, and <workspace-critical-rules>, preserving high‑value state that plain natural‑language summaries would lose.
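An illustrative rendering of those sections. The tag names come from the article; the builder function and its layout are assumptions:

```python
def render_compaction_patch(summary: str, failures: list[str],
                            read_files: list[str], modified_files: list[str],
                            rules: list[str]) -> str:
    """Append the structured state sections to the prose summary."""
    parts = [summary]
    if failures:
        parts.append("Tool Failures:\n" + "\n".join(f"- {f}" for f in failures))
    parts.append("<read-files>\n" + "\n".join(read_files) + "\n</read-files>")
    parts.append("<modified-files>\n" + "\n".join(modified_files)
                 + "\n</modified-files>")
    parts.append("<workspace-critical-rules>\n" + "\n".join(rules)
                 + "\n</workspace-critical-rules>")
    return "\n\n".join(parts)
```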

Third Layer – Overflow Recovery

1. Detect Overflow

Two signals trigger overflow handling: provider rejects the prompt before submission, or the assistant returns a context‑too‑large error during generation.

2. Recovery Sequence

OpenClaw follows compact → truncate → readable error:

If SDK‑initiated compaction already ran, retry it.

If not, trigger an explicit compaction.

Limit explicit compaction attempts to three.

If still failing, check for an oversized tool result and apply persistent truncation.

If all else fails, suggest /reset or a larger‑window model.
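The sequence above, sketched as a driver loop. The three-attempt limit is from the article; the function names and return strings are illustrative stand-ins:

```python
MAX_COMPACTION_ATTEMPTS = 3

def recover_from_overflow(try_compact, truncate_largest_result) -> str:
    """compact -> truncate -> readable error."""
    for _ in range(MAX_COMPACTION_ATTEMPTS):
        if try_compact():
            return "recovered: compaction"
    if truncate_largest_result():
        return "recovered: persistent truncation"
    return "failed: run /reset or switch to a larger-window model"
```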

3. Timeout Snapshot

When compression times out, OpenClaw returns the pre‑compression snapshot instead of a partially compressed transcript, ensuring consistency.

4. Persistent Truncation with Branching

Rather than editing the original session JSON in place, OpenClaw creates a new branch from the point of truncation and appends subsequent entries, preserving an append‑only audit trail similar to Git branching.
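A minimal sketch of append-only branching: the original session JSONL is never rewritten; a new branch file takes the kept prefix plus a truncation record. The file naming scheme and record shape are assumptions:

```python
import json
from pathlib import Path

def branch_session(session: Path, truncate_at: int) -> Path:
    """Create a branch containing entries before the cut plus a
    truncation marker; the parent file stays untouched."""
    lines = session.read_text().splitlines()
    branch = session.with_name(session.stem + ".branch1" + session.suffix)
    with branch.open("w", encoding="utf-8") as f:
        for line in lines[:truncate_at]:
            f.write(line + "\n")
        f.write(json.dumps({"type": "truncation", "parent": session.name,
                            "at": truncate_at}) + "\n")
    return branch
```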

Engineering Judgments

Progressive degradation (light trimming → compaction → persistent truncation) keeps cost low and avoids irreversible loss.

Protect invariants (recent memory, tool pairing, file history, failure logs, workspace rules) rather than trying to keep every token.

Align pruning TTL with provider cache TTL to minimize cache churn and cost.

Cancel on bad summaries; never write a low‑quality summary that could corrupt future decisions.

Practical Checklist (6 points)

Separate context problems into old rounds, old tool results, and single massive outputs.

Apply progressive trimming to old tool results instead of blunt removal.

Protect recent short‑term memory from being trimmed.

Ensure compaction output includes structured patches (failures, file traces, rules).

Design a clear overflow‑recovery path.

When persisting history changes, use branching rather than in‑place overwrite.

Key Configuration Parameters

agents.defaults.contextTokens: default 200,000 tokens (model capacity).

agents.defaults.compaction.reserveTokens: 20,000 tokens reserved for AI replies.

agents.defaults.compaction.keepRecentTokens: keep the last 40,000 tokens uncompressed.

agents.defaults.contextPruning.ttl: 5 minutes, aligned with Anthropic cache retention.

agents.defaults.contextPruning.softTrimRatio: 0.3 (soft trim threshold).

agents.defaults.contextPruning.hardClearRatio: 0.5 (hard clear threshold).
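Expressed as a config file, these parameters might look roughly like this; the key paths follow the article, while the file format and the ttl syntax are assumptions:

```json
{
  "agents": {
    "defaults": {
      "contextTokens": 200000,
      "compaction": {
        "reserveTokens": 20000,
        "keepRecentTokens": 40000
      },
      "contextPruning": {
        "ttl": "5m",
        "softTrimRatio": 0.3,
        "hardClearRatio": 0.5
      }
    }
  }
}
```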

Comparison with Dakou

Both systems address context management, but OpenClaw emphasizes layered degradation, provider‑specific transcript hygiene, and cache‑aligned pruning, while Dakou focuses on asynchronous compression goroutines and XML‑structured outputs.

Conclusion

Effective context management for agents is less about adding tokens and more about deciding which information must stay pristine, which can be lossy‑compressed, and how to recover from overflow failures. OpenClaw’s three‑layer design—guard, compress, recover—offers a robust blueprint for building stable, cost‑effective LLM agents.

Tags: Compression, LLM agents, memory persistence, overflow recovery
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
