How OpenClaw Uses a Multi‑Layer Defense System to Prevent LLM Context Overflow

This article provides a detailed technical walkthrough of OpenClaw's three‑stage context‑management framework—pre‑emptive pruning, LLM‑driven compaction, and overflow‑recovery truncation—showing how each layer protects long‑running AI agent sessions from exceeding token windows while preserving essential information.

Tencent Cloud Developer

Overall Architecture

OpenClaw mitigates context‑window overflow in long AI‑agent conversations through a three‑layer defense: pre‑emptive pruning, LLM‑based compaction, and overflow‑recovery truncation.

Repository: github.com/openclaw/openclaw
[Figure: OpenClaw architecture diagram]

Layer 1 – Pre‑emptive Pruning (History Turn Limit)

This layer limits the number of retained user turns. The function limitHistoryTurns(messages, limit) walks the message list from the end, counting user messages, and discards older turns once the limit is exceeded; cuts always land at the start of a user turn, so complete user‑assistant‑toolResult triples are kept or dropped together.

export function limitHistoryTurns(messages: AgentMessage[], limit: number | undefined): AgentMessage[] {
  // No limit configured (or nothing to trim): return the history unchanged.
  if (!limit || limit <= 0 || messages.length === 0) {
    return messages;
  }
  let userCount = 0;
  let lastUserIndex = messages.length;
  // Walk backwards, counting user turns. Once the limit is exceeded,
  // keep everything from the most recent `limit` user messages onward,
  // so the cut always falls at the start of a user turn.
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      userCount++;
      if (userCount > limit) {
        return messages.slice(lastUserIndex);
      }
      lastUserIndex = i;
    }
  }
  return messages;
}

The limit is read from configuration paths such as channels.*.dmHistoryLimit (direct messages) and channels.*.historyLimit (group chats), allowing per‑user or per‑channel overrides.
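
As a hedged illustration, the per‑channel override resolution might look like the following sketch; the resolveHistoryLimit helper and the ChannelConfig shape are hypothetical, not part of the OpenClaw codebase:

```typescript
// Hypothetical config shape; the real schema lives in OpenClaw's config module.
interface ChannelConfig {
  dmHistoryLimit?: number; // limit for direct-message channels
  historyLimit?: number;   // limit for group channels
}

// Sketch: prefer the DM-specific limit for DMs, otherwise use the group limit.
// Returning undefined means "no pruning" (limitHistoryTurns is then a no-op).
function resolveHistoryLimit(config: ChannelConfig, isDm: boolean): number | undefined {
  return isDm ? config.dmHistoryLimit ?? config.historyLimit : config.historyLimit;
}
```

Keeping undefined as a valid result matters: limitHistoryTurns treats a missing limit as "retain everything", so an unconfigured channel is never pruned by this layer.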

Layer 2 – Compaction (LLM‑Generated Summary)

When the accumulated context approaches the model’s token window, OpenClaw triggers an LLM call that summarizes the conversation history. The summary replaces the original messages and is augmented with structured metadata (tool failures, file operation logs, workspace rules). The relevant source files:

src/agents/compaction.ts – summary algorithm

src/agents/pi-extensions/compaction-safeguard.ts – coordination extension

src/agents/pi-embedded-runner/compact.ts – entry point

Compaction pipeline:

Adaptive chunk sizing (computeAdaptiveChunkRatio).

Chunk‑wise summarization with summarizeWithFallback, which falls back through three levels if a chunk is too large.

Merging partial summaries using a dedicated instruction set.

Appending tool‑failure lists, file operation logs, and critical workspace rules.

Repairing tool_use / tool_result pairings for Anthropic API compliance.
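
The pairing‑repair step at the end of the pipeline can be sketched as follows. This is a simplified illustration under assumed message shapes; the ToolMessage type and the dropOrphanedToolResults name are hypothetical, not OpenClaw's actual implementation:

```typescript
// Simplified message shape for illustration only.
interface ToolMessage {
  role: "assistant" | "toolResult";
  toolUseId?: string; // id of the tool_use block this message carries or answers
}

// Drop tool_result messages whose matching tool_use no longer exists --
// e.g. because compaction summarized away the assistant turn that issued it.
// Anthropic's API rejects prompts with unpaired tool_use / tool_result blocks.
function dropOrphanedToolResults(messages: ToolMessage[]): ToolMessage[] {
  const knownToolUseIds = new Set(
    messages
      .filter((m) => m.role === "assistant" && m.toolUseId)
      .map((m) => m.toolUseId),
  );
  return messages.filter(
    (m) => m.role !== "toolResult" || knownToolUseIds.has(m.toolUseId),
  );
}
```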

All summary calls are wrapped with retryAsync (up to 3 attempts, exponential back‑off) and a global timeout (EMBEDDED_COMPACTION_TIMEOUT_MS) to avoid hanging.
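
A minimal sketch of such a retry wrapper, assuming the same name and an exponential back‑off between attempts (the exact signature and defaults in OpenClaw may differ):

```typescript
// Retry an async operation up to `maxAttempts` times with exponential back-off.
// Sketch only: OpenClaw's real retryAsync may differ in signature and options.
async function retryAsync<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt + 1 < maxAttempts) {
        // Back off between attempts: 250 ms, 500 ms, 1000 ms, ...
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```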

Layer 3 – Overflow Recovery (Tool Result Truncation)

If compaction still cannot fit the prompt, OpenClaw performs a final truncation of oversized tool results. The per‑tool budget is limited to 30 % of the context window and capped at 400 000 characters (~400 KB), with a minimum keep region of 2 000 characters. Truncation preserves the head of the result and prefers to cut on a line boundary.

export function truncateToolResultText(
  text: string,
  maxChars: number,
  options: { minKeepChars: number; suffix: string },
): string {
  const { minKeepChars, suffix } = options;
  // Budget for the kept prefix, leaving room for the truncation marker.
  const keepChars = Math.max(minKeepChars, maxChars - suffix.length);
  let cutPoint = keepChars;
  // Prefer cutting at a newline, as long as that does not discard too much.
  const lastNewline = text.lastIndexOf("\n", keepChars);
  if (lastNewline > keepChars * 0.8) {
    cutPoint = lastNewline;
  }
  return text.slice(0, cutPoint) + suffix;
}

The truncation is attempted only once per session (flag toolResultTruncationAttempted) to avoid repeated modifications of the persistent session file.
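
The one‑shot guard can be sketched as follows; the SessionState shape and helper name here are illustrative, not OpenClaw's actual types:

```typescript
// Illustrative session state: the flag persists for the session's lifetime.
interface SessionState {
  toolResultTruncationAttempted: boolean;
}

// Run truncation at most once per session; later overflows must fall back
// to other recovery paths (or give up) instead of rewriting the session again.
function tryTruncateOnce(session: SessionState, truncate: () => void): boolean {
  if (session.toolResultTruncationAttempted) {
    return false; // already tried -- do not touch the persisted session again
  }
  session.toolResultTruncationAttempted = true;
  truncate();
  return true;
}
```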

Token Estimation Strategy

OpenClaw estimates tokens with the same simple heuristic for all providers, chars / 4 ≈ tokens, and applies a safety margin of 1.2 by shrinking token budgets accordingly. Functions such as chunkMessagesByMaxTokens and maxHistoryTokens apply this factor before splitting or pruning messages.

const effectiveMax = Math.max(1, Math.floor(maxTokens / SAFETY_MARGIN));
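
Putting the heuristic and the margin together, a hedged sketch (the function names here are illustrative, not OpenClaw's):

```typescript
const SAFETY_MARGIN = 1.2;

// Heuristic: roughly 4 characters per token, rounded up to stay conservative.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Shrink the nominal budget so an optimistic chars/4 estimate cannot overflow
// the real window; e.g. a 120 000-token window is treated as 100 000 tokens.
function effectiveMaxTokens(maxTokens: number): number {
  return Math.max(1, Math.floor(maxTokens / SAFETY_MARGIN));
}
```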

Configuration Overview

agents.defaults.contextTokens – overrides the model‑specific context window.

agents.defaults.compaction.reserveTokens (default 20 000) – tokens reserved for the next reply after compaction.

agents.defaults.contextPruning.mode – "cache‑ttl" (default) enables TTL‑based pruning.

agents.defaults.contextPruning.ttl – 5 minutes, aligned with provider cache retention.

agents.defaults.contextPruning.keepLastAssistants – protects the most recent 3 assistant messages.

agents.defaults.contextPruning.softTrimRatio (0.3) and hardClearRatio (0.5) – trigger thresholds for soft trim and hard clear.

agents.defaults.contextPruning.minPrunableToolChars (50 000) – minimum total size before a hard clear is considered.

Cache Impact Analysis

OpenClaw aligns its TTL‑based pruning with provider cache retention (e.g., Anthropic’s 5 minute cache). Pruning only occurs after the cache expires, avoiding unnecessary cache‑miss penalties. Compaction rebuilds the prompt, causing a full cache miss, but the resulting prompt is dramatically shorter, reducing overall cost.

Recovery Decision Tree

if (context overflow detected) {
  if (SDK already performed auto‑compact) {
    overflowCompactionAttempts++;
    retry prompt without extra compaction;
  } else if (overflowCompactionAttempts < 3) {
    trigger explicit compaction ("overflow");
    if (success) retry prompt; else fallback;
  } else {
    if (sessionLikelyHasOversizedToolResults()) {
      truncateOversizedToolResultsInSession();
      if (success) retry prompt; else give up;
    } else {
      give up and suggest /reset;
    }
  }
}

Compaction attempts are capped at three; tool‑result truncation is attempted only once per session.

Core Design Principles

Progressive degradation: start with lightweight pruning, then LLM summarization, and finally hard truncation.

Protection of critical data: never prune the first user message, recent assistant messages, or tool results containing images.

Safety first: strip untrusted toolResult.details, repair broken tool_use / tool_result pairings, and cancel compaction on any exception to keep the original history.

Recoverability: multiple fallback layers ensure the system only gives up after exhausting all safe options.

Minimal invasiveness: persistent session modifications are performed via branching (append‑only), and in‑memory changes are never written back unless necessary.

Additional Technical Details

Maximum tool result size is limited by:

MAX_TOOL_RESULT_CONTEXT_SHARE = 0.3; // ≤30 % of context window
HARD_MAX_TOOL_RESULT_CHARS = 400_000; // ≤400 KB characters
MIN_KEEP_CHARS = 2_000; // keep at least 2 KB when truncating

Calculation of the per‑tool character budget:

export function calculateMaxToolResultChars(contextWindowTokens: number): number {
  const maxTokens = Math.floor(contextWindowTokens * MAX_TOOL_RESULT_CONTEXT_SHARE);
  const maxChars = maxTokens * 4; // 1 token ≈ 4 chars
  return Math.min(maxChars, HARD_MAX_TOOL_RESULT_CHARS);
}
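
As a quick sanity check, restating the constants and function above and evaluating a 200 000‑token window: 30 % of the window is 60 000 tokens, or 240 000 characters, which stays under the 400 KB hard cap, while very large windows get clamped:

```typescript
const MAX_TOOL_RESULT_CONTEXT_SHARE = 0.3;
const HARD_MAX_TOOL_RESULT_CHARS = 400_000;

function calculateMaxToolResultChars(contextWindowTokens: number): number {
  const maxTokens = Math.floor(contextWindowTokens * MAX_TOOL_RESULT_CONTEXT_SHARE);
  const maxChars = maxTokens * 4; // 1 token ≈ 4 chars
  return Math.min(maxChars, HARD_MAX_TOOL_RESULT_CHARS);
}

// 200k-token window: floor(200_000 × 0.3) = 60_000 tokens → 240_000 chars.
// A 10M-token window would compute 12_000_000 chars, clamped to 400_000.
```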

Compaction summary output includes the dialogue summary followed by structured sections for tool failures, read files, modified files, and workspace‑critical rules, ensuring the agent retains essential context after compression.
