Agent Harness Context: Chat Log vs. Workset – How Runtime Management Shapes Long‑Running Agents

The article argues that an agent harness’s context window should be treated as a bounded workset rather than an ever‑growing transcript, and explains how pagination, compression, tool‑output limits, session isolation, and sub‑agent design together determine whether long‑running agents remain reliable and efficient.


Context Window as a Fixed‑Size Workset

In long‑running agents the context window should be treated as a limited workset that is curated before each model call. The harness builds a "current usable" view by deciding which items stay near the model, which are compressed, and which are moved out for later retrieval.

Boundaries that must be distinguished include short‑term windows, persistent sessions, tool execution, memory files, and sub‑task workspaces. Clear separation tells the system which layer performs reasoning, which restores state, and which audits actions.
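This curation pass can be sketched as a greedy budget fill. The `ContextItem` shape, the priority scheme, and the function name below are illustrative assumptions, not taken from any specific harness:

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    content: str
    priority: int   # lower = more important, stays near the model
    tokens: int     # rough token estimate

def build_workset(items, budget):
    """Keep the most important items in full; evict the rest.

    Returns (kept, evicted) so the caller can persist evicted items
    for later retrieval instead of silently dropping them.
    """
    kept, evicted, used = [], [], 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + item.tokens <= budget:
            kept.append(item)
            used += item.tokens
        else:
            evicted.append(item)
    return kept, evicted
```

The key design point is that eviction is explicit: anything pushed out of the workset goes to a retrievable layer, never into the void.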

File‑Reading Strategies

Four open‑source harnesses (Pi, OpenClaw, Claude Code, Letta Code) converge on the same pattern:

Hard caps on file size (e.g., Pi ≤ 2000 lines or 50 KB; Claude Code ≤ 256 KB; Letta Code ≤ 10 MB).

Offset/limit pagination to let the model request the next chunk.

Preview‑only returns for large files, with a path or offset for the model to continue reading.

The harness, not the model, enforces these limits and teaches the model to read incrementally.
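A minimal sketch of this pattern, using Pi's 2,000-line figure as the per-call cap; the function name and return shape are illustrative assumptions:

```python
MAX_LINES = 2000  # per-call hard cap, enforced by the harness

def read_file_paged(path, offset=0, limit=MAX_LINES):
    """Return at most `limit` lines starting at `offset`, plus a hint
    the model can use to request the next chunk."""
    limit = min(limit, MAX_LINES)  # the harness, not the model, caps this
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    chunk = lines[offset:offset + limit]
    next_offset = offset + len(chunk)
    truncated = next_offset < len(lines)
    result = "".join(chunk)
    if truncated:
        result += f"\n[truncated: call again with offset={next_offset}]"
    return result, truncated, next_offset
```

The truncation marker doubles as an instruction: the model learns incremental reading because the tool output itself tells it how to continue.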

Tool‑Output Management

Tool outputs can inflate the context faster than file reads. All examined harnesses apply:

Character or token caps per tool (e.g., OpenClaw ≤ 16,000 chars or 30% of the window, whichever is smaller).

Preview of large outputs, with the full data offloaded to disk or a service.

Providing the model a path, ID, or retrieval tool to fetch the full result later.

Deduplication of repeated calls.

Claude Code’s pre‑query optimization and Alyx’s compressed preview + server‑side copy illustrate this pattern.
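The budgeting rule can be sketched as follows. The cap values mirror OpenClaw's limits as described above; the offload location, content-addressed filename scheme, and function name are assumptions:

```python
import hashlib, os, tempfile

CHAR_CAP = 16_000          # fixed per-tool cap
WINDOW_FRACTION = 0.30     # never more than 30% of the context window

def budget_tool_output(output, window_chars, offload_dir=None):
    """Return a (possibly truncated) preview; offload the full output
    to disk and hand the model a path to fetch it later."""
    cap = min(CHAR_CAP, int(window_chars * WINDOW_FRACTION))
    if len(output) <= cap:
        return output
    offload_dir = offload_dir or tempfile.gettempdir()
    # Content-addressed filename also deduplicates repeated calls:
    # identical output maps to the same file.
    digest = hashlib.sha256(output.encode()).hexdigest()[:16]
    path = os.path.join(offload_dir, f"tool-output-{digest}.txt")
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(output)
    preview = output[:cap]
    return f"{preview}\n[output truncated; full result at {path}]"
```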

Session Compression

Compression must preserve task state (goals, files, errors, next steps) so the model can continue without loss.

Pi: Triggers a deterministic LLM summary when a token threshold is crossed, keeping recent messages and respecting tool‑call/result boundaries.

OpenClaw: Slices history by quality, discards old chunks, and runs a silent agentic turn to write key state to a memory file before summarizing.

Claude Code: Uses a structured prompt that extracts the primary request, technical concepts, files, errors, pending tasks, and current work, then re‑adds recently read files.

Letta Code: Server‑side compaction combined with a reflection sub‑agent that updates a git‑backed memory repository.

Compression itself can exceed the window; robust harnesses include fallback paths such as head‑drop or intermediate truncation.
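Threshold-triggered compaction with a head-drop fallback can be sketched like this. The message shape, the `summarize` callback, and the boundary rule's exact form are assumptions; real harnesses track richer structure:

```python
def compact(messages, count_tokens, summarize, threshold, keep_recent=6):
    """Summarize old history once the token count crosses `threshold`,
    keeping the most recent messages intact and never splitting a
    tool call from its result. `summarize` is a (possibly LLM-backed)
    callback; if it fails, fall back to dropping the head."""
    total = sum(count_tokens(m) for m in messages)
    if total <= threshold:
        return messages
    cut = max(0, len(messages) - keep_recent)
    # Respect tool-call/result boundaries: never start the kept tail
    # on an orphaned tool result.
    while cut < len(messages) and messages[cut].get("role") == "tool":
        cut += 1
    head, tail = messages[:cut], messages[cut:]
    try:
        summary = summarize(head)
        return [{"role": "system", "content": summary}] + tail
    except Exception:
        return tail  # fallback: head-drop when summarization itself overflows
```

The fallback branch is the point of the paragraph above: compaction must have a cheaper escape hatch for the case where the summarization step itself cannot fit.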

Sub‑Agent Isolation

Sub‑agents receive only the delegated task string or a fresh session, preventing parent transcript pollution:

Pi spawns a new process with just the task string.

OpenClaw starts a fresh isolated session without parent history.

Claude Code creates a typed‑agent with an empty dialogue and a single delegation prompt.

Letta Code separates forked and non‑forked sub‑agents, the latter being headless.

Isolation reduces context bloat and simplifies debugging, but delegation must be explicit to avoid missing necessary context.
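Fresh-session delegation can be sketched as follows; the runner signature, the default system prompt, and the function name are illustrative assumptions:

```python
def spawn_subagent(task, run_session,
                   system_prompt="You are a focused sub-agent."):
    """Start a sub-agent with a fresh session: no parent transcript,
    only the delegated task string. `run_session` is the harness's
    session runner (assumed signature: list of messages -> result)."""
    fresh_session = [
        {"role": "system", "content": system_prompt},
        # Explicit delegation: everything the sub-agent needs must be
        # packed into this one string, since it sees no parent history.
        {"role": "user", "content": task},
    ]
    return run_session(fresh_session)
```

The comment on the delegation message is the trade-off the text names: isolation is cheap, but the parent must remember to pass along all required context.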

Converging Memory‑Management Hierarchy

Across file reading, tool output, session compression, and sub‑agents the systems form a hierarchy analogous to CPU caches:

Hard caps.

Pagination / offset‑limit.

Overflow to persistent storage (disk, memory repo, vector index).

Anthropic’s Managed Agents place the session outside the harness as a durable event log, while Letta’s approach makes the memory filesystem version‑controlled and directly accessible to the agent.
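A toy version of this hierarchy, with illustrative tier names and callbacks (none of this mirrors a specific harness's API):

```python
class TieredMemory:
    """Cache-like hierarchy: hot items stay in the window, warm items
    are kept as compressed summaries, cold items overflow to a
    persistent store keyed by id."""

    def __init__(self, hot_budget, compress, store):
        self.hot_budget = hot_budget  # token budget for the window tier
        self.compress = compress      # callback: full text -> short summary
        self.store = store            # dict-like persistent layer
        self.hot = {}                 # id -> full text (in-window)
        self.warm = {}                # id -> summary (near the model)

    def add(self, item_id, text, tokens):
        if tokens <= self.hot_budget:
            self.hot[item_id] = text
            self.hot_budget -= tokens
        else:
            # Overflow: persist the full text, keep only a summary
            # near the model.
            self.store[item_id] = text
            self.warm[item_id] = self.compress(text)

    def fetch(self, item_id):
        """Full-fidelity retrieval: a 'cache miss' goes to the store."""
        if item_id in self.hot:
            return self.hot[item_id]
        return self.store[item_id]
```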

Self‑Check Checklist for a Harness

Do tools that can return large content have hard limits?

Do they expose a path or ID for continued access after truncation?

Are pagination parameters (offset, limit, query, range) documented in the tool schema?

What state does session compression retain beyond a plain summary?

Does compression preserve tool‑call / tool‑result boundaries?

Are sub‑agents isolated by default?

Is stable state migrated to a persistent layer (files, DB, index, memory repo)?

Is the system observable (token usage, truncation events, compaction triggers, summary coverage)?

Are session, harness, and sandbox decoupled?

Even if not all items are satisfied initially, implementing file‑read limits and tool‑output budgeting eliminates many common pitfalls.
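Parts of this checklist can be automated. Below is a sketch of an audit over a hypothetical tool-registry schema; every field name (`can_return_large`, `max_output_chars`, `paginated`, `params`) is an assumption, not a real harness's format:

```python
def audit_tools(tools):
    """Flag tools that can return large content but lack a hard limit,
    and paginated tools whose schema omits the pagination params."""
    findings = []
    for t in tools:
        if t.get("can_return_large") and not t.get("max_output_chars"):
            findings.append(f"{t['name']}: no hard output limit")
        params = set(t.get("params", []))
        if t.get("paginated") and not {"offset", "limit"} <= params:
            findings.append(f"{t['name']}: pagination params missing from schema")
    return findings
```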

Key Insight

Agent competition is shifting from "how far a model can think" to "how well the system can keep the model in a reliable long‑running loop". Effective context management—curating the workset, paging large data, previewing tool results, compressing history, and isolating sub‑agents—has become a core module of the agent runtime.

Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
