
How to Future‑Proof Agent Systems by Virtualizing Sessions, Harnesses, and Sandboxes

The article analyzes Anthropic's Managed Agents design, showing how OS‑style virtualization of three core components (Session, Harness, and Sandbox) creates stable interfaces that keep agent systems functional as model capabilities evolve, while also improving security and performance.

Shi's AI Notebook

Apr 11, 2026


Background and Core Problem

Anthropic’s engineering blog post "Scaling Managed Agents: Decoupling the brain from the hands" explains that harnesses encode assumptions about what a model cannot do on its own, and that those assumptions go stale as models improve. For example, Claude Sonnet 4.5 exhibited "context anxiety" (wrapping up tasks early as it sensed its context limit approaching), so the harness added context resets. On Claude Opus 4.5 the behavior was gone and the resets had become dead weight, illustrating how model‑specific patches turn into technical debt.

OS Analogy: Virtualizing Agent Components

The solution draws on operating‑system thinking: abstract hardware into generic interfaces such as process, file, and read(). Managed Agents apply the same principle by virtualizing three core components:

Session: an append‑only event log, analogous to a disk.

Harness: the loop that calls Claude and routes tool calls, analogous to a process.

Sandbox: the environment where Claude executes code and edits files, analogous to a peripheral device.

The key is stable, replaceable interfaces so that Harness no longer assumes sandbox capabilities and Session no longer assumes Claude’s context limits.

Stable Interfaces

The design defines the following interfaces. The Harness invokes all of them except wake, which the orchestration layer calls:

provision({resources}) – Initializes a new container (clones the repo, starts processes); also used to rebuild the sandbox after a failure.

execute(name, input) → string – Calls the sandbox as a tool; returns a string, or surfaces a failure as a tool‑call error.

emitEvent(id, event) – Writes an event to the Session log, providing a durable execution record.

getEvents() – Selects slices of the event stream for replay, rewind, or context inspection.

getSession(id) – Retrieves the event log so that a newly woken Harness can resume from the last event.

wake(sessionId) – Called by orchestration to create a new stateless Harness and bind it to the specified Session, allowing recovery after a crash.

These calls make the sandbox the callee and the Harness the caller, moving all state to the external Session.
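To make the caller/callee relationship concrete, here is a minimal in‑memory sketch. It is not Anthropic's implementation: the class names, the snake_case method names (which mirror the article's emitEvent, getEvents, execute, and provision), the event dictionaries, and the two toy tools are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Session:
    """Append-only event log that lives outside the harness and sandbox."""
    events: List[dict] = field(default_factory=list)

    def emit_event(self, event: dict) -> int:
        # Events are only ever appended, never edited or removed.
        self.events.append(event)
        return len(self.events) - 1  # position of the new event

    def get_events(self, start: int = 0, stop: Optional[int] = None) -> List[dict]:
        # Positional slice of the event stream.
        return self.events[start:stop]


class Sandbox:
    """A 'hand': the harness only sees execute(name, input) -> string."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self.tools = tools

    def execute(self, name: str, tool_input: str) -> str:
        return self.tools[name](tool_input)


def provision(resources: dict) -> Sandbox:
    # Stand-in for real container setup (clone repo, start processes).
    return Sandbox({"echo": lambda text: text, "upper": lambda text: text.upper()})


def harness_step(session: Session, sandbox: Sandbox, name: str, tool_input: str) -> str:
    """Route one tool call and record both the call and its result."""
    session.emit_event({"type": "tool_call", "name": name, "input": tool_input})
    try:
        result = sandbox.execute(name, tool_input)
    except Exception as exc:
        # A sandbox failure is reported back as a string-valued tool-call error.
        result = f"tool-call error: {exc!r}"
    session.emit_event({"type": "tool_result", "name": name, "output": result})
    return result


session = Session()
sandbox = provision({"repo": "example/repo"})  # hypothetical resource spec
print(harness_step(session, sandbox, "upper", "hello"))
print(harness_step(session, sandbox, "missing", "x"))
print(len(session.get_events()))
```

In this sketch the harness function keeps nothing between calls; everything it learns is written to the Session, which is what lets the other two components be replaced.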

From “Pets” to “Cattle”

Initially, Session, Harness, and Sandbox lived in a single container (a “pet”). This caused several issues:

Container crash lost the Session.

Debugging required shell access to a container that also held user data.

Harness and tools were tightly coupled, limiting VPC access.

Every session paid the full container initialization cost up front, adding significant time‑to‑first‑token (TTFT) latency.

By turning each component into interchangeable “cattle”, the architecture gains:

Sandbox becomes a stateless tool (execute); Harness can retry failed tool calls and reprovision a fresh sandbox.

Harness becomes a stateless process; its state lives in Session, and wake / getSession restore it.

Session persists externally, so container turnover no longer discards conversation state.
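The recovery path for a stateless Harness can be sketched as follows. This is an illustration under stated assumptions, not the actual system: SESSION_STORE stands in for a durable log, the step names and the "step_done" event type are invented, and the crash is simulated with an exception.

```python
from typing import Dict, List

# Hypothetical durable store: session id -> append-only list of events.
SESSION_STORE: Dict[str, List[dict]] = {}


def emit_event(session_id: str, event: dict) -> None:
    SESSION_STORE.setdefault(session_id, []).append(event)


def get_session(session_id: str) -> List[dict]:
    return list(SESSION_STORE.get(session_id, []))


class Harness:
    """Keeps no durable state of its own; progress is derived from the log."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        # The resume position comes from the session log, not harness memory.
        log = get_session(session_id)
        self.next_step = sum(1 for e in log if e["type"] == "step_done")

    def run(self, steps: List[str], crash_at: int = -1) -> None:
        while self.next_step < len(steps):
            if self.next_step == crash_at:
                raise RuntimeError("simulated harness crash")
            emit_event(self.session_id, {"type": "step_done", "step": steps[self.next_step]})
            self.next_step += 1


def wake(session_id: str) -> Harness:
    # Orchestration starts a fresh harness bound to an existing session.
    return Harness(session_id)


steps = ["plan", "edit", "test", "report"]
first = wake("s1")
try:
    first.run(steps, crash_at=2)
except RuntimeError:
    pass  # the first harness is gone; its progress survives in the log

second = wake("s1")
second.run(steps)
print([e["step"] for e in get_session("s1")])
```

The second harness never talks to the first one. It only reads the log, which is the property that makes harness instances interchangeable.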

Security Boundary

Running model‑generated code in the same environment as credentials creates a fuzzy security boundary: a prompt injection only has to convince Claude to read its own environment variables in order to exfiltrate secrets. The article proposes structural fixes:

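Git scenario: during sandbox initialization, the repo access token is used to clone the repository and is wired into the local git remote; subsequent pushes and pulls work inside the sandbox without the Agent ever handling the token.

MCP tool: OAuth tokens are stored in an external vault; Claude calls MCP tools through a dedicated proxy that fetches the credentials for the session, so the Harness never sees them.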

Both approaches share the principle of not relying on “the model can’t do X”.
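The vault‑and‑proxy idea can be sketched in a few lines. Everything named here is hypothetical (the _VAULT dictionary, the session token format, the "issues" service, and the function names); the only point is that the credential is looked up and used inside the proxy and never returned to the caller.

```python
from typing import Dict

# Hypothetical vault: session token -> service -> credential.
# In this sketch only mcp_proxy reads it.
_VAULT: Dict[str, Dict[str, str]] = {
    "session-token-1": {"issues": "oauth-secret-abc"},
}


def external_service(service: str, credential: str, payload: str) -> str:
    # Stand-in for a real API that checks the credential.
    if credential != "oauth-secret-abc":
        return "401 unauthorized"
    return f"{service} ok: {payload}"


def mcp_proxy(session_token: str, service: str, payload: str) -> str:
    """Look up the credential for this session and make the call on its behalf."""
    credential = _VAULT.get(session_token, {}).get(service)
    if credential is None:
        return "tool-call error: no credential for this session"
    return external_service(service, credential, payload)


def harness_call_tool(session_token: str, service: str, payload: str) -> str:
    # The harness forwards only the session token; it never holds the credential.
    return mcp_proxy(session_token, service, payload)


result = harness_call_tool("session-token-1", "issues", "create ticket")
print(result)
```

Because the harness side only ever handles an opaque session token, a prompt injection that dumps the harness or sandbox environment has no credential to find in this arrangement.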

Session ≠ Context Window

Long‑running tasks often exceed Claude’s context window. Common mitigations (compaction, external memory tools, context trimming) make irreversible decisions about what to keep, so they risk discarding information that is needed later. Managed Agents keep the Session outside the context window: the Harness can request arbitrary getEvents() slices to replay or rewind history, so earlier context stays recoverable from the log.
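A small sketch shows how slicing differs from trimming. The article gives getEvents() without parameters, so the start/stop arguments, the helper names, and the string events below are assumptions chosen for illustration.

```python
from typing import List, Optional

# Stand-in session log; a real one would hold structured events.
events: List[str] = [f"event-{i}" for i in range(10)]


def get_events(start: int = 0, stop: Optional[int] = None) -> List[str]:
    """Return a positional slice of the log without modifying it."""
    return events[start:stop]


def build_context(window: int) -> List[str]:
    """Fit the most recent events into a bounded context."""
    return get_events(max(0, len(events) - window))


def rewind(before: int, count: int) -> List[str]:
    """Re-read `count` events leading up to position `before`."""
    return get_events(max(0, before - count), before)


recent = build_context(3)
earlier = rewind(5, 2)
print(recent)
print(earlier)
print(len(events))  # the log itself is unchanged
```

The context passed to the model is a view over the log, so choosing a small window now does not prevent reading older events later.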

Many Brains, Many Hands

Decoupling enables horizontal scaling:

Many Brains

Separate harnesses let customers connect their own sandbox to Anthropic’s brain, eliminating the need for network peering and unlocking VPC use cases.

Many Hands

Each hand is a tool (execute) that is not coupled to any particular brain, so brains can pass hands to one another, with shared state carried in the Session log. The decoupled design is reported to cut p50 TTFT by roughly 60% and p95 TTFT by more than 90%, because containers are provisioned only when a session actually needs one rather than at the start of every session.
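One way to picture where a TTFT saving of this kind could come from is lazy provisioning: the container is created on the first tool call instead of at session start. The LazyHand class, the provision_container stand‑in, and the counter below are hypothetical names for this sketch, not part of the described system.

```python
from typing import Callable, Optional

Sandbox = Callable[[str, str], str]  # execute(name, input) -> string


class LazyHand:
    """Defers sandbox provisioning until the first tool call arrives."""

    def __init__(self, provision: Callable[[], Sandbox]):
        self._provision = provision
        self._sandbox: Optional[Sandbox] = None
        self.provision_count = 0

    def execute(self, name: str, tool_input: str) -> str:
        if self._sandbox is None:
            # The (slow) setup cost is paid here, not at session start.
            self._sandbox = self._provision()
            self.provision_count += 1
        return self._sandbox(name, tool_input)


def provision_container() -> Sandbox:
    # Stand-in for cloning a repo and starting processes.
    return lambda name, tool_input: f"{name}({tool_input})"


chat_only = LazyHand(provision_container)  # session that never calls a tool
coding = LazyHand(provision_container)     # session that calls tools twice
coding.execute("run", "pytest")
coding.execute("run", "ls")
print(chat_only.provision_count, coding.provision_count)
```

A session that never touches the sandbox pays no setup cost at all, and a session that does pays it once, at the moment the first tool call is routed.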

Meta‑Harness

The vision is a “harness of harnesses” with generic interfaces that can accommodate task‑specific harnesses or broad‑purpose ones like Claude Code. This mirrors the longevity of OS system calls (read() / write()) that were designed without foreseeing cloud computing or AI.
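As a rough illustration of what "generic interfaces" could mean here, the sketch below lets two different harness styles run behind one shared shape. The Harness protocol, both harness classes, and the toy tools are invented for this example; the article does not specify how a meta‑harness is structured.

```python
from typing import Callable, List, Protocol

Execute = Callable[[str, str], str]  # execute(name, input) -> string


class Harness(Protocol):
    def run(self, task: str, log: List[dict], execute: Execute) -> None: ...


class SingleShotHarness:
    """Narrow harness: one tool call per task."""

    def run(self, task: str, log: List[dict], execute: Execute) -> None:
        log.append({"harness": "single", "output": execute("echo", task)})


class TwoPassHarness:
    """Broader harness: produce a draft, then post-process it."""

    def run(self, task: str, log: List[dict], execute: Execute) -> None:
        draft = execute("echo", task)
        log.append({"harness": "two-pass", "output": execute("upper", draft)})


def meta_harness(harness: Harness, task: str, log: List[dict], execute: Execute) -> None:
    # The outer layer depends only on the shared interfaces,
    # not on how a particular harness works internally.
    harness.run(task, log, execute)


def execute(name: str, tool_input: str) -> str:
    tools = {"echo": lambda s: s, "upper": lambda s: s.upper()}
    return tools[name](tool_input)


log: List[dict] = []
for harness in (SingleShotHarness(), TwoPassHarness()):
    meta_harness(harness, "fix bug", log, execute)
print(log)
```

Swapping in a new harness style only requires that it accept the same log and execute arguments, which is the sense in which the outer layer stays stable while harnesses change.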

Takeaways

1. Each component added to a Harness encodes an assumption about current model capabilities; revisit these components regularly (for example, with ablation tests) to verify they are still necessary.

2. Security must be built on structural isolation, not on assumptions about model limitations.

3. Moving state beyond component boundaries (Session outside the context window) is essential for robustness and extensibility.


