Harrison Chase Explains Two Sandbox Architectures for AI Agents
The article analyzes why AI agents need isolated sandboxes, outlines two architectural patterns—running the agent inside a sandbox or using the sandbox as an external tool—compares their advantages and challenges, and provides concrete implementation examples and community insights.
Two Sandbox Architectures for AI Agents
AI agents increasingly need to execute code, install packages, and access files, which requires isolated workspaces to protect host credentials, files, and network resources. Sandboxes provide this isolation.
Mode 1: Agent Runs Inside the Sandbox
In this pattern the agent is fully contained within a Docker or VM image that includes the agent framework. The external application communicates with the agent via HTTP or WebSocket APIs.
Implementation: Build a pre‑installed image, deploy it in the sandbox, and expose an API endpoint.
Advantages:
The image mirrors the local development environment, allowing the same commands to run locally and in the sandbox.
The agent can directly read and modify the file system.
Useful when the agent is tightly coupled with specific libraries or complex environment state.
Challenges:
Cross‑sandbox communication requires infrastructure (e.g., WebSocket or HTTP layers, session management, error handling). Some providers like E2B handle this in their SDKs.
API keys must reside inside the sandbox, creating a security risk if the sandbox is compromised; providers such as E2B and Runloop are adding key‑vault features.
Updating the agent means rebuilding the container image and redeploying, slowing iteration.
The sandbox must be ready before the agent activates, adding extra logic.
Intellectual‑property leakage is a concern because code and prompts are exposed inside the sandbox.
Witan Labs’ Nuno Campos warns that no part of the agent should have more privileges than the bash tool; otherwise, generated code could gain unrestricted network access.
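One way to honor that least‑privilege rule is to strip secrets from the child environment before spawning the bash tool, so generated code never inherits them. A minimal sketch, assuming the agent shells out via subprocess; the variable names are hypothetical, not part of any provider SDK:

```python
import os
import subprocess

# Hypothetical names of secrets the agent process holds but the
# generated code must never see.
SECRET_VARS = {"ANTHROPIC_API_KEY", "SANDBOX_PROVIDER_TOKEN"}

def run_bash(command: str) -> str:
    """Run generated shell code with a sanitized environment."""
    clean_env = {k: v for k, v in os.environ.items() if k not in SECRET_VARS}
    result = subprocess.run(
        ["bash", "-c", command],
        env=clean_env,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout

# Even though the agent process holds the key, the child cannot read it.
os.environ["ANTHROPIC_API_KEY"] = "sk-demo"
print(run_bash("echo key=${ANTHROPIC_API_KEY:-unset}"))  # key=unset
```

This only narrows the environment, not the network; real deployments would pair it with the key‑vault or proxying features mentioned above.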
Mode 2: Sandbox as an External Tool
Here the agent runs locally (or on a server) and calls a remote sandbox via an API whenever code execution is required.
Implementation: The agent generates code, then invokes a sandbox provider’s API (e.g., E2B, Modal, Daytona, Runloop). The provider’s SDK abstracts communication, making the sandbox appear as another tool.
Advantages:
Agent code can be updated instantly without rebuilding images, speeding development.
API keys stay outside the sandbox, reducing exposure.
Clear separation of concerns: agent state (history, memory) lives where the agent runs, while execution happens in the isolated sandbox. Sandbox failures do not lose agent state, and the sandbox backend can be swapped without affecting core logic.
Tomas Beran of E2B adds:
Multiple remote sandboxes can run tasks in parallel.
Billing is per execution rather than per process runtime.
Ben Guo (Zo Computer) notes that Mode 2 is preferable when future GPU‑based agent tools are needed, as persistent sandbox and inference tool requirements diverge.
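The separation‑of‑concerns advantage can be made concrete: if execution sits behind a small interface, the sandbox backend can be swapped without touching agent logic. A minimal sketch; the class and method names here are illustrative, not a real provider SDK:

```python
from typing import Protocol

class ExecutionBackend(Protocol):
    """Anything that can run code and return its output."""
    def execute(self, code: str) -> str: ...

class LocalBackend:
    """Runs code in-process; fine for tests, unsafe for untrusted code."""
    def execute(self, code: str) -> str:
        scope: dict = {}
        exec(code, scope)
        return str(scope.get("result", ""))

class FakeRemoteBackend:
    """Stand-in for a provider API (E2B, Modal, Daytona, Runloop)."""
    def execute(self, code: str) -> str:
        # A real implementation would send `code` to the provider here.
        return LocalBackend().execute(code)

def run_step(backend: ExecutionBackend, code: str) -> str:
    # Agent state (history, memory) stays on the caller's side;
    # only the code crosses the boundary.
    return backend.execute(code)

print(run_step(LocalBackend(), "result = 2 + 2"))       # 4
print(run_step(FakeRemoteBackend(), "result = 2 + 2"))  # 4
```

Swapping backends changes where code runs, never what the agent remembers.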
Challenges:
Network latency is the main drawback; each execution incurs a round‑trip.
Stateful sandbox sessions (which preserve variables, files, and installed packages between calls) mitigate this by avoiding repeated setup work on every round‑trip.
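What a stateful session buys is that work done in one call survives into the next. A minimal local stand‑in for that behavior — real providers implement this server‑side, and `run` here is illustrative, not a provider API:

```python
class StatefulSession:
    """Keeps one namespace alive across executions, like a sandbox session."""
    def __init__(self) -> None:
        self._scope: dict = {}

    def run(self, code: str) -> str:
        exec(code, self._scope)  # state persists in self._scope
        return str(self._scope.get("result", ""))

session = StatefulSession()
session.run("data = [1, 2, 3]")           # first call defines state
print(session.run("result = sum(data)"))  # later call reuses it: 6
```

Without the session, the second call would have to re‑upload `data` (or re‑install packages), paying the round‑trip cost again.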
Choosing Between the Modes
Prefer Mode 1 when:
The agent is tightly coupled with the execution environment (needs continuous access to specific libraries or complex state).
You want production to closely mirror the local development environment.
The provider’s SDK already handles the communication layer.
Prefer Mode 2 when:
Rapid iteration of agent logic during development is required.
You wish to keep API keys outside the sandbox.
A clearer separation between agent state and execution environment is desired.
Implementation Examples
Using the deepagents framework:
Mode 1 – Agent Inside Sandbox
FROM python:3.11
RUN pip install deepagents-cli
After building the image, additional infrastructure (a WebSocket/HTTP server, session management, error handling) is needed to connect the external application with the sandboxed agent.
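The shape of that missing layer can be sketched with the standard library alone. Everything here is an assumption for illustration — the `/invoke` path, the JSON payload format, and the echo‑style `agent_reply` stub are not part of deepagents:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen

def agent_reply(message: str) -> str:
    """Stub standing in for the agent framework running inside the sandbox."""
    return f"agent received: {message}"

class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"output": agent_reply(payload["message"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind an ephemeral port; inside a container this would be a fixed,
# exposed port that the external application connects to.
server = ThreadingHTTPServer(("127.0.0.1", 0), AgentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The external application's side of the conversation:
req = Request(
    f"http://127.0.0.1:{server.server_port}/invoke",
    data=json.dumps({"message": "hello"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(json.loads(resp.read())["output"])  # agent received: hello
server.shutdown()
```

A production version would add the session management and error handling noted above; SDKs from providers like E2B bundle this layer for you.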
Mode 2 – Sandbox as a Tool
from daytona import Daytona
from langchain_anthropic import ChatAnthropic
from deepagents import create_deep_agent
from langchain_daytona import DaytonaSandbox
# also possible: E2B, Runloop, Modal
sandbox = Daytona().create()
backend = DaytonaSandbox(sandbox=sandbox)
agent = create_deep_agent(
    model=ChatAnthropic(model="claude-sonnet-4-20250514"),
    system_prompt="You are a Python coding assistant with sandbox access.",
    backend=backend,
)
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Run a small python script",
    }]
})
sandbox.stop()
Execution flow:
The agent plans locally on your machine.
It generates Python code to solve the problem.
The code is sent to the Daytona API and runs in a remote sandbox.
The sandbox returns the result.
The agent receives the output and continues reasoning locally.
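That five‑step loop can be sketched end to end. Every function below is a stub — the planner, code generation, and sandbox call stand in for the model and the Daytona API:

```python
def plan(task: str) -> str:
    """Step 1: local planning (stands in for the model's reasoning)."""
    return f"compute: {task}"

def generate_code(plan_text: str) -> str:
    """Step 2: turn the plan into Python source."""
    return "result = sum(range(1, 11))"

def sandbox_execute(code: str) -> str:
    """Steps 3-4: stand-in for shipping code to the remote sandbox
    and receiving its result."""
    scope: dict = {}
    exec(code, scope)
    return str(scope["result"])

def agent_step(task: str) -> str:
    """Step 5: fold the sandbox output back into local reasoning."""
    output = sandbox_execute(generate_code(plan(task)))
    return f"The answer is {output}."

print(agent_step("sum of 1..10"))  # The answer is 55.
```

In the real Mode 2 setup, only `sandbox_execute` crosses the network; all other steps stay wherever the agent runs.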
Community Discussion
Developers debated the feasibility of Mode 1 in production, citing security vulnerabilities and infrastructure constraints (observability, uptime, scaling). Nico Ritschel suggested practical mitigations such as proxying inference calls and injecting keys from outside the sandbox. Harrison Chase replied that key‑proxying is not yet standard, though providers are working on it.
Adish Jain (InvariumAI) emphasized that regardless of the mode, verifying what the agent actually does inside the sandbox is the core challenge, highlighting the importance of behavior testing.
Ale Alonso called sandboxing the biggest difficulty when using deepagents. Nathan Flurry introduced a “Sandbox Agent SDK” that abstracts the complexities of Mode 1, supporting multiple agents (Claude Code, Codex, OpenCode, Cursor, Amp, Pi) via a unified HTTP API.
Conclusion
AI agents need isolated environments to execute code safely. Mode 1 (agent inside sandbox) offers a local‑development‑mirrored image and tight coupling with the environment, while Mode 2 (sandbox as a tool) enables fast iteration, keeps API keys external, and separates state from execution. The optimal choice depends on coupling requirements, iteration speed, security preferences, and infrastructure capabilities.
Related links:
https://x.com/hwchase17/status/2021261552222158955?s=46
http://github.com/rivet-dev/sandbox-agent
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).