How We Scaled Millions of AI Agents with Unikraft Micro‑VMs and a Control‑Plane Sandbox
Browser Use evolved from AWS Lambda to a control‑plane‑driven architecture built on Unikraft micro‑VMs. Each AI web agent is isolated in a sandbox that receives only three environment variables, enabling secure, scalable execution of millions of agents with zero‑trust isolation and fast start‑up times.
Evolution from AWS Lambda to Control‑Plane‑Based Unikraft Micro‑VMs
Browser Use initially ran millions of web agents on AWS Lambda, which offered per‑invocation isolation and instant scaling. A code‑execution feature was later added, allowing agents to run Python, execute shell commands, and manipulate files inside a sandbox exposed to the agent as a tool.
The main problem was that the agent loop and the REST API shared the same backend process, causing crashes on redeploy, memory pressure, and latency when the two workloads competed for resources.
Two Isolation Modes
Mode 1 – Isolate the Tool: The agent runs on the host infrastructure while dangerous operations (code execution, terminal access) are performed in a separate sandbox accessed via HTTP. The host never runs untrusted code directly.
Mode 2 – Isolate the Agent: The entire agent runs inside a sandbox that holds no secrets. Communication with the outside world happens through a control plane that possesses all credentials.
We started with Mode 1 and later switched to Mode 2.
Sandbox Environment
The same container image can run either as a Unikraft micro‑VM in production or as a Docker container for local development and evaluation. Switching is done with a simple configuration flag: `sandbox_mode: 'docker' | 'ukc'`. In production, each agent runs in its own Unikraft micro‑VM that boots in under one second, managed via the Unikraft Cloud REST API on dedicated bare‑metal servers.
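The flag-based switch can be sketched as a small backend dispatch. This is a minimal illustration, not the actual Browser Use code; the class and method names are assumptions:

```python
from typing import Literal, Protocol

SandboxMode = Literal["docker", "ukc"]

class SandboxBackend(Protocol):
    def launch(self, session_id: str) -> str: ...

class DockerBackend:
    def launch(self, session_id: str) -> str:
        # Local development: start a container from the same image.
        return f"docker:{session_id}"

class UnikraftBackend:
    def launch(self, session_id: str) -> str:
        # Production: create a micro-VM via the Unikraft Cloud REST API
        # (the actual API call is elided in this sketch).
        return f"ukc:{session_id}"

def make_backend(mode: SandboxMode) -> SandboxBackend:
    # One configuration flag selects the backend; the rest of the
    # system is identical for both environments.
    return DockerBackend() if mode == "docker" else UnikraftBackend()
```

Because both backends run the same image, agents behave identically in local evaluation and in production.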
The sandbox receives only three environment variables: SESSION_TOKEN, CONTROL_PLANE_URL, and SESSION_ID. No AWS keys, database credentials, or API tokens are exposed.
Unikraft provides built‑in scale‑to‑zero: idle VMs are suspended and instantly resumed when a request arrives, keeping costs near zero while delivering instant wake‑up.
Security Hardening
Byte‑code‑only execution : During Docker build all Python source files are compiled to .pyc and the original .py files are removed. The framework code loads into memory as root and then disappears.
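A minimal sketch of that build step, assuming the framework source lives under a single directory (`strip_sources` is a hypothetical helper, not part of any published tooling):

```python
import compileall
import pathlib

def strip_sources(root: str) -> None:
    # Compile every .py file to .pyc. legacy=True writes the .pyc next
    # to the source (mod.pyc) instead of into __pycache__/, so imports
    # keep working after the sources are gone.
    compileall.compile_dir(root, legacy=True, quiet=1)
    # Delete the original source files; only byte-code remains.
    for src in pathlib.Path(root).rglob("*.py"):
        src.unlink()
```

Run during `docker build`, this leaves an image that can execute the framework but never exposes its source on disk.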
Privilege dropping : The entry point starts as root to read the byte‑code, then immediately drops to a non‑privileged sandbox user using setuid / setgid.
Environment cleanup : After reading SESSION_TOKEN, CONTROL_PLANE_URL, and SESSION_ID from os.environ, the sandbox deletes them from the environment. The VM runs in a private VPC with no external network access, so the tokens are useless outside the control plane.
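The last two hardening steps can be sketched as a tiny entry point. The `sandbox` username and the helper names are assumptions for illustration:

```python
import os
import pwd

def scrub_env(keys=("SESSION_TOKEN", "CONTROL_PLANE_URL", "SESSION_ID")) -> dict:
    # pop() reads each variable and deletes it from the environment in
    # one step, so child processes can never see the session token.
    return {key: os.environ.pop(key) for key in keys if key in os.environ}

def drop_privileges(username: str = "sandbox") -> None:
    # Must run while still root. setgid comes before setuid: once the
    # process gives up root via setuid, it can no longer change groups.
    user = pwd.getpwnam(username)
    os.setgid(user.pw_gid)
    os.setuid(user.pw_uid)
```

After `scrub_env()` and `drop_privileges()`, the process holds the three values only in memory and runs without root.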
How the Control Plane Works
The control plane acts as a stateless FastAPI proxy. Sandboxes cannot reach the internet; every request must pass through the control plane, which validates the `Authorization: Bearer {session_token}` header and performs actions using real credentials.
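The token check reduces to a few lines. This sketch is framework-free to stay self-contained; in the real service it would sit behind a FastAPI dependency, and the session store shown here is a hypothetical stand-in:

```python
from typing import Optional

def validate_bearer(header: Optional[str], sessions: dict) -> str:
    """Validate an 'Authorization: Bearer <token>' header against known
    sessions; return the session id, or raise on any mismatch."""
    if not header:
        raise PermissionError("missing Authorization header")
    scheme, _, token = header.partition(" ")
    if scheme != "Bearer" or token not in sessions:
        raise PermissionError("invalid session token")
    return sessions[token]
```

Only after this check does the control plane act with its real credentials on the sandbox's behalf.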
LLM calls are routed through the control plane.
File uploads to S3 are performed via pre‑signed URLs generated by the control plane, the only way a sandbox can interact with external storage.
LLM Proxy
For each LLM invocation the sandbox sends a new message; the control plane stores the full conversation history in a database, reconstructs the context, and forwards it to the provider, keeping the sandbox stateless.
The control plane also enforces cost limits and billing, allowing the sandbox to focus solely on task execution.
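The history-reconstruction step can be sketched as follows; the in-memory dict stands in for the real database, and the class name is an assumption:

```python
import dataclasses

@dataclasses.dataclass
class LLMProxy:
    # Hypothetical in-memory stand-in for the conversation database.
    history: dict = dataclasses.field(default_factory=dict)

    def build_context(self, session_id: str, new_messages: list) -> list:
        # Append the sandbox's new messages to the stored conversation...
        messages = self.history.setdefault(session_id, [])
        messages.extend(new_messages)
        # ...and return the full reconstructed context, which the control
        # plane then forwards to the LLM provider (call elided here).
        return list(messages)
```

The sandbox only ever sends the delta; the proxy owns the full transcript, which is what keeps the sandbox stateless.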
File Synchronization via Pre‑Signed URLs
Sandboxes write to a /workspace directory. A sync service watches for changes and periodically uploads files to S3 using pre‑signed URLs supplied by the control plane. The sandbox never sees AWS credentials.
The sandbox detects a file change in /workspace.
It calls POST /presigned-urls with the file path.
The control plane returns a restricted S3 upload URL.
The sandbox uploads the file directly to S3.
Downloads follow the reverse flow, granting the sandbox limited S3 read access without exposing any secrets.
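The control-plane side of step 3 can be sketched with a plain HMAC signature to keep the example dependency-free; a real deployment would call S3's own pre-signing (e.g. boto3's `generate_presigned_url`), and the bucket URL and secret here are made up:

```python
import hashlib
import hmac
import time

SECRET = b"control-plane-only-secret"  # never leaves the control plane

def presign_upload(path: str, ttl: int = 300) -> str:
    # Sign the object path plus an expiry timestamp, so the URL grants
    # a single, time-limited upload and nothing else.
    expires = int(time.time()) + ttl
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"https://bucket.example/{path}?expires={expires}&sig={sig}"
```

The sandbox receives only the resulting URL, uses it once, and the signing secret stays on the control plane.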
Gateway Protocol
Inside the sandbox, agents communicate with the control plane via a custom "gateway" protocol. The core interface is defined as:
```python
class AgentGateway(Protocol):
    async def invoke_llm(self, new_messages, tools, tool_choice) -> LLMResponse: ...
    async def persist_messages(self, messages) -> None: ...
```

In production the ControlPlaneGateway sends HTTP requests; in development the DirectGateway calls the LLM directly and keeps history in memory. The agent code sees no difference.
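A minimal DirectGateway sketch under these assumptions: `LLMResponse` is reduced to a plain container, and the echoed reply is a placeholder for a real provider call:

```python
import dataclasses
from typing import Any, List

@dataclasses.dataclass
class LLMResponse:
    content: str

class DirectGateway:
    """Development gateway: no control plane, history kept in memory."""

    def __init__(self) -> None:
        self.history: List[Any] = []

    async def invoke_llm(self, new_messages, tools=None, tool_choice=None) -> LLMResponse:
        self.history.extend(new_messages)
        # Placeholder for a direct LLM provider call with the full history.
        return LLMResponse(content=f"echo:{len(self.history)} messages")

    async def persist_messages(self, messages) -> None:
        self.history.extend(messages)
```

Because both gateways satisfy the same Protocol, swapping one for the other is a construction-time decision and the agent loop is untouched.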
Scalability
The control plane is stateless and can be horizontally scaled behind a load balancer. Adding more agents simply means launching more sandboxes; higher throughput is achieved by adding more control‑plane instances.
Our backend runs on ECS Fargate in private subnets, with the control plane auto‑scaling based on CPU usage. Unikraft handles VM scheduling across multiple metros, ensuring each session gets its own isolated VM.
Conclusion
Sandboxing code‑capable agents can be done either by isolating the tool (Mode 1) or by isolating the entire agent (Mode 2). We adopted Mode 2: a control plane holds all credentials and proxies every operation, while the sandbox receives only three environment variables and runs as a Unikraft micro‑VM in production or a Docker container locally. This adds a network hop and requires three services instead of one, but the latency is negligible compared to LLM response times, and the operational complexity is manageable.
Key insight: An agent should contain nothing worth stealing and no persistent state; all secrets stay in the control plane.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.