
How Kimi Scaled AI Agents with Alibaba Cloud’s Elastic Sandbox Architecture

Kimi built a high‑performance, low‑cost AI Agent infrastructure by combining Alibaba Cloud ACK node pools and the ACS Agent Sandbox, addressing challenges of instant sandbox response, state continuity, massive concurrency, cost efficiency, security isolation, and search‑memory integration for production‑grade agents.

Alibaba Cloud Infrastructure

Jan 26, 2026


Kimi previously launched several AI Agent capabilities such as Deep Research, Agentic PPT, OK Computer, and Data Analysis, which required handling tens of thousands of concurrent user requests and massive, isolated compute resources during both online service and model training phases.

Key Challenges

Challenge 1: Instant sandbox response – Agents must start within seconds and strongly isolate unverified code, which rules out the multi-minute startup times seen with traditional VMs or containers.

Challenge 2: State continuity and scheduling pressure – Long‑running agents need pause/resume capabilities, and the system must schedule hundreds of thousands of pods without resource contention.

Challenge 3: Cost‑effective massive concurrency – Provisioning resources for peak loads leads to waste; elastic, on‑demand scheduling is required to keep costs low.

Solution Architecture

Kimi partnered with Alibaba Cloud, using Alibaba Cloud Container Service for Kubernetes (ACK) node pools and the ACS Agent Sandbox as the core of an end‑to‑end Agent Infra platform.

ACK node pools provide instant elasticity across multiple AZs and instance types, with custom images and data‑disk snapshots that cut node‑initialization time by over 60%.

ENI pre‑allocation via the Terway network plugin eliminates network‑ready delays, enabling rapid pod startup.

ACS Agent Sandbox Features

Built on lightweight MicroVM technology, which reduces virtualization overhead by ~90% and brings sandbox startup down to seconds.

Resource pre-scheduling and image-cache snapshots accelerate instance creation. A burst quota temporarily raises CPU and memory limits during startup, cutting Python sandbox launch time by >60%.

A state-preserving sleep/wake mechanism persists memory and disk data, so a paused sandbox can be restored instantly or cloned for reinforcement-learning (RL) branch exploration.

Checkpoint cloning creates thousands of identical sandbox instances in seconds, eliminating repeated initialization for Monte‑Carlo Tree Search.
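The idea behind checkpoint cloning can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: `ToySandbox`, `checkpoint`, and `clone_from_checkpoint` are hypothetical names invented for illustration, not ACS Agent Sandbox APIs, and a deep copy stands in for the real memory and disk snapshot.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class ToySandbox:
    """Hypothetical in-memory stand-in for a sandbox (not an ACS API)."""
    memory: dict = field(default_factory=dict)
    disk: dict = field(default_factory=dict)
    init_count: int = 0  # how many times the expensive setup ran

    def initialize(self) -> None:
        # Stand-in for slow setup such as installing packages or loading data.
        self.disk["env"] = "ready"
        self.init_count += 1

    def checkpoint(self) -> ToySandbox:
        # Capture memory and disk state as an independent snapshot.
        return copy.deepcopy(self)


def clone_from_checkpoint(snapshot: ToySandbox, n: int) -> list[ToySandbox]:
    # Each clone gets its own copy, so branches cannot affect one another.
    return [copy.deepcopy(snapshot) for _ in range(n)]


# Initialize once, checkpoint, then fan out into independent branches.
base = ToySandbox()
base.initialize()
base.memory["step"] = 3
snapshot = base.checkpoint()

branches = clone_from_checkpoint(snapshot, 4)
for i, branch in enumerate(branches):
    branch.memory["action"] = i  # each branch explores a different action
```

Every branch starts from the same state with setup already done, which is the property a tree search relies on when it expands many children from one node.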

Mixed Compute Scheduling

ACK ResourcePolicy defines a tiered scheduling strategy. A reserved baseline node pool handles normal load, and excess pods overflow to a Serverless pool (ACS Agent Sandbox) when the pending queue exceeds a threshold (e.g., 500 pods) or the wait time exceeds 30 s. This balances cost, elasticity, and stability.
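The overflow rule described above can be sketched as a small decision function. This is an illustrative model only, not the ResourcePolicy implementation: `OverflowPolicy` and `choose_pool` are hypothetical names, and the defaults simply reuse the example thresholds quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverflowPolicy:
    """Hypothetical thresholds; values mirror the examples in the text."""
    max_queue_length: int = 500      # pending pods
    max_wait_seconds: float = 30.0   # longest time a pod has waited


def choose_pool(pending_pods: int,
                oldest_wait_seconds: float,
                policy: OverflowPolicy = OverflowPolicy()) -> str:
    # Overflow when either threshold is exceeded; otherwise stay on the
    # reserved baseline pool, which is already paid for.
    if pending_pods > policy.max_queue_length:
        return "serverless"
    if oldest_wait_seconds > policy.max_wait_seconds:
        return "serverless"
    return "baseline"


for pending, wait in [(100, 5.0), (501, 5.0), (100, 30.5)]:
    print(pending, wait, choose_pool(pending, wait))
```

Using "either condition" as the trigger means a short queue of slow-to-place pods still overflows, so latency is bounded even when the queue looks small.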

Scheduler and API Server Optimizations

Parameter tuning increases queue depth and per-pod processing speed, supporting hundreds of pod scheduling decisions per second on clusters at the ten-thousand-node scale.

Pod‑affinity caching and parallel dispatch reduce duplicate scheduling overhead.

Control-plane components (etcd, API Server, kube-controller-manager (KCM), Scheduler) are deployed across multiple AZs with end-to-end parameter optimizations for rapid scaling.

Security Isolation

MicroVM provides hardware‑level isolation for each agent task.

NetworkPolicy enforces namespace and port isolation; Terway enhancements ensure policy scalability.

Per‑agent storage volumes or sub‑directories with ACL/POSIX permissions guarantee data isolation on shared NAS.
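The per-agent sub-directory approach can be sketched with standard POSIX permissions. This is a minimal illustration rather than Kimi's actual layout: `create_agent_dir` and the `agent-42` ID are hypothetical, a temporary directory stands in for the shared NAS mount, and the owner-only mode isolates data only if each agent runs under a distinct user.

```python
import os
import stat
import tempfile
from pathlib import Path


def create_agent_dir(shared_root: Path, agent_id: str) -> Path:
    # Reject IDs that could escape the shared root (for example "../other").
    if not agent_id or "/" in agent_id or agent_id in {".", ".."}:
        raise ValueError(f"invalid agent id: {agent_id!r}")
    agent_dir = shared_root / agent_id
    agent_dir.mkdir(exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so set it explicitly:
    # owner has read/write/execute, group and others have no access.
    os.chmod(agent_dir, 0o700)
    return agent_dir


# A temporary directory stands in for the shared NAS mount point.
with tempfile.TemporaryDirectory() as root:
    path = create_agent_dir(Path(root), "agent-42")
    mode = stat.S_IMODE(path.stat().st_mode)
    print(path.name, oct(mode))
```

Validating the ID before building the path matters as much as the permission bits, since a crafted ID would otherwise let one agent address another agent's directory.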

Search and Memory Backend

Kimi uses Alibaba Cloud Lindorm, a multi-model database that integrates wide-table, search, vector, and AI engines. It offers hybrid retrieval that merges full-text and vector results using Reciprocal Rank Fusion (RRF), deep compression (30-50% storage savings), and seamless data flow between engines without custom sync pipelines.
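As a rough illustration of how RRF merges two result lists, the sketch below scores each document by the sum of 1 / (k + rank) over the lists it appears in. It shows the general technique, not Lindorm's implementation: the function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; a higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hits from a full-text query and a vector query.
full_text_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only rank positions, it needs no score normalization between the keyword engine and the vector engine, which is why it suits dual-recall setups.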

Results

The combined ACK + ACS solution delivers stable, developer-friendly infrastructure that provisions tens of thousands of sandbox instances per minute, halves startup latency, and substantially lowers total cost of ownership. It supports production-grade AI agents with continuous state, fast cloning for RL, robust security, and scalable search and memory services.

Figure: ACK node pool elasticity and ACS sandbox coordination
