
How Volcano Engine veDB Scales to Tens of Thousands of Pods with Cloud‑Native Architecture

This article explains how Volcano Engine's veDB leverages compute‑storage separation, Kubernetes operators, and declarative operations to achieve extreme deployment density, seamless scaling, and high availability across millions of database instances, while addressing the challenges of traditional VM‑based deployments.


VeDB, ByteDance's self‑developed cloud‑native distributed database, tackles the scalability, performance, operational complexity, and middleware constraints of traditional single‑node MySQL by adopting a compute‑storage separation architecture and cloud‑native technologies.

Key Benefits

Extreme operational efficiency: A custom K8s Operator reduces high‑risk actions such as restarts, spec changes, and version upgrades to a matter of seconds, with only a single connection interruption perceived by the user.

High deployment density: Deep optimizations of Kubelet, systemd, and other components raise the per‑host Pod limit to 800, with over 300 Pods running in production, dramatically improving resource utilization.

Massive cluster scale: A single Kubernetes namespace can reliably manage tens of thousands of resources (e.g., 50,000 Pods, 50,000 Services), supporting a wide range of internal ByteDance services such as Douyin, e‑commerce, finance, and advertising.

The article then explores why cloud‑native is the breakthrough for database scale‑out.

Problems with Early VM‑Based Deployment

Low resource utilization : VM fragmentation prevents fine‑grained scheduling, leaving many physical resources idle.

Bulky scaling : Adding compute nodes or upgrading specs requires new VMs and lengthy redeployment, hindering rapid response to traffic spikes.

High‑risk operations : Version upgrades and parameter changes rely on manual scripts across many machines, increasing inconsistency and failure risk.

Difficult change windows : Coordinating maintenance windows with business teams leads to prolonged downtime.

These issues stem from the lack of a modern infrastructure abstraction layer, which cloud‑native Kubernetes provides.

Core Solution: Containerization + Declarative Operations

VeDB’s evolution focuses on two pillars:

Containerization foundation: Database components (DBEngine, Proxy) are packaged as containers and managed by Kubernetes, providing a standardized foundation for scheduling and lifecycle management.

Operator‑enabled declarative operations: The veDB Operator encodes all operational knowledge (master‑slave switch, backup, upgrade) into code. Users declare the desired state in a YAML file (e.g., target version 5.10.2), and the Operator safely executes the necessary steps.
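The declarative pattern described above can be sketched as a reconcile loop: the operator diffs desired state against observed state and derives the actions needed to converge. This is a minimal illustrative sketch, not veDB's actual implementation; all names (`DesiredSpec`, `reconcile`, etc.) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DesiredSpec:        # what the user declares in YAML
    version: str
    replicas: int

@dataclass
class ObservedState:      # what is actually running
    version: str
    replicas: int

def reconcile(desired: DesiredSpec, observed: ObservedState) -> list:
    """Diff desired vs. observed state and emit the actions to converge."""
    actions = []
    if observed.version != desired.version:
        actions.append(f"upgrade {observed.version} -> {desired.version}")
    if observed.replicas < desired.replicas:
        actions.append(f"scale out +{desired.replicas - observed.replicas}")
    elif observed.replicas > desired.replicas:
        actions.append(f"scale in -{observed.replicas - desired.replicas}")
    return actions

# The user only declares the target state; the operator computes the steps.
print(reconcile(DesiredSpec("5.10.2", 3), ObservedState("5.9.0", 2)))
```

The key design point is that the user never issues imperative commands; every run of the loop recomputes the delta, so interrupted operations are simply resumed on the next reconcile.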

Addressing Three Scale‑Out Challenges

Challenge 1: Zero‑Impact Upgrades

Upgrading a stateful database normally causes pod restarts and service interruptions. VeDB Operator implements:

Smart upgrade sequence – read‑only nodes are upgraded before read‑write nodes.

"Expand‑then‑shrink" strategy – new version pods are launched in parallel, traffic is switched instantly, and old pods are terminated, reducing upgrade time and avoiding capacity drops.

Additional optimizations such as image pre‑warming, graceful proxy shutdown, and in‑place resource adjustments.
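The upgrade strategy above can be sketched in a few lines: order nodes so read‑only replicas go first, and for each node launch the new‑version pod before terminating the old one. This is a hypothetical illustration of the sequencing, not veDB's code; the helper names and node fields are assumptions.

```python
def upgrade_order(nodes):
    """Return nodes in upgrade order: read-only replicas before read-write."""
    # sorted() is stable; False (read-only) sorts before True (read-write)
    return sorted(nodes, key=lambda n: n["role"] == "read-write")

def expand_then_shrink(node, new_version, launch, switch_traffic, terminate):
    """Expand-then-shrink for one node: capacity never drops during upgrade."""
    new_pod = launch(node["name"] + "-new", new_version)  # expand: new pod up first
    switch_traffic(node["name"], new_pod)                 # instant cutover
    terminate(node["name"])                               # shrink: retire old pod
    return new_pod
```

Because the new pod is healthy before traffic moves, the only user‑visible effect is the single connection cutover rather than a capacity dip.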

Result: In most scenarios, operational impact on I/O is limited to 5 seconds with only 1 perceived connection interruption.

Challenge 2: High‑Density Deployment & K8s Resource Limits

To maximize utilization, hundreds of database pods must run on a single host, stressing both the host and the Kubernetes control plane.

Resolved exec‑probe overload on systemd by upgrading runc and tuning probe types.

Split etcd clusters by resource type and applied APF (API Priority and Fairness) for fine‑grained throttling, preventing apiserver/etcd overload.

Optimized client‑side access with informers, request parameter tuning, and Kyverno policies to reduce control‑plane load.
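The informer optimization mentioned above works by replacing repeated GET/LIST calls against the apiserver with one full LIST plus a WATCH stream that keeps a local cache current. A minimal sketch of that caching pattern (illustrative only; this is not the real Kubernetes client API):

```python
class InformerCache:
    """List once, then serve all reads locally while watch events keep us fresh."""

    def __init__(self, list_fn):
        # One full LIST against the apiserver at startup.
        self.store = {obj["name"]: obj for obj in list_fn()}
        self.api_calls = 1

    def apply_event(self, event):
        """Driven by a WATCH stream; incremental updates, no polling."""
        name = event["object"]["name"]
        if event["type"] == "DELETED":
            self.store.pop(name, None)
        else:  # ADDED / MODIFIED
            self.store[name] = event["object"]

    def get(self, name):
        """Reads hit the local cache, not the apiserver."""
        return self.store.get(name)
```

With tens of thousands of pods per namespace, moving reads off the apiserver like this is what keeps control‑plane load flat as controllers multiply.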

Result: Per‑host pod limit increased to 800; production hosts run over 300 pods, managing tens of thousands of resources (e.g., 50 k Pods, 50 k Services) reliably.

Challenge 3: Cross‑Cluster & Multi‑Role Instance Management

Complex workloads require different node specifications and sometimes span multiple Kubernetes clusters.

Introduced a custom resource NodeSet in the CRD, allowing grouping of DBEngine nodes with independent specs, versions, and scheduling policies, even across clusters.

The Operator performs operations at the NodeSet level, enabling fine‑grained role management and seamless cross‑cluster orchestration.

Result: Users can configure heterogeneous node groups within a single database instance, achieve resource isolation, and migrate instances across clusters with near‑zero impact.
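The NodeSet grouping described above can be modeled as groups of DBEngine nodes, each with its own spec, version, and target cluster, hanging off a single logical instance. Field names here are illustrative assumptions, not veDB's actual CRD schema:

```python
from dataclasses import dataclass, field

@dataclass
class NodeSet:
    name: str
    role: str          # e.g. "read-write" or "read-only"
    version: str
    cpu: int
    memory_gb: int
    cluster: str       # target Kubernetes cluster; sets may span clusters
    replicas: int = 1

@dataclass
class VeDBInstance:
    name: str
    node_sets: list = field(default_factory=list)

    def nodesets_in_cluster(self, cluster):
        """The operator acts per NodeSet, so per-cluster selection is trivial."""
        return [ns for ns in self.node_sets if ns.cluster == cluster]

# One logical instance, heterogeneous groups, two clusters.
inst = VeDBInstance("orders-db", [
    NodeSet("rw", "read-write", "5.10.2", 16, 64, "cluster-a"),
    NodeSet("ro-analytics", "read-only", "5.10.2", 32, 128, "cluster-b", replicas=4),
])
```

Because each NodeSet carries its own spec and placement, a heavy analytics replica group can run on big machines in one cluster while the read‑write group stays small in another, all under one instance.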

Future Directions

Serverless for massive instance counts – exploring auto‑scale‑to‑zero and pay‑per‑use models for small‑scale workloads.

AI‑driven automation – aiming to compress control‑plane operations to sub‑second latency for high‑frequency create/delete cycles.

"DB as Git" – investigating version‑controlled database schema and snapshot management via Git, enabling instant branch‑like switches between database states.

VeDB continues to leverage cloud‑native advances to provide a more stable, elastic, and efficient database service.

Tags: cloud-native, Operator, distributed databases, database scaling, veDB
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
