How ByteDance Scaled Stateful Applications with Cloud‑Native Kubernetes
This article details ByteDance's journey of migrating stateful services to a cloud‑native Kubernetes platform, covering challenges in state management, infrastructure enhancements, storage solutions, monitoring, and automated operations that together improve efficiency and reduce costs at massive scale.
Background
Stateful applications retain data and often require sharding, replication, and persistence. ByteDance migrated many such services to a cloud‑native environment built on Kubernetes.
Stateful Application Scenarios
Typical use cases include search recall (large models with long load times), push services (each instance handles a shard of users and needs a unique ID), and storage services such as an in‑house KV store, Druid, and Elasticsearch, which depend on local storage and on defined relationships between instances.
Challenges and Benefits of Cloud‑Native Migration
Before migration, services ran on physical machines, which led to a complex architecture, inflexible operations, inconsistent environments, and resource fragmentation. Cloud‑native adoption aimed to improve efficiency and reduce cost.
Efficiency was achieved through standardized infrastructure APIs, business‑framework abstraction, automated processes, and unified delivery via containers or images.
Cost reductions came from faster container start‑up, on‑demand resource allocation, and streamlined application iteration.
State Management
State management for stateful apps is divided into version management, data management, and service discovery & routing.
Version management resembles Kubernetes Deployment/StatefulSet capabilities, handling upgrades and rollbacks.
Data management updates external data without changing the number of service replicas.
Service discovery routes requests to the appropriate shard instance.
ByteDance introduced the SolarService abstraction, combining an enhanced StatefulSet (StatefulSet Extension) with a Budset CRD for data versioning. A sidecar container synchronizes data according to Bud definitions.
Rolling Upgrade Example
Shards are upgraded in parallel; within each shard, the MaxUnavailable setting controls how many replicas can be updated concurrently.
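The ordering described above can be sketched as a small planning function. This is a minimal illustration, not ByteDance's controller logic: the function name `plan_rolling_upgrade`, the pod names, and the idea of fixed "waves" are assumptions made for the example. Every shard contributes up to `max_unavailable` replicas to each wave, so shards progress in parallel while each shard keeps the rest of its replicas serving.

```python
from typing import Dict, List


def plan_rolling_upgrade(
    shards: Dict[int, List[str]], max_unavailable: int
) -> List[List[str]]:
    """Return upgrade waves; all pods in one wave are updated concurrently.

    `shards` maps a shard ID to the ordered list of its replica pod names.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    waves: List[List[str]] = []
    start = 0
    while True:
        wave: List[str] = []
        # Shards advance in parallel: each one adds its next slice of replicas.
        for shard_id in sorted(shards):
            wave.extend(shards[shard_id][start:start + max_unavailable])
        if not wave:
            return waves
        waves.append(wave)
        start += max_unavailable
```

With two shards of three and two replicas and `max_unavailable=2`, the first wave takes two replicas from each shard and the second wave takes the single remaining replica of the larger shard.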
Scaling Example
Scaling takes two forms. Adding replicas to an existing shard is simple. Increasing the number of data shards requires two steps: first enlarge the StatefulSet, then split the Budset data and update service discovery.
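The data-splitting step can be illustrated with a short sketch. The article does not say how keys are assigned to shards, so the CRC32-modulo placement below is purely an assumption, as are the names `shard_for` and `split_plan`. The sketch computes which keys have to move when the shard count grows, which is the work that must finish before service discovery is pointed at the new layout.

```python
import zlib
from typing import Dict, Iterable, Tuple


def shard_for(key: str, total_shards: int) -> int:
    """Assumed placement rule: CRC32 of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % total_shards


def split_plan(
    keys: Iterable[str], old_total: int, new_total: int
) -> Dict[str, Tuple[int, int]]:
    """Map each key that must move to its (old_shard, new_shard) pair."""
    moves: Dict[str, Tuple[int, int]] = {}
    for key in keys:
        old = shard_for(key, old_total)
        new = shard_for(key, new_total)
        if old != new:
            moves[key] = (old, new)
    return moves
```

Under this assumed rule, doubling the shard count means a key in shard s either stays in s or moves to s + old_total, so each old shard splits into exactly two and no data crosses between unrelated shards.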
Service Discovery & Routing
A custom Proxy component distributes requests to the appropriate StatefulSet Extension pods. Additional routing logic uses per‑pod error rates to implement circuit‑breaking. Service discovery stores ShardID, ReplicaID, and total shard count in a KV store for higher‑level frameworks.
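A rough sketch of this routing behavior follows. It is not the Proxy's actual implementation: the class `ShardRouter`, the CRC32-based shard selection, the cumulative error-rate threshold, and the fail-open fallback are all assumptions chosen to keep the example small. A production circuit breaker would normally use a sliding window and a half-open probing state.

```python
import itertools
import zlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PodStats:
    requests: int = 0
    errors: int = 0

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


class ShardRouter:
    """Routes a key to a (shard_id, replica_id) pair, skipping tripped pods."""

    def __init__(self, total_shards: int, error_threshold: float = 0.5):
        self.total_shards = total_shards
        self.error_threshold = error_threshold
        # Keyed by (ShardID, ReplicaID), mirroring the fields kept in the KV store.
        self.pods: Dict[Tuple[int, int], PodStats] = {}
        self._counter = itertools.count()

    def register(self, shard_id: int, replica_id: int) -> None:
        self.pods[(shard_id, replica_id)] = PodStats()

    def record(self, shard_id: int, replica_id: int, ok: bool) -> None:
        stats = self.pods[(shard_id, replica_id)]
        stats.requests += 1
        if not ok:
            stats.errors += 1

    def route(self, key: str) -> Tuple[int, int]:
        shard_id = zlib.crc32(key.encode("utf-8")) % self.total_shards
        replicas = sorted(p for p in self.pods if p[0] == shard_id)
        if not replicas:
            raise LookupError(f"no replicas registered for shard {shard_id}")
        healthy = [
            p for p in replicas
            if self.pods[p].error_rate < self.error_threshold
        ]
        # Fail open: if every replica is tripped, still try one of them.
        candidates = healthy or replicas
        return candidates[next(self._counter) % len(candidates)]
```

Requests for a shard rotate across its replicas, and a replica whose error rate reaches the threshold stops receiving traffic until its recorded error rate falls back below it.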
Infrastructure Enhancements
Scheduling: ByteDance extended the Kubernetes scheduler and Kubelet to be NUMA‑aware, adding custom predicates and priorities along with a CPU manager policy that binds pods to specific CPU sets and NUMA nodes.
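To make the predicate and priority idea concrete, here is a simplified sketch. It does not reproduce ByteDance's scheduler extension: the function names, the per-node list of free CPUs per NUMA node, and the best-fit scoring rule are assumptions for illustration. The predicate filters out machines where the request would have to span NUMA nodes, and the priority prefers the machine that leaves the fewest CPUs stranded on the chosen NUMA node.

```python
from typing import Dict, List, Optional


def numa_predicate(free_cpus_per_numa: List[int], requested: int) -> bool:
    """A machine is feasible only if one NUMA node alone can hold the request."""
    return any(free >= requested for free in free_cpus_per_numa)


def numa_leftover(free_cpus_per_numa: List[int], requested: int) -> int:
    """Cost for a feasible machine: CPUs left over on its tightest-fitting
    NUMA node (lower is better)."""
    return min(free - requested for free in free_cpus_per_numa if free >= requested)


def pick_node(nodes: Dict[str, List[int]], requested: int) -> Optional[str]:
    """Filter with the predicate, then choose the lowest-cost machine."""
    feasible = {
        name: free for name, free in nodes.items()
        if numa_predicate(free, requested)
    }
    if not feasible:
        return None
    # Ties are broken by name so the choice is deterministic.
    return min(feasible, key=lambda name: (numa_leftover(feasible[name], requested), name))
```

For a 4-CPU request, a machine with free CPUs `[2, 2]` is rejected even though it has four free CPUs in total, because no single NUMA node can hold the pod.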
Storage: ByteDance implemented dynamic provisioning for remote block storage (an NBD‑based CSI driver) and local disk storage (LPV). Supported storage options include tmpfs, LVM, full‑disk isolation, and Intel AEP non‑volatile memory, with topology‑aware allocation via an extended Topology Manager policy.
Monitoring & Automated Operations
ByteDance developed SysProbe, an eBPF‑based container‑level metrics collector that feeds more than 100 metrics into a highly available Metrics Aggregation Server (MAS) for dashboards.
The team extended Pod Disruption Budgets (PDBs) through a webhook that customizes eviction strategies, enabling coordinated multi‑AZ pod eviction while respecting replica distribution.
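The kind of decision such a webhook might make can be sketched as follows. The article does not describe the actual policy, so the rule here is an assumption: an eviction is allowed only when no other availability zone already has an unready replica of the same service, and only when enough ready replicas remain afterward. The names `Replica` and `may_evict` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    name: str
    zone: str
    ready: bool


def may_evict(target: str, replicas: List[Replica], min_available: int) -> bool:
    """Decide whether evicting `target` keeps the service within its budget."""
    by_name = {r.name: r for r in replicas}
    victim = by_name[target]  # raises KeyError for an unknown pod
    # Zones with an unready replica are treated as already being disrupted.
    disrupted_zones = {r.zone for r in replicas if not r.ready}
    if disrupted_zones - {victim.zone}:
        # Another zone is already disrupted; do not widen the disruption.
        return False
    ready_after = sum(1 for r in replicas if r.ready and r.name != target)
    return ready_after >= min_available
```

With one replica in zone A already down, further evictions in zone A may proceed as long as the budget holds, while evictions in zone B are refused until zone A recovers.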
CSI Race Conditions
The team identified race conditions in CSI volume unpublish/unstage sequences, as well as residual mount points, and mitigated them by adding cleanup logic to the Kubelet volume manager.
Conclusion
ByteDance's cloud‑native transformation of stateful services delivered efficient, cost‑effective operations, better performance through NUMA‑aware scheduling, richer storage capabilities, and a higher level of automation, while supporting thousands of services across tens of thousands of nodes.
Volcano Engine Developer Services
