
ByteDance Stateful Application Cloud‑Native Practices

ByteDance's cloud-native migration of stateful services relies on a custom SolarService that extends StatefulSet with a Budset CRD. The approach covers versioned data, shard-aware routing, NUMA-aware scheduling, enhanced storage, eBPF-based monitoring, and automated PDB-based eviction, with the aim of improving efficiency, reducing cost, and making rolling upgrades reliable.

iQIYI Technical Product Team

Background

The talk introduces the challenges and solutions ByteDance faced when migrating stateful applications to a cloud‑native environment. It contrasts stateless services, which fit naturally with Kubernetes objects like Deployments, with stateful services that require data persistence, sharding, and unique instance identifiers.

Characteristics of Stateful Applications

Stateful apps depend on local data, must preserve data across upgrades, and often have master‑slave or primary‑replica relationships. They can be data‑stateful or network‑stateful.

Business Scenarios at ByteDance

Typical use cases include search recall (large models with long load times), push services (per‑shard user targeting requiring unique IDs), and storage services such as custom KV stores, Druid, and Elasticsearch, which combine data locality and replica relationships.

Benefits of Cloud‑Native Migration

Efficiency gains come from standardized infrastructure APIs, abstracted business frameworks, automated processes, and unified delivery via containers. Cost reductions stem from faster container start‑up and on‑demand resource allocation.

Challenges and Solutions

Key challenges involve state management, enhanced base capabilities, and automated operations. ByteDance built a custom SolarService that extends StatefulSet with a Budset CRD to manage data versioning and side‑car data sync.

State Management

Three aspects are addressed: version management (similar to Deployment/StatefulSet upgrades), data management (updating external data without changing replicas), and service discovery & routing (directing requests to the correct shard). The solution includes a matrix of Pods per shard, a custom controller for rolling upgrades, and a proxy layer for routing based on shard and replica health.
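
The shard-by-replica matrix can be pictured as a grid with one row per shard and one column per replica. The following is a minimal sketch of that layout only; the `PodSlot` type, the `build_pod_matrix` helper, and the `<service>-<shard>-<replica>` naming scheme are illustrative assumptions, not ByteDance's actual conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSlot:
    """One cell of the shard x replica matrix (hypothetical type)."""
    shard: int
    replica: int
    name: str


def build_pod_matrix(service: str, shards: int, replicas: int) -> list[list[PodSlot]]:
    """Return one row per shard; each row holds that shard's replicas."""
    return [
        [PodSlot(s, r, f"{service}-{s}-{r}") for r in range(replicas)]
        for s in range(shards)
    ]


# Example: 3 data shards, each served by 2 replicas.
matrix = build_pod_matrix("search", shards=3, replicas=2)
for row in matrix:
    print([slot.name for slot in row])
```

Every Pod in a row holds the same data shard, so a request for that shard can be served by any healthy Pod in the row.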

Rolling Upgrade Example

Shards are upgraded in parallel, and each shard respects a configurable MaxUnavailable limit on how many of its replicas may be down at the same time.
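
To make the per-shard limit concrete, the sketch below plans upgrade waves: every wave takes at most `max_unavailable` Pods from each shard, and all shards advance together. The function name and the input shape are assumptions made for illustration; a real controller would also wait for each wave to become ready before starting the next.

```python
def plan_rolling_upgrade(
    pods_by_shard: dict[int, list[str]], max_unavailable: int
) -> list[list[str]]:
    """Group Pods into waves; each wave touches at most
    max_unavailable Pods of any single shard."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")

    longest = max((len(pods) for pods in pods_by_shard.values()), default=0)
    waves: list[list[str]] = []
    offset = 0
    while offset < longest:
        wave: list[str] = []
        # Shards are processed in parallel: each contributes its next slice.
        for shard in sorted(pods_by_shard):
            wave.extend(pods_by_shard[shard][offset:offset + max_unavailable])
        waves.append(wave)
        offset += max_unavailable
    return waves


pods = {0: ["s0-r0", "s0-r1", "s0-r2"], 1: ["s1-r0", "s1-r1", "s1-r2"]}
for number, wave in enumerate(plan_rolling_upgrade(pods, max_unavailable=1), start=1):
    print(number, wave)
```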

Scaling

Scaling a shard’s replica count follows standard StatefulSet scaling. Scaling data shards involves a multi‑step process that doubles the number of shards, updates Budsets, and gradually shifts traffic to the new shards.
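
One reason doubling is convenient can be shown with a small sketch. It assumes keys are placed by hash modulo the shard count, which the source does not state: under that assumption, a key in old shard `i` of `N` lands in either shard `i` or shard `i + N` after doubling, so each old shard splits into exactly two new ones. The `route` helper and its percentage-based shift are likewise hypothetical.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Hash-mod placement (an assumed scheme, not confirmed by the source)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def split_targets(old_shard: int, old_count: int) -> tuple[int, int]:
    """After doubling, keys of old_shard can only land in these two shards."""
    return old_shard, old_shard + old_count


def route(key: str, old_count: int, shifted_percent: int) -> tuple[int, int]:
    """Return (layout_size, shard_index) while traffic moves to the new layout.

    A separate hash picks the traffic bucket so the shift decision is not
    tied to the shard index itself.
    """
    bucket = zlib.adler32(key.encode("utf-8")) % 100
    if bucket < shifted_percent:
        return old_count * 2, shard_for(key, old_count * 2)
    return old_count, shard_for(key, old_count)


key = "user-42"
old = shard_for(key, 4)
print(old, split_targets(old, 4), shard_for(key, 8))
```

Raising `shifted_percent` from 0 to 100 in steps moves a growing share of keys to the doubled layout.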

Service Discovery & Routing

A custom Service Discovery component registers pod IPs, ports, shard IDs, and replica IDs in a KV store, enabling fine‑grained routing and circuit‑breaking without relying on native K8s Service routing.
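
A rough sketch of such a registry follows, using an in-memory dictionary as a stand-in for the KV store. The `Endpoint` and `ShardRegistry` names and the health flag are assumptions for illustration; the real component's schema and circuit-breaking logic are not described in the source.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    ip: str
    port: int
    shard: int
    replica: int
    healthy: bool = True


class ShardRegistry:
    """In-memory stand-in for the KV store behind service discovery."""

    def __init__(self) -> None:
        self._endpoints: dict[tuple[int, int], Endpoint] = {}

    def register(self, endpoint: Endpoint) -> None:
        # Keyed by (shard ID, replica ID), so a restarted Pod overwrites its entry.
        self._endpoints[(endpoint.shard, endpoint.replica)] = endpoint

    def set_health(self, shard: int, replica: int, healthy: bool) -> None:
        self._endpoints[(shard, replica)].healthy = healthy

    def resolve(self, shard: int) -> list[Endpoint]:
        """Return the healthy replicas of a shard, ordered by replica ID."""
        matches = [
            e for e in self._endpoints.values() if e.shard == shard and e.healthy
        ]
        return sorted(matches, key=lambda e: e.replica)


registry = ShardRegistry()
registry.register(Endpoint("10.0.0.1", 8080, shard=0, replica=0))
registry.register(Endpoint("10.0.0.2", 8080, shard=0, replica=1))
registry.set_health(0, 0, healthy=False)  # unhealthy replicas drop out of routing
print([e.ip for e in registry.resolve(0)])
```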

Base Capability Enhancements

Enhancements fall into two areas: scheduling and storage. For scheduling, ByteDance added NUMA‑awareness to the K8s scheduler and Kubelet: micro‑topology resources are exposed via CRDs, custom predicates and priorities take them into account, and pods are assigned specific CPU sets and NUMA nodes.
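
The idea behind a NUMA-aware predicate can be sketched as follows: a pod fits only if a single NUMA node can supply all requested CPUs, and the chosen node then yields a concrete CPU set. The `pick_numa_node` function and its best-fit tie-breaking are assumptions for illustration, not the scheduler's actual policy.

```python
from typing import Optional


def pick_numa_node(
    free_cpus: dict[int, set[int]], cpus_requested: int
) -> Optional[tuple[int, list[int]]]:
    """Return (numa_node, cpu_ids) for a request, or None if no single
    NUMA node has enough free CPUs.

    Best fit is assumed: the node with the fewest free CPUs that still
    satisfies the request is chosen, to limit fragmentation.
    """
    if cpus_requested < 1:
        raise ValueError("cpus_requested must be at least 1")

    candidates = [
        (len(cpus), node)
        for node, cpus in free_cpus.items()
        if len(cpus) >= cpus_requested
    ]
    if not candidates:
        return None  # predicate fails: the request would span NUMA nodes
    _, node = min(candidates)
    return node, sorted(free_cpus[node])[:cpus_requested]


free = {0: {0, 1, 2, 3}, 1: {4, 5}}
print(pick_numa_node(free, 2))
print(pick_numa_node(free, 5))
```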

Storage enhancements include dynamic provisioning for multiple media, remote block storage via NBD (single‑write‑single‑read and multi‑read modes), and local disk solutions (tmpfs, LVM, full‑disk allocation, and Intel AEP). The system also implements Volume Scheduling with custom predicates, assume‑volume annotations, and bind phases.
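
The predicate, assume, and bind phases can be illustrated with a toy model of local disk capacity. Everything here (the `LocalVolumeScheduler` class, GiB-based accounting, claim names) is a hypothetical simplification; the real system records assumptions through annotations rather than in-process state.

```python
class LocalVolumeScheduler:
    """Toy model of the predicate -> assume -> bind flow for local volumes."""

    def __init__(self, free_gib_by_node: dict[str, int]) -> None:
        self.free = dict(free_gib_by_node)
        self.assumed: dict[str, tuple[str, int]] = {}
        self.bound: dict[str, str] = {}

    def predicate(self, size_gib: int) -> list[str]:
        """Filter nodes that still have enough free local capacity."""
        return sorted(n for n, free in self.free.items() if free >= size_gib)

    def assume(self, claim: str, node: str, size_gib: int) -> None:
        """Reserve capacity optimistically so concurrent pods see it as used."""
        if self.free[node] < size_gib:
            raise ValueError(f"{node} cannot hold {size_gib} GiB")
        self.free[node] -= size_gib
        self.assumed[claim] = (node, size_gib)

    def bind(self, claim: str, success: bool = True) -> None:
        """Commit the reservation, or roll it back if binding failed."""
        node, size_gib = self.assumed.pop(claim)
        if success:
            self.bound[claim] = node
        else:
            self.free[node] += size_gib


scheduler = LocalVolumeScheduler({"node-a": 100, "node-b": 40})
node = scheduler.predicate(50)[0]
scheduler.assume("pvc-1", node, 50)
scheduler.bind("pvc-1")
print(scheduler.bound, scheduler.free)
```

Reserving at the assume step is what keeps two pods scheduled back to back from both being placed on capacity that only one of them can use.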

Monitoring & Automated Operations

ByteDance developed SysProbe, an eBPF‑based container‑level monitoring component that collects over 100 metrics. A high‑availability Metrics Aggregation Server (MAS) aggregates these metrics and exports them to downstream sinks.

For automation, a custom PDB extension via webhook adds eviction strategies that consider multi‑AZ pod distribution, ensuring safe pod removal during host maintenance.
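
As a sketch of what an AZ-aware eviction check might look like, the function below allows an eviction only if the shard keeps enough healthy replicas and those replicas still span enough availability zones. The rule itself, along with the `Pod` type and the `min_available` and `min_zones` parameters, is an assumption for illustration; the source does not spell out the actual strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    shard: int
    zone: str
    healthy: bool = True


def can_evict(
    target: Pod, pods: list[Pod], min_available: int, min_zones: int = 1
) -> bool:
    """Decide whether evicting `target` keeps its shard safe.

    Assumed rule: the remaining healthy replicas of the same shard must
    number at least min_available and cover at least min_zones zones.
    """
    survivors = [
        p for p in pods
        if p.shard == target.shard and p.healthy and p.name != target.name
    ]
    zones = {p.zone for p in survivors}
    return len(survivors) >= min_available and len(zones) >= min_zones


pods = [Pod("a", 0, "az1"), Pod("b", 0, "az1"), Pod("c", 0, "az2")]
print(can_evict(pods[0], pods, min_available=2, min_zones=2))
print(can_evict(pods[2], pods, min_available=2, min_zones=2))
```

In the second call the eviction is refused because removing the only Pod in `az2` would leave the shard confined to a single zone.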

CSI Race Conditions and Mitigations

Issues such as residual global mounts, duplicate volume opens, and race conditions during pod deletion were addressed by adding residual mount scans in the Kubelet Volume Manager and enhancing CSI drivers to handle unstage failures.
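
A residual mount scan can be approximated as a comparison between the host's mount table and the set of volumes still in use. The sketch below parses text in `/proc/mounts` format; the global mount path pattern is an assumption modeled on an older kubelet directory layout, and `find_residual_global_mounts` is a hypothetical helper, not the Volume Manager's real code.

```python
import re

# Assumed layout of CSI global mount paths; newer kubelet versions differ.
GLOBAL_MOUNT = re.compile(
    r"^/var/lib/kubelet/plugins/kubernetes\.io/csi/pv/([^/]+)/globalmount$"
)


def find_residual_global_mounts(mounts_text: str, active_pvs: set[str]) -> list[str]:
    """Return global mount paths whose PV is no longer used by any pod."""
    residual = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mount_point = fields[1]  # second field of /proc/mounts is the mount point
        match = GLOBAL_MOUNT.match(mount_point)
        if match and match.group(1) not in active_pvs:
            residual.append(mount_point)
    return residual


sample = """\
/dev/nbd0 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-a/globalmount ext4 rw 0 0
/dev/nbd1 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pv-b/globalmount ext4 rw 0 0
tmpfs /run tmpfs rw 0 0
"""
print(find_residual_global_mounts(sample, active_pvs={"pv-a"}))
```

Paths returned by the scan are candidates for unstaging; a real implementation would also have to guard against racing with a pod that is about to mount the same volume.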

Case Study

ByteDance is also exploring lightweight virtualization with Kata to contain failures at the pod level rather than the host level.

Conclusion

Stateful applications in ByteDance’s cloud‑native journey exhibit local data dependency, persistence, and unique instance identification. Cloud‑native transformation yields efficiency and cost benefits through improved state management, extreme performance via NUMA‑aware scheduling, enriched storage capabilities, and automated operations with custom PDB extensions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
