New Oriental’s Blueprint for Stateful Services in Kubernetes: Custom Operators & XLSS
This article details New Oriental's approach to building stateful services on Kubernetes, covering the challenges of native storage, the use of custom Operators, the design of the XLSS local storage solution, backup and disaster‑recovery workflows, and a multi‑phase roadmap for large‑scale stateful middleware deployment.
New Oriental’s Stateful Services in K8s – Current Situation
Building stateful services has long been one of the harder problems in Kubernetes. New Oriental tackles it by combining custom Operators with a self‑developed local storage service, which extends Kubernetes' native local storage capabilities and steadily advances containerization across the enterprise.
The diagram shows the resulting layers: at the top, Pods are managed by a custom Operator and the StatefulSet controller; each Pod binds to a PVC, each PVC binds to a PV, and the storage service sits at the bottom.
Native K8s Support for Stateful Services
Kubernetes natively supports stateful services through the combination of the StatefulSet controller and a storage service.
StatefulSet Controller
The StatefulSet controller manages workloads for stateful applications: it handles deployment and scaling, and gives each Pod persistent storage and a stable identifier.
Features of StatefulSet
Stable, unique network identity
Stable, persistent storage
Ordered, graceful deployment and scaling
Ordered, automatic rolling updates
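To make "stable, unique network identity" and "stable, persistent storage" concrete, the Python sketch below encodes the naming conventions a StatefulSet follows: Pods are named <statefulset>-<ordinal>, each Pod gets a DNS record through the governing headless Service, and PVCs created from volumeClaimTemplates are named <template>-<pod name>. The concrete names used with it (kafka, kafka-headless, data, port 9092) are hypothetical.

```python
# Stable identities a StatefulSet derives for its Pods.
# Naming patterns follow Kubernetes conventions; the concrete names used
# with these helpers (kafka, kafka-headless, data, 9092) are hypothetical.


def pod_name(sts: str, ordinal: int) -> str:
    """Pods are named <statefulset>-<ordinal>, with ordinals starting at 0."""
    return f"{sts}-{ordinal}"


def pod_dns(sts: str, ordinal: int, service: str, namespace: str,
            cluster_domain: str = "cluster.local") -> str:
    """Per-Pod DNS name published through the governing headless Service.

    cluster.local is the usual default cluster domain; clusters may change it.
    """
    return f"{pod_name(sts, ordinal)}.{service}.{namespace}.svc.{cluster_domain}"


def pvc_name(template: str, sts: str, ordinal: int) -> str:
    """PVCs created from volumeClaimTemplates are named <template>-<pod name>."""
    return f"{template}-{pod_name(sts, ordinal)}"


def headless_service(name: str, selector: dict, port: int) -> dict:
    """Headless Service manifest (clusterIP: None) that must be created by hand."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": selector,
            "ports": [{"name": "main", "port": port}],
        },
    }


if __name__ == "__main__":
    for i in range(3):
        print(pod_name("kafka", i),
              pod_dns("kafka", i, "kafka-headless", "default"),
              pvc_name("data", "kafka", i))
```

For ordinal 0 this prints kafka-0, kafka-0.kafka-headless.default.svc.cluster.local and data-kafka-0. Because every name depends only on the StatefulSet name and the ordinal, a recreated Pod reattaches to the same PVC.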
Limitations of StatefulSet
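The controller does not provision storage; it only creates PVCs from volumeClaimTemplates.
When a StatefulSet is deleted or scaled down, only its Pods are removed; by default the associated PVCs and volumes are left behind.
A headless Service must be created manually to give each Pod its stable network identity (DNS name).
Ordered, graceful termination on deletion requires scaling the StatefulSet to zero first.
Ordered deployment creates dependencies between Pods: each Pod waits for its predecessor to become Running and Ready, so a Pod that never becomes Ready blocks all later ones.
In short, the controller manages Pods and only part of the storage lifecycle (for example, creating PVCs during scale‑up); the rest of storage management is out of its scope, and the ordering guarantees can create harmful dependencies that require manual intervention.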
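The blocking effect of ordered deployment can be sketched as a small simulation of the default OrderedReady policy, in which Pod N is created only after Pod N-1 is Running and Ready. Pod readiness is simulated by a hypothetical callback; no Kubernetes API is involved.

```python
from typing import Callable, List


def ordered_rollout(replicas: int, becomes_ready: Callable[[int], bool]) -> List[str]:
    """Simulate OrderedReady creation and return the Pods actually created.

    becomes_ready(ordinal) is a hypothetical stand-in for whether that Pod
    ever reaches Running and Ready.
    """
    created = []
    for ordinal in range(replicas):
        created.append(f"pod-{ordinal}")
        if not becomes_ready(ordinal):
            # The controller keeps waiting on this Pod, so no higher
            # ordinal is ever created.
            break
    return created


if __name__ == "__main__":
    print(ordered_rollout(5, lambda i: True))    # all five Pods are created
    print(ordered_rollout(5, lambda i: i != 1))  # ['pod-0', 'pod-1']
```

Kubernetes does offer podManagementPolicy: Parallel to launch Pods without waiting for their predecessors, but it only affects scaling; rolling updates remain ordered.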
Storage Service
The storage service can be remote or local. Remote storage is preferred for general needs; local storage is required for high‑performance I/O. New Oriental currently uses two local storage solutions: native Kubernetes local storage and the self‑developed XLSS storage service.
Cloud Native Storage (CNS) Overview
The CNCF snapshot from July 2021 lists over 50 storage products, roughly half commercial and the rest open‑source, covering file system, object, and block storage types.
Kubernetes PV Types
Common native PV types include rbd, hostPath, local, etc.
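A native `local` PV illustrates the constraints XLSS later removes: Kubernetes requires spec.nodeAffinity on every local PV, and with the kubernetes.io/no-provisioner StorageClass each PV has to be created by hand in advance. The Python sketch below builds such manifests as dicts (kubectl accepts JSON as well as YAML); the node names, paths, sizes and the local-storage class name are hypothetical.

```python
import json

# StorageClass for statically provisioned local PVs: there is no dynamic
# provisioner, and binding waits until a Pod using the PVC is scheduled.
LOCAL_STORAGE_CLASS = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "local-storage"},  # hypothetical name
    "provisioner": "kubernetes.io/no-provisioner",
    "volumeBindingMode": "WaitForFirstConsumer",
}


def local_pv(name: str, node: str, path: str, size: str,
             storage_class: str = "local-storage") -> dict:
    """Build a local PV; every consumer Pod becomes pinned to `node`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "local": {"path": path},
            # Mandatory for local volumes: the data exists only on this node.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": [node],
                        }]
                    }]
                }
            },
        },
    }


if __name__ == "__main__":
    pv = local_pv("pv-node1-disk1", "node1", "/mnt/disks/disk1", "100Gi")
    print(json.dumps(pv, indent=2))
```

If node1 fails, any Pod bound to this PV cannot be scheduled anywhere else, and its data is unreachable until the node returns.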
How to Choose Storage
Choosing a storage product involves several factors:
Open‑source vs. commercial
Local vs. remote
Dynamic provisioning vs. static provisioning
Data high‑availability solutions
Open‑source solutions carry no licensing cost but may lack stability and features; commercial products guarantee stability and capabilities at a price. The final choice depends on the specific requirements.
Self‑Developed Storage XLSS
Key requirements for New Oriental’s stateful services include high performance for I/O‑intensive workloads, data availability with some disaster‑recovery capability, and dynamic provisioning for full automation.
XLSS (XDF Local Storage Service) is a high‑performance, highly available local storage solution. It addresses the shortcomings of native local storage: static provisioning only, node‑affinity constraints that pin Pods to a single node, and the risk of data loss.
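The selection reasoning above can be written as a toy filter. The attribute values encode only what the text states (native local PVs are statically provisioned and lack data availability; XLSS is local, dynamically provisioned and highly available); the dataclass and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    local: bool                 # local disk, needed for I/O-intensive workloads
    dynamic_provisioning: bool  # needed for full automation
    data_availability: bool     # some backup / disaster-recovery capability


def meets_requirements(opt: StorageOption) -> bool:
    """All three stated requirements must hold at once."""
    return opt.local and opt.dynamic_provisioning and opt.data_availability


CANDIDATES = [
    StorageOption("native local PV", local=True,
                  dynamic_provisioning=False, data_availability=False),
    StorageOption("XLSS", local=True,
                  dynamic_provisioning=True, data_availability=True),
]

if __name__ == "__main__":
    print([o.name for o in CANDIDATES if meets_requirements(o)])  # ['XLSS']
```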
XLSS Core Components
xlss-scheduler: A custom scheduler built on kube‑scheduler. It automatically detects Pods that use XLSS local PVs, intervenes in their scheduling, and removes their node‑affinity constraints.
xlss-rescuer: Runs as a DaemonSet. It executes backup jobs according to backup policies, watches for recovery requests, performs data recovery, and exposes metrics.
xlss-localpv-provisioner: Dynamically provisions local storage by launching a helper Pod on the target node to create the volume directory, then deleting the helper once the PV has been created.
xlss‑scheduler Logic
The scheduler enhances three extension points in the Kubernetes scheduling framework:
PreFilter: Analyzes node health based on the Pod's affinity and marks unhealthy nodes.
Filter: Removes node affinity for Pods carrying the special mark.
PreBind: Clears the special mark once scheduling is decided and triggers a data‑recovery request.
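The interplay of the three extension points can be simulated without the real scheduling framework. In this Python sketch, Pods and nodes are plain dicts, the mark key and the recovery-request shape are hypothetical, and the mark is recorded on the Pod (an assumption, since the Filter step acts on marked Pods). Node scoring is omitted.

```python
from typing import Dict, List, Optional

MARK = "xlss/affinity-released"  # hypothetical mark key


def pre_filter(pod: dict, node_healthy: Dict[str, bool]) -> None:
    """Mark the Pod when the node its local PV pins it to is unhealthy."""
    pinned = pod.get("pinned_node")
    if pinned is not None and not node_healthy.get(pinned, False):
        pod.setdefault("marks", set()).add(MARK)


def filter_node(pod: dict, node: str, node_healthy: Dict[str, bool]) -> bool:
    """Return True if the Pod may be placed on `node`."""
    if not node_healthy.get(node, False):
        return False
    if MARK in pod.get("marks", set()):
        return True  # affinity released: any healthy node is feasible
    pinned = pod.get("pinned_node")
    return pinned is None or node == pinned


def pre_bind(pod: dict, node: str, recovery_requests: List[dict]) -> None:
    """Clear the mark and emit a data-recovery request for xlss-rescuer."""
    marks = pod.get("marks", set())
    if MARK in marks:
        marks.discard(MARK)
        recovery_requests.append({"pod": pod["name"], "target_node": node})


def schedule(pod: dict, nodes: List[str], node_healthy: Dict[str, bool],
             recovery_requests: List[dict]) -> Optional[str]:
    pre_filter(pod, node_healthy)
    feasible = [n for n in nodes if filter_node(pod, n, node_healthy)]
    if not feasible:
        return None
    chosen = feasible[0]  # scoring is omitted in this sketch
    pre_bind(pod, chosen, recovery_requests)
    return chosen
```

A Pod pinned to a failed node1 lands on the first healthy node and produces one recovery request, while a Pod whose node is healthy stays pinned and produces none.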
Data Backup Job Logic
Watch Pod events.
Extract backup policy from Pod annotations.
Sync backup policy to a cache queue.
The right‑hand loop sorts queued items by next execution time, sleeps until the next backup is due, and then executes the backup job.
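The backup loop maps naturally onto a min-heap keyed by next execution time. The Python sketch below assumes the policy is a plain interval stored under a hypothetical annotation key (the real policy format is not described); clock and sleep are injectable so the loop can be tested. Being single-threaded, it does not wake early when a new watch event arrives, which a real controller would need to handle.

```python
import heapq
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical annotation carrying the backup policy as an interval in seconds.
BACKUP_ANNOTATION = "xlss.io/backup-interval-seconds"


class BackupScheduler:
    """Cache queue of backup jobs ordered by next execution time."""

    def __init__(self, run_backup: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._heap: List[Tuple[float, int, str]] = []    # (due, seq, pod)
        self._policy: Dict[str, Tuple[float, int]] = {}  # pod -> (interval, seq)
        self._seq = 0
        self._run_backup = run_backup
        self._clock = clock
        self._sleep = sleep

    def on_pod_event(self, pod: str, annotations: Dict[str, str]) -> None:
        """Watch handler: extract the policy from annotations and sync the queue."""
        raw = annotations.get(BACKUP_ANNOTATION)
        if raw is None:
            self._policy.pop(pod, None)  # queued entries for this Pod become stale
            return
        interval = float(raw)
        current = self._policy.get(pod)
        if current is not None and current[0] == interval:
            return  # policy unchanged: keep the existing schedule
        self._seq += 1
        self._policy[pod] = (interval, self._seq)
        heapq.heappush(self._heap, (self._clock() + interval, self._seq, pod))

    def run_once(self) -> Optional[str]:
        """Sleep until the earliest valid job is due, run it, reschedule it."""
        while self._heap:
            due, seq, pod = heapq.heappop(self._heap)
            policy = self._policy.get(pod)
            if policy is None or policy[1] != seq:
                continue  # stale entry: policy removed or replaced
            delay = due - self._clock()
            if delay > 0:
                self._sleep(delay)
            self._run_backup(pod)
            heapq.heappush(self._heap, (due + policy[0], seq, pod))
            return pod
        return None
```

The sequence number lets the loop discard stale heap entries lazily instead of searching the heap whenever a policy changes or disappears.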
Data Recovery Job Logic
Watch the CRD for recovery requests emitted by xlss‑scheduler.
Check the CRD status to avoid handling the same request twice.
Sync recovery request to the cache queue.
The execution loop updates CRD status, restores snapshot data to the target directory, updates PV and PVC, and finally deletes the CRD instance.
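The recovery steps can be sketched as a queue-driven handler. Here the CRD is modeled by a hypothetical dataclass with an assumed Pending/Restoring status, the "API server" is an in-memory dict, and the snapshot restore and PV/PVC update are injected callbacks because their real implementations are not described.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class RecoveryRequest:  # hypothetical stand-in for the recovery CRD instance
    name: str
    pod: str
    pv: str
    target_node: str
    status: str = "Pending"  # assumed lifecycle: Pending -> Restoring -> deleted


class Rescuer:
    def __init__(self, restore_snapshot: Callable[[str, str], None],
                 rebind_volume: Callable[[str, str], None]) -> None:
        self.crds: Dict[str, RecoveryRequest] = {}  # simulated cluster state
        self.queue: Deque[str] = deque()
        self._restore = restore_snapshot
        self._rebind = rebind_volume

    def on_crd_event(self, req: RecoveryRequest) -> None:
        """Watch handler: enqueue only Pending requests not already queued."""
        self.crds[req.name] = req
        if req.status == "Pending" and req.name not in self.queue:
            self.queue.append(req.name)

    def process_next(self) -> bool:
        """Handle one queued request; return False when the queue is empty."""
        if not self.queue:
            return False
        name = self.queue.popleft()
        req = self.crds.get(name)
        if req is None or req.status != "Pending":
            return True  # already handled or deleted
        req.status = "Restoring"                  # 1. update CRD status
        self._restore(req.pv, req.target_node)   # 2. restore snapshot to target dir
        self._rebind(req.pv, req.target_node)    # 3. update PV and PVC
        del self.crds[name]                      # 4. delete the CRD instance
        return True
```

Duplicate watch events for the same request are absorbed by the status and queue checks, so the restore runs once.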
Local PV Provisioner Logic
When a storage creation request arrives, the provisioner Pod creates a temporary helper Pod scheduled to the target node, and the helper creates the directory that backs the local volume. Once the PV is created, the helper Pod is removed, completing dynamic provisioning of local storage.
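The helper-Pod pattern can be sketched as follows. The helper is pinned with spec.nodeName, mounts the base directory via hostPath and runs mkdir -p; the PV is then created and the helper deleted, in the order the text describes. The base directory, busybox image, Pod naming and the `api` client (create_pod, wait_succeeded, create_pv, delete_pod) are hypothetical.

```python
import posixpath


def helper_pod(pv_name: str, node: str, base_dir: str, image: str = "busybox") -> dict:
    """Short-lived Pod pinned to `node` that creates the PV's directory."""
    target = posixpath.join(base_dir, pv_name)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"create-{pv_name}"},
        "spec": {
            "nodeName": node,  # bypasses the scheduler: run on the target node
            "restartPolicy": "Never",
            "containers": [{
                "name": "mkdir",
                "image": image,
                "command": ["mkdir", "-p", target],
                "volumeMounts": [{"name": "base", "mountPath": base_dir}],
            }],
            "volumes": [{"name": "base",
                         "hostPath": {"path": base_dir, "type": "DirectoryOrCreate"}}],
        },
    }


def provision(pv_name: str, node: str, base_dir: str, size: str, api) -> dict:
    """Create the directory via a helper Pod, create the PV, then remove the helper."""
    pod = helper_pod(pv_name, node, base_dir)
    pod_name = pod["metadata"]["name"]
    api.create_pod(pod)
    api.wait_succeeded(pod_name)  # directory now exists on `node`
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "local": {"path": posixpath.join(base_dir, pv_name)},
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{
                "matchExpressions": [{"key": "kubernetes.io/hostname",
                                      "operator": "In", "values": [node]}]}]}},
        },
    }
    api.create_pv(pv)
    api.delete_pod(pod_name)  # the helper is no longer needed
    return pv
```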
Automatic Disaster Recovery Workflow
Data backup: back up PV data at Pod granularity.
Node anomaly: a node failure leaves Pods in Terminating state.
Abnormal Pod handling: clean up the stuck Pods, or let tooling recreate them.
Intelligent scheduling: remove affinity and schedule Pods to healthy nodes.
Data recovery: restore the latest snapshot to the Pod.
Service recovery: restart the application to provide service.
The workflow then returns to its starting point, data backup.
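The cycle and its recovery step can be sketched in a few lines. Stage names mirror the list above; snapshot times are assumed to be seconds on a common clock. Because recovery restores the latest snapshot taken before the failure, anything written after that snapshot is not part of the restored data.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    DATA_BACKUP = 1
    NODE_ANOMALY = 2
    POD_HANDLING = 3
    INTELLIGENT_SCHEDULING = 4
    DATA_RECOVERY = 5
    SERVICE_RECOVERY = 6


def next_stage(stage: Stage) -> Stage:
    """Advance the workflow; after service recovery it loops back to backup."""
    stages = list(Stage)
    return stages[(stages.index(stage) + 1) % len(stages)]


def pick_snapshot(snapshot_times: List[float], failure_time: float) -> Optional[float]:
    """Latest snapshot taken at or before the failure (what DATA_RECOVERY restores)."""
    eligible = [t for t in snapshot_times if t <= failure_time]
    return max(eligible) if eligible else None


def data_loss_window(snapshot_times: List[float], failure_time: float) -> Optional[float]:
    """Seconds of writes not covered by the restored snapshot."""
    snap = pick_snapshot(snapshot_times, failure_time)
    return None if snap is None else failure_time - snap
```

With snapshots every 600 seconds and a failure at t=1500, the snapshot from t=1200 is restored and up to 300 seconds of writes are not covered.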
Large‑Scale Stateful Middleware Services
With storage issues largely solved, New Oriental builds storage‑heavy middleware services on top. For example, a Kafka cluster is deployed through a custom Kafka Operator that specifies XLSS as its storage backend, illustrating the Operator + XLSS pattern for stateful middleware.
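One slice of such an Operator's reconcile loop might compare the desired per-broker PVCs with the existing ones and create the missing claims against the XLSS storage class. The KafkaCluster spec layout, the storage class name xlss-local and the `create` callback are hypothetical; the actual Kafka Operator's API is not described in the source.

```python
from typing import Callable, List, Set


def desired_pvcs(cluster: dict) -> List[dict]:
    """One PVC per broker, all requesting the storage class named in the spec."""
    name = cluster["metadata"]["name"]
    spec = cluster["spec"]
    return [{
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"data-{name}-{i}",
                     "labels": {"app": name, "broker": str(i)}},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": spec["storage"]["storageClassName"],
            "resources": {"requests": {"storage": spec["storage"]["size"]}},
        },
    } for i in range(spec["replicas"])]


def reconcile(cluster: dict, existing_pvc_names: Set[str],
              create: Callable[[dict], None]) -> List[str]:
    """Create only the PVCs that do not exist yet; return their names."""
    created = []
    for pvc in desired_pvcs(cluster):
        pvc_name = pvc["metadata"]["name"]
        if pvc_name not in existing_pvc_names:
            create(pvc)
            created.append(pvc_name)
    return created


if __name__ == "__main__":
    kafka = {"metadata": {"name": "kafka"},
             "spec": {"replicas": 3,
                      "storage": {"storageClassName": "xlss-local", "size": "100Gi"}}}
    print(reconcile(kafka, {"data-kafka-0"}, lambda pvc: None))
```

Because reconcile only creates what is missing, it can run repeatedly without side effects, which is the usual expectation for Operator control loops.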
Deployment of XLSS on KubeSphere
Initial deployment requires planning local disks for XLSS, then deploying XLSS components to the Kubernetes cluster using KubeSphere CI/CD pipelines. The pipeline consists of five automated steps that transform XLSS source code into running containers.
Road Map
New Oriental’s stateful service containerization is divided into four phases:
Phase 1 – “Pre‑cloud”: VM + PaaS management, with basic resource management and simple scheduling.
Phase 2 – “Early Cloud”: Migration to Kubernetes with Operators, increasing automation but exposing storage performance limits.
Phase 3 – “Self‑development”: Development of XLSS, achieving dynamically provisioned local storage with data availability, though latency when recovering large data volumes and the lack of PV isolation remain.
Phase 4 – “Excellence”: Isolation plus physical backup, using LVM for storage isolation and DRBD for synchronous physical backup to address the earlier shortcomings.
Qingyun Technology Community
Official account of the Qingyun Technology Community, focusing on tech innovation, supporting developers, and sharing knowledge. Born to Learn and Share!