Containerizing Stateful Services on Kubernetes: Challenges, Solutions, and Best Practices
This article examines the difficulties of running stateful services such as Redis, etcd, and MySQL on Kubernetes and presents practical solutions—including workload selection, CRD/operator extensions, scheduling strategies, high‑availability mechanisms, performance‑optimized networking and storage, and chaos‑engineering validation—to achieve reliable, high‑performance containerized deployments.
Background
As Kubernetes becomes the dominant cloud‑native platform, many traditional services are being migrated from VMs or physical machines to Kubernetes. While most Kubernetes deployments are stateless, stateful services pose additional challenges such as resource isolation, fine‑grained access control, and performance requirements. This article uses a Redis cluster (Codis) as a concrete example to explore common pain points and solutions for containerizing stateful workloads.
Stateful Service Containerization Challenges
The article outlines the key questions to address when containerizing stateful services: describing the service in Kubernetes terms, choosing the appropriate workload, extending Kubernetes when built‑in workloads are insufficient, performing safe updates, ensuring pods are scheduled across failure domains, handling pod failures, and meeting high network and storage performance demands.
Workload Types
Kubernetes provides several built‑in workloads such as Pod, Deployment, and StatefulSet. Pods are the smallest schedulable unit and can host sidecar containers for auxiliary tasks. Deployments suit stateless components but lack stable identities and ordered updates, making them unsuitable for stateful services. StatefulSet offers stable network identities, persistent storage, and ordered rolling updates, which are essential for services like etcd, ZooKeeper, and Redis.
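As a minimal sketch of these properties, a three-replica StatefulSet for a Redis-like service could look like the following (the names, image, and sizes are illustrative, not from the original article):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis            # headless Service giving each pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.2
        ports:
        - containerPort: 6379
  volumeClaimTemplates:         # one PVC per pod, kept across rescheduling
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

Each pod gets an ordinal identity (redis-0, redis-1, redis-2) and its own PersistentVolumeClaim, which is what distinguishes it from a Deployment's interchangeable replicas.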
Extension Mechanisms
Kubernetes offers a rich extension ecosystem, including CRDs, Aggregated API Servers, custom schedulers, and operators. When built‑in workloads cannot meet specific requirements, developers can define custom resources (CRDs) and implement controllers/operators to manage the lifecycle of complex stateful applications.
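A CRD simply registers a new resource type with the API server. As a hedged sketch, a CRD for a Codis cluster might look like this (the `example.com` group and the spec fields are hypothetical):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: codisclusters.example.com   # hypothetical API group
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: CodisCluster
    plural: codisclusters
    singular: codiscluster
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              proxyReplicas:        # illustrative fields only
                type: integer
              serverGroups:
                type: integer
```

Once applied, `kubectl get codisclusters` works like any built-in resource, and an operator can watch and reconcile these objects.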
Enhanced Workloads
Enhanced workloads such as Tencent's StatefulSetPlus, TKEStack's TAPP, and the open‑source OpenKruise (which provides CloneSet, Advanced StatefulSet, SidecarSet, etc.) add features like in‑place updates, fixed IPs, HPA support, and more granular rollout control.
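To illustrate one of these features, here is a rough sketch of in‑place updates with OpenKruise's Advanced StatefulSet (field names are from my understanding of the Kruise API and should be checked against the version in use; names and image are illustrative):

```yaml
apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet                  # OpenKruise Advanced StatefulSet, not apps/v1
metadata:
  name: redis
spec:
  replicas: 3
  serviceName: redis
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6.2
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible   # restart the container in place when only the image changed
      partition: 1                         # pods with ordinal >= 1 update first; pod-0 is held back
```

In‑place updates avoid recreating the pod, so the pod keeps its IP and local volumes, which matters for services that embed addresses in cluster metadata.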
Operator‑Based Extension
By defining a CRD that represents a complete Codis cluster, an operator can watch for create/update/delete events and reconcile the desired state by creating the necessary Deployments, StatefulSets, and other components. The operator follows the typical controller pattern: List → Watch → Queue → Reconcile.
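The Reconcile step is essentially a diff between desired and observed state. The following Python sketch reduces a hypothetical CodisCluster to per-component replica counts and returns the actions a controller would take (a real operator would issue Kubernetes API calls instead of returning strings):

```python
def reconcile(desired: dict, observed: dict) -> list:
    """Diff desired vs. observed components and return converging actions.

    Both dicts map component name -> replica count, e.g. {"codis-proxy": 3}.
    """
    actions = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have < want:
            actions.append("scale up %s to %d" % (name, want))
        elif have > want:
            actions.append("scale down %s to %d" % (name, want))
    for name in observed:
        if name not in desired:          # child object no longer in the spec
            actions.append("delete %s" % name)
    return sorted(actions)

# The proxy tier is under-replicated and a stale component lingers.
print(reconcile({"codis-proxy": 3, "codis-server": 4},
                {"codis-proxy": 2, "codis-server": 4, "old-dashboard": 1}))
# → ['delete old-dashboard', 'scale up codis-proxy to 3']
```

Because reconcile is driven by the current observed state rather than by individual events, a missed or duplicated event from the Watch stream does not leave the cluster permanently divergent; the next pass converges again.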
Scheduling
Ensuring that equivalent pods (e.g., master‑replica pairs) are spread across failure domains is achieved using Kubernetes affinity and anti‑affinity rules. The article provides an example anti‑affinity configuration for an etcd cluster:
```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: etcd_cluster
          operator: In
          values: ["etcd-test"]
      topologyKey: failure-domain.beta.kubernetes.io/zone
```

When built‑in scheduling is insufficient, custom predicates, priorities, or entirely separate schedulers can be employed.
High Availability
Stateful services require robust HA mechanisms. The article discusses three replication models: master‑slave replication (synchronous, asynchronous, semi‑synchronous), decentralized replication (quorum reads/writes), and consensus algorithms such as Raft/Paxos. It emphasizes that container‑level HA (e.g., pod self‑healing) must be complemented by service‑level HA logic to avoid data loss or split‑brain scenarios.
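For the decentralized model, the safety condition is simple arithmetic: with N replicas, read quorum R, and write quorum W, every read must overlap every write (R + W > N), and two writes must not be able to succeed on disjoint replica sets (2W > N). A small sketch of that check:

```python
def quorum_ok(n: int, r: int, w: int) -> bool:
    """True if quorums guarantee read/write overlap (R + W > N)
    and concurrent writes cannot split-brain (2W > N)."""
    return r + w > n and 2 * w > n

# N=3 with majority reads and writes is safe; R=1, W=1 is not.
print(quorum_ok(3, 2, 2))  # → True
print(quorum_ok(3, 1, 1))  # → False
```

This is why pod self-healing alone is not enough: Kubernetes can restart a replica, but only the service-level replication logic decides whether a write was durably acknowledged before the failure.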
Performance
High performance for stateful services depends on both networking and storage. Kubernetes supports extensible CNI plugins (underlay vs. overlay) and provides examples like Flannel (UDP/VXLAN/host‑gw) and Tencent's TKE network modes (global route, VPC‑CNI, pod‑exclusive NIC). For storage, the PV/PVC model, StorageClass, and CSI plugins enable flexible provisioning of local disks, cloud disks, and network file systems. Code examples for PVC, PV, and StorageClass definitions are included.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: cbs
```

```yaml
apiVersion: v1
kind: PersistentVolume
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  persistentVolumeReclaimPolicy: Delete
  qcloudCbs:
    cbsDiskId: disk-r1z73a3x
  storageClassName: cbs
  volumeMode: Filesystem
```

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
parameters:
  type: cbs
provisioner: cloud.tencent.com/qcloud-cbs
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Chaos Engineering
To validate the stability of containerized stateful services, the article recommends using chaos‑engineering tools such as Chaos Mesh to inject pod, network, and I/O failures, helping uncover bugs in operators and underlying Kubernetes components.
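As an illustrative sketch of such an experiment, a Chaos Mesh PodChaos resource can randomly kill a pod so the operator's recovery path gets exercised (the name and `app: redis` label are assumptions; check the field names against your Chaos Mesh version):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: redis-pod-kill          # illustrative name
spec:
  action: pod-kill              # kill a pod and verify the operator recovers it
  mode: one                     # pick one matching pod at random
  selector:
    namespaces:
    - default
    labelSelectors:
      app: redis                # assumed label on the target pods
```

Running such experiments against masters, replicas, and the operator itself helps confirm that failover, rescheduling, and data durability behave as designed rather than as hoped.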
Conclusion
The article summarizes that successful containerization of stateful services requires careful workload selection, extension via CRDs/operators, advanced scheduling, robust HA mechanisms, performance‑optimized networking and storage, and thorough chaos‑engineering testing to ensure reliability and competitiveness.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.