Can Redis Thrive on Kubernetes? Insights from Kuaishou’s Cloud‑Native Journey
Drawing on Kuaishou’s experience, this article examines whether stateful services like Redis belong on Kubernetes, outlines the benefits and risks, and details a cloud‑native solution using custom workloads, KubeBlocks, and a federated cluster architecture to achieve scalable, reliable Redis deployments.
Background
Whether stateful services such as databases and Redis should be containerized on Kubernetes (K8s) has long been debated. Kuaishou’s infrastructure team first migrated its stateless workloads to K8s and then took on Redis, whose deployment is far larger than a single K8s cluster can hold.
Are Stateful Services Suitable for Kubernetes?
Running stateful services on K8s provides clear benefits but also introduces specific risks.
Benefits
Resource utilization : Pooling, unified scheduling, and mixed‑node deployment improve efficiency and lower cost.
Operational efficiency : Declarative APIs and controller models simplify maintenance.
Cost reduction : Consolidated infrastructure reduces OPEX.
Risks
Performance degradation : Container abstraction can add latency.
Stability impact : Underlying database reliability may be affected.
Operational complexity : Incident resolution may require expertise in both the database and cloud‑native stack.
Stateful Service Cloud‑Native Considerations
Two categories of state must be managed:
Data state : Unique data held by each instance; must be backed up, restored, and rebalanced during lifecycle changes.
Topology state : Dynamic relationships (roles, connections) between instances that evolve at runtime.
The core challenges are ensuring data availability, managing instance lifecycles, and handling dynamic topology.
Kuaishou’s Redis Cloud‑Native Architecture
Kuaishou uses a classic master‑slave Redis design composed of Server, Sentinel, and Proxy components. The overall scale exceeds a single K8s cluster, requiring a federated‑cluster deployment.
Custom Workloads vs. Native StatefulSet
Kubernetes StatefulSet supplies stable network and storage identifiers but cannot fully express dynamic topology, such as the roles instances take on at runtime. Kuaishou therefore built custom workloads combined with Operators (Redis Operator, MySQL Operator). That route has its own costs: developing an Operator from scratch is expensive, and integrating custom logic with existing Operators is complex.
KubeBlocks Solution
KubeBlocks is an open‑source K8s Operator that abstracts database management through a unified API, aiming to “run any database on Kubernetes.” It introduces four workload concepts:
InstanceSet : Extends StatefulSet with role definition, detection, and update strategies; dynamically labels instances for role‑based service discovery.
Component : Decouples component definitions from their instances, enabling flexible instance creation and lifecycle handling.
Shard : Generates a set of identical Component instances, ideal for sharded services such as Redis.
Cluster : Represents the entire stateful service cluster, unifying Proxy, Server, and Sentinel topology.
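As a rough mental model of how these four concepts nest, the sketch below uses plain Python dataclasses. It is not the KubeBlocks API: the class fields, the `role` label key, and the `set_role` / `select_by_role` helpers are hypothetical names chosen only to illustrate the relationships described above.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    labels: dict = field(default_factory=dict)


@dataclass
class InstanceSet:
    """StatefulSet-like group whose members also carry a role label."""
    name: str
    replicas: int
    instances: list = field(init=False, default_factory=list)

    def __post_init__(self):
        # Stable, ordinal-based names, similar to what a StatefulSet provides.
        self.instances = [Instance(f"{self.name}-{i}") for i in range(self.replicas)]

    def set_role(self, instance_name: str, role: str) -> None:
        # Hypothetical stand-in for role detection: relabel one instance.
        for inst in self.instances:
            if inst.name == instance_name:
                inst.labels["role"] = role

    def select_by_role(self, role: str) -> list:
        # Role-based discovery: names of instances whose label matches.
        return [i.name for i in self.instances if i.labels.get("role") == role]


@dataclass(frozen=True)
class ComponentDefinition:
    """Definition kept separate from the instances created from it."""
    kind: str  # e.g. "server", "sentinel", "proxy"
    replicas: int

    def instantiate(self, name: str) -> InstanceSet:
        return InstanceSet(name=name, replicas=self.replicas)


@dataclass
class Shard:
    """Stamps out `count` identical components from one definition."""
    template: ComponentDefinition
    count: int

    def expand(self, prefix: str) -> list:
        return [self.template.instantiate(f"{prefix}-shard-{i}") for i in range(self.count)]


@dataclass
class Cluster:
    """Top-level object tying Proxy, Sentinel, and sharded Servers together."""
    name: str
    proxy: ComponentDefinition
    sentinel: ComponentDefinition
    servers: Shard

    def build(self) -> dict:
        sets = [
            self.proxy.instantiate(f"{self.name}-proxy"),
            self.sentinel.instantiate(f"{self.name}-sentinel"),
            *self.servers.expand(self.name),
        ]
        return {s.name: s for s in sets}
```

Building a `Cluster` with a two-shard `Shard` yields one `InstanceSet` per shard plus one each for the proxy and sentinel. Calling `set_role` on a shard's instances and then `select_by_role("master")` mimics how a role label could drive service discovery.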
Federated Cluster Architecture
Because a single Redis cluster exceeds the capacity of one K8s cluster, Kuaishou adopts a federated‑cluster model that provides:
Unified scheduling : A federation entry point distributes InstanceSets to member clusters based on scheduling recommendations.
Unified view : Resources from both federation and member clusters are accessed through a common API.
The Fed‑InstanceSet controller splits InstanceSets across clusters, assigns global ordinal indices to maintain ordering, and builds a directed acyclic graph (DAG) to control cross‑cluster change propagation, guaranteeing consistent and ordered updates.
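One way to picture the global ordinal assignment is as a partition of the range 0..N-1 across member clusters. The sketch below is an assumption about how such a split could work, not Kuaishou's controller logic: the function name and the `capacity` mapping (standing in for the scheduling recommendations) are hypothetical.

```python
def split_with_global_ordinals(replicas: int, capacity: dict) -> dict:
    """Assign ordinals 0..replicas-1 to member clusters, filling each in turn.

    `capacity` maps a member-cluster name to the number of instances it may
    host (a hypothetical stand-in for a scheduling recommendation).
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    if replicas > sum(capacity.values()):
        raise ValueError("not enough capacity across member clusters")

    plan = {}
    next_ordinal = 0
    for cluster, cap in capacity.items():
        # Take as many instances as this cluster allows, up to what is left.
        take = min(cap, replicas - next_ordinal)
        plan[cluster] = list(range(next_ordinal, next_ordinal + take))
        next_ordinal += take
    return plan
```

Because every ordinal appears in exactly one member cluster, instance names derived from the ordinals stay unique and ordered across the federation, even though each member cluster only sees its own slice.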
Risk Mitigation
Performance
Internal benchmarks show that the cloud‑native Redis deployment adds less than 10% latency, which is acceptable for most workloads.
Stability
Automated K8s operations can obscure failure origins. Kuaishou distinguishes expected from unexpected changes by:
Using ServiceAccount‑based identity checks to verify the initiator of a change.
Deploying an Admission Webhook‑driven “kube‑shield” system that blocks risky operations before they reach the cluster.
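The internals of kube‑shield are not described here, so the following is only a sketch of the general pattern: a validating admission webhook reads the requester identity from the `AdmissionReview` request and denies protected operations from unknown initiators. The allow-list entry and the set of protected operations are hypothetical; the `AdmissionReview` fields (`request.uid`, `request.operation`, `request.userInfo.username`, `response.allowed`) follow the Kubernetes `admission.k8s.io/v1` format.

```python
# Hypothetical allow-list: ServiceAccounts permitted to make risky changes.
ALLOWED_SERVICE_ACCOUNTS = {
    "system:serviceaccount:kubeblocks-system:kubeblocks",
}

# Hypothetical policy: operations treated as risky for protected resources.
PROTECTED_OPERATIONS = {"UPDATE", "DELETE"}


def review(admission_review: dict) -> dict:
    """Return an AdmissionReview response allowing or denying the request."""
    request = admission_review["request"]
    username = request.get("userInfo", {}).get("username", "")
    operation = request.get("operation", "")

    # Risky operations pass only when the initiator is on the allow-list.
    allowed = operation not in PROTECTED_OPERATIONS or username in ALLOWED_SERVICE_ACCOUNTS

    response = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {
            "code": 403,
            "message": f"{operation} by {username!r} is not an expected change",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

In a real deployment this function would sit behind an HTTPS endpoint registered through a `ValidatingWebhookConfiguration`; the sketch leaves out the server and TLS setup to keep the identity check itself visible.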
Operational Complexity
Migration requires deep knowledge of both Redis and container platforms. Kuaishou mitigates this by clearly separating responsibilities:
The Redis team defines Cluster objects, lifecycle hooks, and topology rules.
The container‑cloud team develops and maintains the KubeBlocks operators, federation logic, and scheduling policies.
Implementation Steps Summary
Model Redis topology using KubeBlocks Component (Server, Sentinel, Proxy) and Shard for each shard.
Wrap each shard in an InstanceSet to provide role detection and dynamic labeling.
Define a top‑level Cluster CRD that aggregates all shards and proxies.
Deploy the Fed‑InstanceSet controller in the federation cluster; place InstanceSet controllers in each member cluster.
Configure Ordinals in the Fed‑InstanceSet spec to guarantee globally unique, ordered instance indices across clusters.
Construct a DAG of change actions (scale‑out, role transition, rolling update) to enforce ordered updates with controlled concurrency.
Enable Admission Webhook “kube‑shield” to reject unauthorized modifications based on ServiceAccount identity.
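The DAG step in the list above can be illustrated with Python's standard `graphlib` module. This is a minimal sketch under stated assumptions, not the Fed‑InstanceSet controller: the action names and the `plan_batches` helper are hypothetical. Each batch holds actions whose prerequisites are all complete, so actions within a batch may run concurrently while batches run in order.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies: dict) -> list:
    """Group change actions into ordered batches.

    `dependencies` maps each action to the set of actions that must finish
    before it starts. Raises graphlib.CycleError if the graph has a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its prerequisites done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical rolling update spanning two member clusters.
ROLLING_UPDATE = {
    "update-slave@cluster-a": set(),
    "update-slave@cluster-b": set(),
    "switch-master": {"update-slave@cluster-a", "update-slave@cluster-b"},
    "update-old-master@cluster-a": {"switch-master"},
}
```

For `ROLLING_UPDATE`, `plan_batches` places both slave updates in the first batch, the role switch in the second, and the old master's update in the third, which matches the intent of updating across clusters without ever acting out of order.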
Conclusion
Moving stateful services to a cloud‑native platform demands a careful risk‑benefit analysis. Kuaishou’s Redis case demonstrates a low‑cost, scalable path using KubeBlocks’ custom workloads and a federated‑cluster architecture. The approach keeps performance overhead acceptable, maintains stability through identity‑based admission control, and reduces operational complexity by clearly separating domain responsibilities. The experience will guide future cloud‑native migrations of other databases and middleware.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
