How KubeBlocks Enables Scalable, Automated Redis on Kubernetes at Kuaishou
This article details Kuaishou's migration of massive Redis clusters to Kubernetes using the KubeBlocks Operator, covering architecture, multi‑layer management requirements, federated cluster deployment, custom controllers, performance and stability considerations, and the resulting operational benefits.
Background
Kuaishou, a leading short‑video platform, relies heavily on Redis for low‑latency responses. To reduce manual effort and improve cost efficiency, the infrastructure team sought a cloud‑native solution for managing large‑scale Redis clusters on private‑cloud Kubernetes.
Redis Deployment Architecture
Kuaishou uses a horizontally sharded, master‑slave high‑availability Redis setup composed of Server, Sentinel, and Proxy components. The architecture requires flexible shard management, hot‑migration, and isolation.
Core Requirements for a Redis Operator
Layer 1: Multi‑shard and per‑shard replica management – The operator must handle a two‑level hierarchy: the upper level manages the set of shards, and the lower level manages the replicas within each shard. Both levels must support dynamic scaling.
Layer 2: Data consistency during lifecycle changes – Operations such as shard rebalancing or replica scaling must preserve data integrity.
Layer 3: Topology‑aware service discovery and canary releases – Real‑time topology changes require role detection and labeling to enable dynamic service discovery and staged rollouts.
Existing open‑source Redis operators lacked these capabilities, prompting the development of a custom solution.
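The third requirement, role labeling that drives service discovery, can be sketched in a few lines. This is a minimal illustration, not Kuaishou's or KubeBlocks' implementation: the Pod class, the "master"/"replica" label values, and the shard identifiers are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    shard: str  # hypothetical shard identifier
    role: str   # hypothetical role label value: "master" or "replica"


def write_endpoints(pods: list[Pod]) -> dict[str, str]:
    """Map each shard to the pod currently labeled as its master."""
    endpoints: dict[str, str] = {}
    for pod in pods:
        if pod.role != "master":
            continue
        if pod.shard in endpoints:
            # Two masters in one shard means the labels are stale or inconsistent.
            raise ValueError(f"shard {pod.shard} has more than one master")
        endpoints[pod.shard] = pod.name
    return endpoints


def relabel_after_failover(pods: list[Pod], shard: str, new_master: str) -> None:
    """Update role labels in place once a failover has been detected."""
    for pod in pods:
        if pod.shard == shard:
            pod.role = "master" if pod.name == new_master else "replica"


pods = [
    Pod("redis-s0-0", "s0", "master"),
    Pod("redis-s0-1", "s0", "replica"),
    Pod("redis-s1-0", "s1", "master"),
    Pod("redis-s1-1", "s1", "replica"),
]
print(write_endpoints(pods))  # {'s0': 'redis-s0-0', 's1': 'redis-s1-0'}
relabel_after_failover(pods, "s0", "redis-s0-1")
print(write_endpoints(pods))  # {'s0': 'redis-s0-1', 's1': 'redis-s1-0'}
```

Because discovery reads only the labels, a topology change becomes visible to clients as soon as the labels are corrected, without touching the clients themselves.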
KubeBlocks Solution
KubeBlocks, an open‑source Kubernetes database operator, offers extensible APIs via an Addon mechanism that describe Day‑1 initialization and Day‑2 operational behaviors. Kuaishou customized a Redis Addon to match its architecture.
InstanceSet
KubeBlocks introduces InstanceSet, a workload API that replaces StatefulSet and tracks each pod’s role (for example, master or replica). It supports custom role definitions, custom role‑detection methods, and role‑based canary upgrades.
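One way to picture a role-based canary upgrade is as an ordering problem: upgrade replicas first and the master last, and start with a small batch. The sketch below assumes that policy for illustration only; the source does not describe the exact ordering InstanceSet applies, and the function names and role strings are hypothetical.

```python
def upgrade_order(pod_roles: dict[str, str]) -> list[str]:
    """Return pod names in upgrade order: replicas first, master last."""
    # Lower value = upgraded earlier. Pods with an unknown role are treated
    # like replicas, since they are not known to be serving as the master.
    priority = {"replica": 0, "master": 1}
    return sorted(pod_roles, key=lambda name: (priority.get(pod_roles[name], 0), name))


def canary_batch(order: list[str], batch_size: int) -> tuple[list[str], list[str]]:
    """Split the upgrade order into a canary batch and the remaining pods."""
    if batch_size < 0:
        raise ValueError("batch_size must not be negative")
    return order[:batch_size], order[batch_size:]


roles = {"redis-0": "master", "redis-1": "replica", "redis-2": "replica"}
order = upgrade_order(roles)
print(order)                   # ['redis-1', 'redis-2', 'redis-0']
print(canary_batch(order, 1))  # (['redis-1'], ['redis-2', 'redis-0'])
```

Upgrading the master last keeps the number of failovers per rollout low, which is why the pod's role, and not only its ordinal, has to be visible to the workload controller.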
Hierarchical CRDs: Component and Cluster
The Component CRD represents a group of pods (e.g., Proxy, Sentinel, or a shard’s server pods). A special Shard component groups the master and replica pods of each shard, so shards can be added or removed dynamically.
The Cluster CRD aggregates all components, managing their topology and relationships.
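The relationship between the two CRDs can be modeled as a small object hierarchy. The following is a rough sketch of that shape only: the class fields, naming scheme, and default replica count are assumptions made for the example and do not mirror the actual KubeBlocks CRD schema.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    replicas: int


@dataclass
class Cluster:
    name: str
    proxy: Component
    sentinel: Component
    shards: list[Component] = field(default_factory=list)
    next_shard_index: int = 0  # indices are never reused after a shard is removed

    def add_shard(self, replicas: int = 2) -> Component:
        """Add one shard component (hypothetical default: master plus one replica)."""
        shard = Component(f"{self.name}-shard-{self.next_shard_index}", replicas)
        self.next_shard_index += 1
        self.shards.append(shard)
        return shard

    def remove_shard(self, shard_name: str) -> None:
        """Remove a shard component by name."""
        remaining = [s for s in self.shards if s.name != shard_name]
        if len(remaining) == len(self.shards):
            raise KeyError(shard_name)
        self.shards = remaining

    def total_pods(self) -> int:
        """Count pods across the proxy, sentinel, and all shard components."""
        return self.proxy.replicas + self.sentinel.replicas + sum(s.replicas for s in self.shards)


cluster = Cluster("cache", Component("cache-proxy", 3), Component("cache-sentinel", 3))
cluster.add_shard()
cluster.add_shard()
print(cluster.total_pods())  # 10
cluster.remove_shard("cache-shard-0")
print(cluster.add_shard().name)  # cache-shard-2
```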
Federated Kubernetes Management for Ultra‑Large Clusters
Kuaishou’s Redis clusters can exceed 10,000 pods, surpassing a single Kubernetes cluster’s capacity. The Redis deployment is therefore spread across multiple member clusters, and a federation control plane manages them so that the multi‑cluster complexity stays hidden from applications.
Federated Cluster Architecture
The federation provides unified scheduling and a unified view, distributing Redis components across member clusters while maintaining global consistency.
Fed‑InstanceSet Controller
The custom Fed‑InstanceSet Controller splits a federation‑level InstanceSet into multiple member‑level InstanceSets and assigns them to member clusters according to scheduling policies. It introduces an Ordinals field so that pod indices remain globally unique and ordered across clusters.
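The splitting step can be illustrated with a small function that gives each member cluster a contiguous, non-overlapping range of ordinals. This is a sketch under stated assumptions: the weight-based policy, the function names, and the pod naming pattern are hypothetical, and the source does not describe the real scheduling policies or the schema of the Ordinals field.

```python
def split_ordinals(total_replicas: int, weights: dict[str, int]) -> dict[str, range]:
    """Assign each member cluster a contiguous, non-overlapping ordinal range."""
    if total_replicas < 0:
        raise ValueError("total_replicas must not be negative")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must not be negative")
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        raise ValueError("at least one member cluster needs a positive weight")

    clusters = sorted(weights)  # fixed order keeps the split deterministic
    # Give every cluster the floor of its proportional share...
    shares = {c: total_replicas * weights[c] // weight_sum for c in clusters}
    # ...then hand the remaining replicas, one each, to positive-weight clusters.
    leftover = total_replicas - sum(shares.values())
    eligible = [c for c in clusters if weights[c] > 0]
    for c in eligible[:leftover]:
        shares[c] += 1

    ranges: dict[str, range] = {}
    start = 0
    for c in clusters:
        ranges[c] = range(start, start + shares[c])
        start += shares[c]
    return ranges


def pod_names(instance_set: str, ordinals: range) -> list[str]:
    """Build pod names from an InstanceSet name and its assigned ordinals."""
    return [f"{instance_set}-{i}" for i in ordinals]


plan = split_ordinals(10, {"member-a": 2, "member-b": 1, "member-c": 1})
print(plan)  # {'member-a': range(0, 6), 'member-b': range(6, 8), 'member-c': range(8, 10)}
print(pod_names("redis-shard-0", plan["member-b"]))  # ['redis-shard-0-6', 'redis-shard-0-7']
```

Since the ranges never overlap, a pod name built from its ordinal is unique across the whole federation, regardless of which member cluster hosts the pod.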
Benefits and Risks of Running Stateful Services on Kubernetes
Resource utilization – Consolidated scheduling improves overall resource efficiency and reduces costs.
Operational efficiency – Declarative APIs and operators enable infrastructure‑as‑code management, minimizing manual intervention.
Maintenance cost – Moving from bare‑metal to containers lowers hardware upkeep expenses.
Risks include potential performance degradation (typically <10% overhead), stability concerns due to added abstraction layers, and increased operational complexity requiring expertise in both Redis and Kubernetes.
Mitigation Strategies
Performance testing shows acceptable overhead; workloads should be benchmarked individually.
Admission webhooks (implemented by an internal component, kube-shield) validate configuration changes to prevent accidental disruptions.
Fine‑grained scheduling and resource‑aware load balancing enhance availability.
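The kind of check an admission webhook performs can be sketched as a pure validation function. The rules below (no scale to zero, no shrinking by more than half in one change) are hypothetical examples chosen for illustration; the source does not list the rules kube-shield actually enforces.

```python
def validate_scale_change(
    old_replicas: int,
    new_replicas: int,
    max_shrink_ratio: float = 0.5,  # hypothetical limit on a single scale-in
) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed replica-count change."""
    if new_replicas < 0:
        return False, "replicas must not be negative"
    if old_replicas > 0 and new_replicas == 0:
        return False, "scaling a running workload to zero is rejected"
    if old_replicas > 0:
        shrink = (old_replicas - new_replicas) / old_replicas
        if shrink > max_shrink_ratio:
            return False, f"scale-in of {shrink:.0%} exceeds the {max_shrink_ratio:.0%} limit"
    return True, "ok"


print(validate_scale_change(4, 3))  # (True, 'ok')
print(validate_scale_change(4, 1))  # (False, 'scale-in of 75% exceeds the 50% limit')
print(validate_scale_change(4, 0))  # (False, 'scaling a running workload to zero is rejected')
```

Keeping the check as a side-effect-free function makes it easy to unit test separately from the webhook server that would call it.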
Conclusion
The migration demonstrates that cloud‑native transformation of stateful services, while challenging, yields significant cost and operational benefits. Kuaishou’s collaboration with the KubeBlocks community resulted in a production‑grade, scalable Redis solution, and the team plans to extend this approach to other databases and middleware.
ITPUB
