How KubeBlocks Enables Scalable, Automated Redis on Kubernetes at Kuaishou

This article details Kuaishou's migration of massive Redis clusters to Kubernetes using the KubeBlocks Operator, covering architecture, multi‑layer management requirements, federated cluster deployment, custom controllers, performance and stability considerations, and the resulting operational benefits.

ITPUB

Background

Kuaishou, a leading short‑video platform, relies heavily on Redis for low‑latency responses. To reduce manual effort and improve cost efficiency, the infrastructure team sought a cloud‑native solution for managing large‑scale Redis clusters on private‑cloud Kubernetes.

Redis Deployment Architecture

Kuaishou runs Redis as a horizontally sharded, master‑replica high‑availability setup built from three components: Server, Sentinel, and Proxy. The architecture requires flexible shard management, hot migration, and isolation.

Redis deployment architecture

Core Requirements for a Redis Operator

Layer 1: Multi‑shard and per‑shard replica management – The operator must handle a two‑level hierarchy: a cluster contains many shards, and each shard contains its own replicas. Both levels must support dynamic scaling.

Layer 2: Data consistency during lifecycle changes – Operations such as shard rebalancing or replica scaling must preserve data integrity.

Layer 3: Topology‑aware service discovery and canary releases – Real‑time topology changes require role detection and labeling to enable dynamic service discovery and staged rollouts.

Existing open‑source Redis operators lacked these capabilities, prompting the development of a custom solution.

KubeBlocks Solution

KubeBlocks, an open‑source Kubernetes database operator, offers extensible APIs via an Addon mechanism that describe Day‑1 initialization and Day‑2 operational behaviors. Kuaishou customized a Redis Addon to match its architecture.

InstanceSet

KubeBlocks introduces InstanceSet, a workload that replaces StatefulSet and tracks each pod’s role (master, replica). It supports custom role definitions, detection methods, and role‑based canary upgrades.
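
To make the idea of role tracking and role‑based upgrades concrete, here is a minimal Python sketch. It is not the KubeBlocks implementation: the `Pod` class, the `ROLE_UPDATE_PRIORITY` table, and the pod names are hypothetical. The only external behaviour it relies on is that the Redis `INFO replication` command reports a `role:` line with the value `master` or `slave`. The sketch detects each pod's role from that text and orders an upgrade so that replicas are touched before the master.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical upgrade priority: lower numbers are upgraded first.
ROLE_UPDATE_PRIORITY = {"replica": 0, "master": 1}


def parse_role(info_replication: str) -> str:
    """Extract the role from the text returned by Redis `INFO replication`."""
    for line in info_replication.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "role":
            # Redis reports replicas as "slave"; normalise to "replica".
            return "replica" if value == "slave" else value
    return "unknown"


@dataclass
class Pod:
    name: str  # hypothetical pod name
    role: str  # "master", "replica", or "unknown"


def upgrade_order(pods: List[Pod]) -> List[Pod]:
    """Order pods so replicas are upgraded before the master.

    Pods with an unrecognised role sort last; ties are broken by name.
    """
    return sorted(pods, key=lambda p: (ROLE_UPDATE_PRIORITY.get(p.role, 99), p.name))


if __name__ == "__main__":
    pods = [
        Pod("redis-shard-0-0", parse_role("# Replication\r\nrole:master\r\n")),
        Pod("redis-shard-0-1", parse_role("# Replication\r\nrole:slave\r\n")),
    ]
    for pod in upgrade_order(pods):
        print(pod.name, pod.role)
```

Upgrading replicas first means a faulty release is exposed on a non‑serving role before it can reach the master, which is the basic motivation for a role‑aware canary.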

Hierarchical CRDs: Component and Cluster

The Component CRD represents a group of pods (e.g., Proxy, Sentinel, or a shard’s server pods). A special Shard component groups a master and replica pod per shard, allowing dynamic addition or removal of shards.

The Cluster CRD aggregates all components, managing their topology and relationships.
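
The two‑level hierarchy can be pictured with a small in‑memory model. The following Python sketch is only an illustration of the Cluster → Component relationship described above; the class fields, the naming scheme, and the default of two pods per shard (one master, one replica) are assumptions and do not mirror the actual CRD schemas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A group of pods with one function: proxy, sentinel, or a single shard."""
    name: str
    replicas: int


@dataclass
class Cluster:
    """Aggregates all components of one Redis deployment (hypothetical model)."""
    name: str
    proxy: Component
    sentinel: Component
    shards: List[Component] = field(default_factory=list)

    def scale_shards(self, count: int, replicas_per_shard: int = 2) -> None:
        """Add or remove shard components until exactly `count` shards exist."""
        if count < 0:
            raise ValueError("count must be non-negative")
        while len(self.shards) < count:
            index = len(self.shards)
            self.shards.append(Component(f"{self.name}-shard-{index}", replicas_per_shard))
        # Scaling in drops the highest-numbered shards first.
        del self.shards[count:]

    def total_pods(self) -> int:
        """Sum the pods of every component in the cluster."""
        return sum(c.replicas for c in [self.proxy, self.sentinel, *self.shards])


if __name__ == "__main__":
    cluster = Cluster("cache", Component("cache-proxy", 3), Component("cache-sentinel", 3))
    cluster.scale_shards(4)
    print(cluster.total_pods())
```

Treating each shard as its own component is what lets shards be added or removed without touching the proxy or sentinel groups.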

Hierarchical CRD design

Federated Kubernetes Management for Ultra‑Large Clusters

A single Redis deployment at Kuaishou can exceed 10,000 pods, more than one of its Kubernetes clusters can host. The deployment is therefore spread across multiple member clusters, and a federation control plane hides the multi‑cluster complexity from applications.

Federated Cluster Architecture

The federation provides unified scheduling and a unified view, distributing Redis components across member clusters while maintaining global consistency.

Federated cluster architecture

Fed‑InstanceSet Controller

The custom Fed‑InstanceSet Controller splits a federation‑level InstanceSet into multiple member‑level InstanceSets and assigns them to member clusters according to scheduling policies. It introduces an Ordinals field so that pod indices remain globally unique and ordered across clusters.
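
One way to think about globally unique ordinals is as a partition of the index range 0..N-1 into disjoint slices, one per member cluster. The Python sketch below illustrates that idea under stated assumptions: the `split_ordinals` function, the weight‑based policy, and the member cluster names are all hypothetical and are not taken from the Fed‑InstanceSet Controller itself.

```python
from typing import Dict, List


def split_ordinals(replicas: int, weights: Dict[str, int]) -> Dict[str, List[int]]:
    """Partition ordinals 0..replicas-1 into contiguous, disjoint ranges.

    `weights` maps a member cluster name to a non-negative scheduling weight
    (a hypothetical policy). Clusters are processed in name order so the
    result is deterministic.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    names = sorted(weights)
    # Each cluster first receives the floor of its proportional share.
    counts = {n: replicas * weights[n] // total_weight for n in names}
    # Pods lost to rounding go, one each, to positive-weight clusters in name order.
    leftover = replicas - sum(counts.values())
    for n in [n for n in names if weights[n] > 0][:leftover]:
        counts[n] += 1

    result: Dict[str, List[int]] = {}
    next_ordinal = 0
    for n in names:
        result[n] = list(range(next_ordinal, next_ordinal + counts[n]))
        next_ordinal += counts[n]
    return result


if __name__ == "__main__":
    print(split_ordinals(6, {"member-a": 2, "member-b": 1}))
```

With six replicas and weights of 2 and 1, `member-a` receives ordinals 0 to 3 and `member-b` receives 4 and 5, so no index is ever used in two clusters.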

Fed‑InstanceSet controller architecture

Benefits and Risks of Running Stateful Services on Kubernetes

Resource utilization – Consolidated scheduling improves overall resource efficiency and reduces costs.

Operational efficiency – Declarative APIs and operators enable infrastructure‑as‑code management, minimizing manual intervention.

Maintenance cost – Moving from bare‑metal to containers lowers hardware upkeep expenses.

Risks include potential performance degradation (typically <10% overhead), stability concerns due to added abstraction layers, and increased operational complexity requiring expertise in both Redis and Kubernetes.

Mitigation Strategies

Performance testing shows the overhead is acceptable (typically under 10%); each workload should still be benchmarked individually.

Admission webhooks (via internal kube-shield) validate configuration changes to prevent accidental disruptions.
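
The kind of check a validating admission webhook can perform is sketched below in Python. kube‑shield is internal and its rules are not described here, so the policy shown (rejecting a change that removes more than half of a workload's replicas in one step), the `MAX_SCALE_DOWN_FRACTION` threshold, and the use of `spec.replicas` are assumptions for illustration. The request and response shapes follow the Kubernetes `admission.k8s.io/v1` `AdmissionReview` format.

```python
from typing import Any, Dict

# Hypothetical policy: reject changes that remove more than this fraction of replicas.
MAX_SCALE_DOWN_FRACTION = 0.5


def review(admission_review: Dict[str, Any]) -> Dict[str, Any]:
    """Build an AdmissionReview response for a workload create/update/delete request."""
    request = admission_review["request"]
    # `oldObject` is null on CREATE and `object` is null on DELETE.
    old = (request.get("oldObject") or {}).get("spec", {}).get("replicas")
    new = (request.get("object") or {}).get("spec", {}).get("replicas")

    allowed, message = True, ""
    # Only compare when both replica counts are present and the workload had pods.
    if isinstance(old, int) and isinstance(new, int) and old > 0:
        removed = old - new
        if removed > old * MAX_SCALE_DOWN_FRACTION:
            allowed = False
            message = (
                f"scale-down from {old} to {new} replicas exceeds "
                f"{int(MAX_SCALE_DOWN_FRACTION * 100)}% in a single change"
            )

    response: Dict[str, Any] = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {"message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Rejecting a risky change at admission time stops it before any controller acts on it, which is cheaper than rolling back after pods have already been removed.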

Fine‑grained scheduling and resource‑aware load balancing enhance availability.

Conclusion

The migration demonstrates that cloud‑native transformation of stateful services, while challenging, yields significant cost and operational benefits. Kuaishou’s collaboration with the KubeBlocks community resulted in a production‑grade, scalable Redis solution, and the team plans to extend this approach to other databases and middleware.
