How a Custom Redis Operator Transforms Cloud‑Native Deployment at Zhongyuan Bank
This article explains how Zhongyuan Bank built and enhanced a Redis Operator to support Sentinel and Cluster modes, IP pools, cross‑center deployment, and authentication, enabling automated, scalable, and reliable cloud‑native Redis management within their distributed cache platform.
Background
In 2020 Zhongyuan Bank built a distributed cache platform based on Redis. As business demand grew, the VM‑based deployment model could no longer scale efficiently. To achieve cloud‑native operation, the bank adopted a Kubernetes Operator‑based approach for Redis, which can manage stateful workloads declaratively.
Operator Basics
An Operator extends the Kubernetes API with a Custom Resource Definition (CRD) and a controller. The CRD introduces a new resource type that describes the desired state of an application; the controller watches events on that resource and reconciles the actual cluster state to match the specification. This pattern allows complex lifecycle logic (e.g., provisioning, scaling, fail‑over) to be encoded once and reused automatically.
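The reconcile pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the bank's controller: RedisSpec, ObservedState, and reconcile are hypothetical names, and a real controller would call the Kubernetes API instead of returning strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RedisSpec:
    """Desired state, as a custom resource might declare it (hypothetical fields)."""
    replicas: int
    version: str


@dataclass(frozen=True)
class ObservedState:
    """State the controller observes in the cluster (hypothetical fields)."""
    running_pods: int
    version: str


def reconcile(spec: RedisSpec, observed: ObservedState) -> List[str]:
    """Return the actions that would move the observed state toward the spec."""
    actions = []
    if observed.running_pods < spec.replicas:
        actions.append(f"create {spec.replicas - observed.running_pods} pod(s)")
    elif observed.running_pods > spec.replicas:
        actions.append(f"delete {observed.running_pods - spec.replicas} pod(s)")
    if observed.version != spec.version:
        actions.append(f"roll pods to version {spec.version}")
    return actions


if __name__ == "__main__":
    spec = RedisSpec(replicas=3, version="6.2")
    print(reconcile(spec, ObservedState(running_pods=1, version="6.0")))
    # Once actual matches desired, there is nothing left to do.
    print(reconcile(spec, ObservedState(running_pods=3, version="6.2")))
```

The key property is idempotence: running reconcile again after the state has converged yields no actions, which is what lets a controller safely retry on every event.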
Typical Use Cases
Databases (MySQL, PostgreSQL, etc.)
Caches (Redis, Memcached)
Message queues (Kafka, RabbitMQ)
These services are stateful, require coordinated updates, and benefit from the reliability and portability that an Operator provides.
Limitations of Existing Open‑Source Redis Operators
No support for Redis Sentinel mode.
Cluster mode cannot add new nodes to the service after scaling.
Cannot deploy across multiple data‑center zones.
Simultaneous master‑slave restarts cause brief outages.
Lacks integration with the bank’s authentication centre for multi‑tenant management.
No IP‑pool feature to stabilise node addressing.
In‑House Enhancements
The custom Redis Operator adds the following capabilities:
Full Sentinel support, including automatic fail‑over.
Correct Cluster scaling – newly created pods are joined to the Redis cluster and become service endpoints.
Cross‑zone (multi‑center) deployment to avoid single‑point failures.
Graceful rolling updates that restart master and slave sequentially, preventing downtime.
Integration with the bank’s authentication centre for tenant‑aware access control.
IP‑pool management to keep stable virtual IPs for clients.
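One way to picture the graceful rolling update is as an ordering problem: restart one pod at a time, replicas (the article's "slaves") first and the master last, and stop if a pod does not come back. The sketch below assumes that ordering; the article only states that restarts are sequential, and the function names and callbacks are hypothetical.

```python
from typing import Callable, Dict, List


def restart_order(roles: Dict[str, str]) -> List[str]:
    """Order pods for restart: all replicas first, masters last.

    `roles` maps pod name -> "master" or "replica".
    """
    replicas = sorted(name for name, role in roles.items() if role == "replica")
    masters = sorted(name for name, role in roles.items() if role == "master")
    return replicas + masters


def rolling_restart(
    roles: Dict[str, str],
    restart: Callable[[str], None],
    is_ready: Callable[[str], bool],
) -> List[str]:
    """Restart pods one at a time, aborting if a pod fails to become ready."""
    done = []
    for name in restart_order(roles):
        restart(name)
        if not is_ready(name):
            # Abort so the remaining pods keep serving traffic.
            raise RuntimeError(f"{name} did not become ready; stopping rollout")
        done.append(name)
    return done


if __name__ == "__main__":
    pods = {"redis-0": "master", "redis-1": "replica", "redis-2": "replica"}
    print(rolling_restart(pods, restart=lambda n: None, is_ready=lambda n: True))
```

Because at most one pod is down at any moment and the master goes last, a healthy replica is always available to take over before the master restarts.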
Operator Architecture
The operator consists of two core components:
Custom Resource Definition (CRD) – defines a new API object (e.g., RedisCluster or RedisSentinel) that captures the desired topology, replica count, version, and authentication settings.
Controller – watches create, update, and delete events on custom resources of that type, then creates or updates a StatefulSet, a Service, and auxiliary resources (ConfigMap, Secret, IP‑pool). It also runs the slot migration that precedes a scaling operation and the slot rebalancing that follows it.
When a custom resource (an instance of the CRD) is created, the controller:
Picks up the new specification, which the API server has persisted to etcd.
Generates a StatefulSet that launches the required number of Redis pods.
Monitors pod health and triggers fail‑over for Sentinel or Cluster modes.
Handles updates: if the replica count changes, it first migrates hash slots (Cluster) or adjusts Sentinel quorum, then applies the new StatefulSet.
Deletes all associated Kubernetes objects when the custom resource is removed.
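For the Cluster-mode scaling step, the slot migration can be thought of as a planning problem over Redis Cluster's fixed 16384 hash slots. The sketch below computes which existing masters should hand how many slots to newly added masters so that every master ends up with an even share. The even-split policy and the names target_counts and plan_moves are assumptions for illustration; actually moving the keys (for example with redis-cli --cluster reshard) is out of scope here.

```python
from typing import Dict, List, Tuple

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def target_counts(names: List[str]) -> Dict[str, int]:
    """Split all slots as evenly as possible across the given masters."""
    base, extra = divmod(TOTAL_SLOTS, len(names))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(names)}


def plan_moves(current: Dict[str, int], new_masters: List[str]) -> List[Tuple[str, str, int]]:
    """Return (source, destination, slot_count) moves that reach the even split.

    `current` maps each existing master to the number of slots it owns.
    """
    names = list(current) + list(new_masters)
    target = target_counts(names)
    have = dict(current)
    have.update({name: 0 for name in new_masters})

    surplus = [[n, have[n] - target[n]] for n in names if have[n] > target[n]]
    deficit = [[n, target[n] - have[n]] for n in names if have[n] < target[n]]

    moves = []
    si = di = 0
    while si < len(surplus) and di < len(deficit):
        count = min(surplus[si][1], deficit[di][1])
        moves.append((surplus[si][0], deficit[di][0], count))
        surplus[si][1] -= count
        deficit[di][1] -= count
        if surplus[si][1] == 0:
            si += 1
        if deficit[di][1] == 0:
            di += 1
    return moves


if __name__ == "__main__":
    # Three masters with an even split, scaling out to four.
    owners = {"m0": 5462, "m1": 5461, "m2": 5461}
    for src, dst, count in plan_moves(owners, ["m3"]):
        print(f"move {count} slots from {src} to {dst}")
```

Planning before applying the new StatefulSet keeps the two concerns separate: the plan is pure arithmetic that can be checked, while the migration itself is the slow, failure-prone part.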
Sentinel Cluster Example
A Sentinel deployment is defined in a YAML file similar to the snippet below (the full file is stored in the repository and applied with kubectl apply -f sentinel.yaml):
apiVersion: redis.zhongyuan.com/v1
kind: RedisSentinel
metadata:
  name: my-sentinel
spec:
  replicas: 3
  version: "6.2"
  authSecret: redis-auth
  ipPool: sentinel-pool
Applying the file creates the Sentinel StatefulSet, the associated Service, and registers the cluster with the management console.
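Before acting on such a manifest, a controller would typically validate it. The sketch below loads the spec fields shown above (replicas, version, authSecret, ipPool) from an already-parsed manifest and derives the Sentinel majority. It assumes that replicas counts Sentinel processes, which the article does not state; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class RedisSentinelSpec:
    """Typed view of the `spec` section of a RedisSentinel manifest."""
    replicas: int
    version: str
    auth_secret: str
    ip_pool: str

    @classmethod
    def from_manifest(cls, manifest: Dict[str, Any]) -> "RedisSentinelSpec":
        spec = manifest["spec"]
        return cls(
            replicas=int(spec["replicas"]),
            version=str(spec["version"]),
            auth_secret=str(spec["authSecret"]),
            ip_pool=str(spec["ipPool"]),
        )

    def majority(self) -> int:
        """Smallest number of Sentinels that forms a strict majority."""
        return self.replicas // 2 + 1

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the spec looks sane."""
        problems = []
        if self.replicas < 3:
            problems.append("fewer than 3 Sentinels cannot tolerate a single failure")
        if not self.auth_secret:
            problems.append("authSecret must not be empty")
        if not self.ip_pool:
            problems.append("ipPool must not be empty")
        return problems


if __name__ == "__main__":
    manifest = {
        "apiVersion": "redis.zhongyuan.com/v1",
        "kind": "RedisSentinel",
        "metadata": {"name": "my-sentinel"},
        "spec": {"replicas": 3, "version": "6.2",
                 "authSecret": "redis-auth", "ipPool": "sentinel-pool"},
    }
    spec = RedisSentinelSpec.from_manifest(manifest)
    print(spec.majority(), spec.validate())
```

With three Sentinels the majority is two, so the deployment can lose one Sentinel and still authorize a fail‑over.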
Distributed Cache Platform Architecture
The platform is composed of three logical layers:
Cache Management Console – provides real‑time monitoring, diagnostics, hot‑configuration updates, and key‑level statistics. It discovers clusters by reading only the abstract identifiers (namespace, name, type) from the Operator.
Region‑Proxy – abstracts the physical Redis pod IPs. Client requests are routed through the proxy, which forwards them to the appropriate Redis node based on the current topology.
Redis Operator – maintains the Redis topology inside Kubernetes, reacts to CRD events, performs scaling, slot migration, and integrates with the authentication centre.
Because the console only needs the abstract identifiers, adding or removing nodes does not require manual IP updates; the Operator updates the Service endpoints automatically.
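To make the proxy's routing step concrete for Cluster mode, the sketch below uses the standard Redis Cluster rule: a key's slot is CRC16 of the key (or of its {hash tag}, if present) modulo 16384. The article does not describe Region‑Proxy's internals, so the route function and the slot-range table are assumptions. Python's binascii.crc_hqx with an initial value of 0 computes the same CRC16 variant that Redis Cluster uses.

```python
import binascii
from typing import List, Tuple

TOTAL_SLOTS = 16384


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to its Redis Cluster hash slot."""
    return binascii.crc_hqx(hash_tag(key.encode("utf-8")), 0) % TOTAL_SLOTS


def route(key: str, topology: List[Tuple[int, int, str]]) -> str:
    """Pick the node owning the key's slot; topology rows are (first, last, node)."""
    slot = key_slot(key)
    for first, last, node in topology:
        if first <= slot <= last:
            return node
    raise LookupError(f"no node owns slot {slot}")


if __name__ == "__main__":
    # Hypothetical three-master topology with contiguous slot ranges.
    topology = [(0, 5460, "redis-a"), (5461, 10922, "redis-b"), (10923, 16383, "redis-c")]
    for key in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(key, key_slot(key), route(key, topology))
```

Keys that share a hash tag, such as the two {user1000} keys, always land in the same slot, which is why clients only need to talk to the proxy and never to a pod IP.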
Visual Cluster Management
Imported Sentinel or Cluster topologies are rendered in the console as node‑role diagrams. When the desired replica count is changed (e.g., from three to four nodes), the diagram refreshes automatically, reflecting the new topology.
VM‑Based vs. Operator‑Based Deployment
Traditional VM deployment involves a sequence of manual steps:
Upload Redis binaries.
Edit configuration files.
Start each VM.
Run cluster creation scripts.
Configure network policies.
With the Operator, the entire process collapses to editing a single YAML manifest and applying it with kubectl. Deployment time becomes independent of the number of nodes.
Comparison Summary
Manual VM workflow: multiple procedural steps, time proportional to node count, higher risk of configuration drift.
Operator workflow: edit YAML, run kubectl apply, Operator reconciles state, constant deployment time, built‑in scaling and fail‑over.
Benefits and Outcomes
Deployment time reduced from more than 60 minutes to less than 15 minutes.
Support for large‑scale clusters (tens of nodes) with automatic node discovery.
Zero‑downtime scaling thanks to pre‑scale slot migration and graceful rolling updates.
Integrated multi‑tenant authentication and stable IP addressing via IP‑pool.
Improved operational efficiency and reliability for the bank’s critical caching services.
Conclusion
The Redis Operator demonstrates how encapsulating deployment and lifecycle logic in a Kubernetes Operator can replace cumbersome VM‑based processes. By extending the API with CRDs and handling stateful semantics (Sentinel fail‑over, Cluster slot rebalancing, cross‑zone deployment), the Operator provides a cloud‑native, declarative, and automated solution for managing Redis at scale.
dbaplus Community
