
How a Custom Redis Operator Transforms Cloud‑Native Deployment at Zhongyuan Bank

This article explains how Zhongyuan Bank built and enhanced a Redis Operator to support Sentinel and Cluster modes, IP pools, cross‑center deployment, and authentication, enabling automated, scalable, and reliable cloud‑native Redis management within their distributed cache platform.

dbaplus Community

Background

In 2020 Zhongyuan Bank built a distributed cache platform based on Redis. As business demand grew, the VM‑based deployment model could no longer scale efficiently. To move to cloud‑native operation, the bank adopted a Kubernetes Operator for Redis, a pattern that manages stateful workloads declaratively.

Operator Basics

An Operator extends the Kubernetes API with a Custom Resource Definition (CRD) and a controller. The CRD introduces a new resource type that describes the desired state of an application; the controller watches events on that resource and reconciles the actual cluster state to match the specification. This pattern allows complex lifecycle logic (e.g., provisioning, scaling, fail‑over) to be encoded once and reused automatically.
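The reconcile step can be illustrated with a minimal sketch. It is not the bank's controller: `RedisSpec`, its fields, and the action strings are hypothetical, and the sketch assumes pods carry contiguous StatefulSet-style ordinals. It only shows the core idea of comparing desired state with actual state and deriving the actions that close the gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedisSpec:
    """Desired state as it might appear in a custom resource (hypothetical fields)."""
    name: str
    replicas: int


def reconcile(spec: RedisSpec, running_pods: list[str]) -> list[str]:
    """Return the actions needed to move the actual state toward the spec.

    Assumes pods are named <name>-<ordinal> with contiguous ordinals.
    """
    actions = []
    current = len(running_pods)
    if current < spec.replicas:
        # Scale out: create the missing ordinals in ascending order.
        for ordinal in range(current, spec.replicas):
            actions.append(f"create {spec.name}-{ordinal}")
    elif current > spec.replicas:
        # Scale in: remove the highest ordinals first.
        for ordinal in range(current - 1, spec.replicas - 1, -1):
            actions.append(f"delete {spec.name}-{ordinal}")
    return actions


spec = RedisSpec(name="cache", replicas=3)
print(reconcile(spec, ["cache-0"]))                        # ['create cache-1', 'create cache-2']
print(reconcile(spec, ["cache-0", "cache-1", "cache-2"]))  # []
```

When the actual state already matches the spec, the function returns no actions, which is what makes a reconcile loop safe to run repeatedly.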

Typical Use Cases

Databases (MySQL, PostgreSQL, etc.)

Caches (Redis, Memcached)

Message queues (Kafka, RabbitMQ)

These services are stateful, require coordinated updates, and benefit from the reliability and portability that an Operator provides.

Limitations of Existing Open‑Source Redis Operators

No support for Redis Sentinel mode.

In Cluster mode, pods created by a scale‑out are not joined to the Redis cluster, so they never serve traffic.

Cannot deploy across multiple data‑center zones.

Simultaneous master‑slave restarts cause brief outages.

Lacks integration with the bank’s authentication centre for multi‑tenant management.

No IP‑pool feature to stabilise node addressing.

In‑House Enhancements

The custom Redis Operator adds the following capabilities:

Full Sentinel support, including automatic fail‑over.

Correct Cluster scaling – newly created pods are joined to the Redis cluster and become service endpoints.

Cross‑zone (multi‑center) deployment to avoid single‑point failures.

Graceful rolling updates that restart master and slave sequentially, preventing downtime.

Integration with the bank’s authentication centre for tenant‑aware access control.

IP‑pool management to keep stable virtual IPs for clients.
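As an illustration of the graceful rolling update in the list above, the following sketch orders restarts so that replicas go first and the master goes last, after a fail-over. The article only states that master and slave restart sequentially; the replicas-first order, the function names, and the `restart` and `failover` callbacks are assumptions made for this example.

```python
def restart_order(roles: dict[str, str]) -> list[str]:
    """Order pods so that every replica restarts before the master.

    `roles` maps pod name -> "master" or "replica".
    """
    replicas = sorted(pod for pod, role in roles.items() if role == "replica")
    masters = sorted(pod for pod, role in roles.items() if role == "master")
    return replicas + masters


def rolling_restart(roles, restart, failover):
    """Restart pods one at a time; hand over the master role before its restart."""
    for pod in restart_order(roles):
        if roles[pod] == "master":
            failover(pod)  # promote a replica first so the master role stays served
        restart(pod)       # the caller is expected to block until the pod is ready


log = []
rolling_restart(
    {"redis-0": "master", "redis-1": "replica", "redis-2": "replica"},
    restart=lambda pod: log.append(f"restart {pod}"),
    failover=lambda pod: log.append(f"failover {pod}"),
)
print(log)
```

Because only one pod is down at any moment and the master role is handed over before its pod restarts, the master and its replicas are never offline at the same time.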

Operator Architecture

The operator consists of two core components:

Custom Resource Definition (CRD) – defines a new API object (e.g., RedisCluster or RedisSentinel) that captures the desired topology, replica count, version, and authentication settings.

Controller – watches create, update, and delete events on custom resources of that kind, and creates or updates a StatefulSet, a Service, and auxiliary resources (ConfigMap, Secret, IP‑pool). It also runs pre‑scale slot migration and post‑scale slot rebalancing logic.

When a custom resource (an instance of the CRD) is created, the API server persists its specification in etcd and notifies the controller, which then:

Generates a StatefulSet that launches the required number of Redis pods.

Monitors pod health and triggers fail‑over for Sentinel or Cluster modes.

Handles updates: if the replica count changes, it first migrates hash slots (Cluster) or adjusts the Sentinel quorum, then applies the new StatefulSet.

Deletes all associated Kubernetes objects when the custom resource is deleted.
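The slot handling for Cluster mode can be sketched as a planning step. Redis Cluster divides the key space into 16384 hash slots; the helper names below and the even-split policy are assumptions for illustration, not the operator's actual algorithm. The sketch computes how many slots each master would hold before and after a change, and the per-master difference that a migration would have to move.

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def even_slot_ranges(masters: list[str]) -> dict[str, range]:
    """Split the slot space into contiguous, near-equal ranges, one per master."""
    base, extra = divmod(TOTAL_SLOTS, len(masters))
    ranges, start = {}, 0
    for i, master in enumerate(masters):
        size = base + (1 if i < extra else 0)  # first `extra` masters get one more
        ranges[master] = range(start, start + size)
        start += size
    return ranges


def slots_to_move(current: dict[str, int], target: dict[str, int]) -> dict[str, int]:
    """Per-master slot delta: positive = must receive, negative = must give away."""
    names = sorted(set(current) | set(target))
    return {m: target.get(m, 0) - current.get(m, 0) for m in names}


before = {m: len(r) for m, r in even_slot_ranges(["m0", "m1", "m2"]).items()}
after = {m: len(r) for m, r in even_slot_ranges(["m0", "m1", "m2", "m3"]).items()}
print(slots_to_move(before, after))
```

For a scale-in, calling `slots_to_move(after, before)` yields a negative delta for the departing master equal to all of its slots, which corresponds to draining a node before it is removed.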

Sentinel Cluster Example

A Sentinel deployment is defined in a YAML file similar to the snippet below (the full file is stored in the repository and applied with kubectl apply -f sentinel.yaml):

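apiVersion: redis.zhongyuan.com/v1
kind: RedisSentinel
metadata:
  name: my-sentinel          # Kubernetes names must use plain ASCII hyphens
spec:
  replicas: 3                # desired number of instances
  version: "6.2"             # Redis version to run
  authSecret: redis-auth     # Secret that holds the credentials
  ipPool: sentinel-pool      # IP pool that provides the node addresses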

Applying the file creates the Sentinel StatefulSet, the associated Service, and registers the cluster with the management console.
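To show how a replica count such as `replicas: 3` could relate to the Sentinel quorum, the sketch below derives a majority quorum and renders the standard `sentinel monitor <name> <ip> <port> <quorum>` configuration line. It assumes that `replicas` counts Sentinel processes and that the operator picks a simple majority; the article confirms neither, and the master name and address are placeholders.

```python
def majority_quorum(sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    if sentinels < 1:
        raise ValueError("at least one Sentinel is required")
    return sentinels // 2 + 1


def tolerated_failures(sentinels: int) -> int:
    """How many Sentinels may be lost while a majority can still agree."""
    return sentinels - majority_quorum(sentinels)


def sentinel_monitor_line(master_name: str, host: str, port: int, sentinels: int) -> str:
    """Render the `sentinel monitor` directive with a majority quorum."""
    return f"sentinel monitor {master_name} {host} {port} {majority_quorum(sentinels)}"


# Placeholder master name and address, for illustration only.
print(sentinel_monitor_line("mymaster", "10.0.0.10", 6379, sentinels=3))
# sentinel monitor mymaster 10.0.0.10 6379 2
```

With three Sentinels the quorum is 2 and one failure is tolerated; a fourth Sentinel raises the quorum to 3 without tolerating more failures, which is one reason odd counts are the usual choice.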

Distributed Cache Platform Architecture

The platform is composed of three logical layers:

Cache Management Console – provides real‑time monitoring, diagnostics, hot‑configuration updates, and key‑level statistics. It discovers clusters by reading only the abstract identifiers (namespace, name, type) from the Operator.

Region‑Proxy – abstracts the physical Redis pod IPs. Client requests are routed through the proxy, which forwards them to the appropriate Redis node based on the current topology.

Redis Operator – maintains the Redis topology inside Kubernetes, reacts to CRD events, performs scaling, slot migration, and integrates with the authentication centre.

Because the console only needs the abstract identifiers, adding or removing nodes does not require manual IP updates; the Operator updates the Service endpoints automatically.
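The decoupling between abstract identifiers and pod addresses can be sketched as a small lookup structure. `ClusterId`, `TopologyRegistry`, and the sample namespace and addresses are hypothetical; in the platform described here this role is played by Kubernetes Service endpoints maintained by the Operator, so the code only models the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterId:
    """The abstract identifiers the console works with."""
    namespace: str
    name: str
    type: str  # "sentinel" or "cluster"


class TopologyRegistry:
    """Maps abstract cluster identifiers to the current pod endpoints."""

    def __init__(self) -> None:
        self._endpoints: dict[ClusterId, list[str]] = {}

    def publish(self, cluster: ClusterId, endpoints: list[str]) -> None:
        """Called from the operator side whenever the topology changes."""
        self._endpoints[cluster] = list(endpoints)

    def resolve(self, cluster: ClusterId) -> list[str]:
        """Called by the proxy or console; callers never store pod IPs themselves."""
        return list(self._endpoints.get(cluster, []))


registry = TopologyRegistry()
cache = ClusterId("payments", "orders-cache", "cluster")
registry.publish(cache, ["10.0.1.5:6379", "10.0.1.6:6379", "10.0.1.7:6379"])
# A scale-out republishes the endpoint list; the identifier stays the same.
registry.publish(cache, ["10.0.1.5:6379", "10.0.1.6:6379", "10.0.1.7:6379", "10.0.1.8:6379"])
print(registry.resolve(cache))
```

Consumers hold only the `ClusterId`, so a topology change is picked up on the next `resolve` call without any manual address update.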

Visual Cluster Management

Imported Sentinel or Cluster topologies are rendered in the console as node‑role diagrams. When the desired replica count is changed (e.g., from three to four nodes), the diagram refreshes automatically, reflecting the new topology.

VM‑Based vs. Operator‑Based Deployment

Traditional VM deployment involves a sequence of manual steps:

Upload Redis binaries.

Edit configuration files.

Start each VM.

Run cluster creation scripts.

Configure network policies.

With the Operator, the entire process collapses to editing a single YAML manifest and applying it with kubectl. The manual effort no longer grows with the number of nodes, so deployment time becomes largely independent of cluster size.

Comparison Summary

Manual VM workflow: multiple procedural steps, time proportional to node count, higher risk of configuration drift.

Operator workflow: edit YAML, run kubectl apply, Operator reconciles state, constant deployment time, built‑in scaling and fail‑over.

Benefits and Outcomes

Deployment time reduced from more than 60 minutes to under 15 minutes.

Support for large‑scale clusters (tens of nodes) with automatic node discovery.

Zero‑downtime scaling thanks to pre‑scale slot migration and graceful rolling updates.

Integrated multi‑tenant authentication and stable IP addressing via IP‑pool.

Improved operational efficiency and reliability for the bank’s critical caching services.

Conclusion

The Redis Operator demonstrates how encapsulating deployment and lifecycle logic in a Kubernetes Operator can replace cumbersome VM‑based processes. By extending the API with CRDs and handling stateful semantics (Sentinel fail‑over, Cluster slot rebalancing, cross‑zone deployment), the Operator provides a cloud‑native, declarative, and automated solution for managing Redis at scale.

[Figure: Operator workflow diagram]