How to Build a Highly Available Redis Service with Sentinel and Virtual IP
This article walks through the design of a highly available Redis deployment, explains common failure scenarios, compares single‑node, master‑slave, and multi‑Sentinel architectures, and shows how adding a virtual IP and three Sentinel instances can provide robust HA while keeping client usage simple.
Possible Exceptions
High‑availability (HA) for Redis means the service continues to operate or recovers quickly after a failure. Three low‑probability failure scenarios are considered:
Exception 1: The Redis process on a node crashes (e.g., killed manually).
Exception 2: An entire node becomes unavailable (power loss, hardware fault).
Exception 3: Network communication between two nodes is broken (cable cut, routing failure).
The HA goal is to tolerate any single‑point failure.
Reference Solutions
Common Redis HA approaches include Keepalived, Codis, Twemproxy, and the official Redis Sentinel. For a small‑scale service the official Sentinel solution is chosen because it requires fewer machines than a full cluster.
Solution 1 – Single‑node Redis (no Sentinel)
A single Redis instance is simple to deploy but has a single point of failure: if the process or the host crashes, the service becomes unavailable and all in‑memory data is lost unless persistence is configured.
Solution 2 – Master/Slave with a Single Sentinel
Two Redis instances (master + slave) run on separate servers. One Sentinel monitors both instances and promotes the slave when the master fails. The client queries the Sentinel to discover the current master.
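The discovery step can be sketched with nothing but a socket. The example below sends the Sentinel command `SENTINEL get-master-addr-by-name` in the Redis wire protocol and parses the two-element reply. The function names, the master name passed in, and the single `recv` call are assumptions made for illustration; a production client would normally use a Sentinel-aware library and read the reply in a loop.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def parse_master_addr(reply: bytes):
    """Turn the two-element array reply into (host, port); None otherwise."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None  # e.g. a null reply when the master name is not monitored
    # Layout: *2, $<len>, <host>, $<len>, <port>, ''
    return lines[2].decode(), int(lines[4])


def discover_master(sentinel_host: str, sentinel_port: int, name: str):
    """Ask one Sentinel for the address of the current master (hypothetical helper)."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=2) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", name))
        reply = conn.recv(4096)  # assumption: the short reply arrives in one read
    return parse_master_addr(reply)
```

A client would call `discover_master` before connecting and again after any connection error, since the master may have changed in the meantime.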
Drawback: the Sentinel itself is a single point of failure. If the Sentinel process stops, clients cannot obtain master information.
Solution 3 – Master/Slave with Two Sentinels
Adding a second Sentinel removes the Sentinel single point of failure for master discovery: clients can still query the surviving Sentinel. However, a failover must be authorized by a strict majority (> 50 %) of the Sentinels. With only two Sentinels, losing one of them (for example, when the server that hosts the master and its Sentinel goes down) leaves exactly 50 %, which is not a majority, so no automatic promotion takes place and the service remains unavailable.
A network partition between the two servers (Exception 3) is handled no better: each Sentinel sees only itself, neither side holds a majority, and no failover can be authorized.
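The majority rule is simple arithmetic, and a few lines make the difference between one, two, and three Sentinels concrete. This is an illustrative model only; the function names are hypothetical and nothing here talks to a real Sentinel.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that is strictly more than half."""
    return total_sentinels // 2 + 1


def can_authorize_failover(total_sentinels: int, reachable_sentinels: int) -> bool:
    """True when the reachable Sentinels form a strict majority."""
    return reachable_sentinels >= majority(total_sentinels)


# Lose exactly one Sentinel in deployments of size 1, 2, and 3.
for total in (1, 2, 3):
    survivors = total - 1
    print(total, survivors, can_authorize_failover(total, survivors))
```

Only the three-Sentinel deployment can still authorize a failover after losing one Sentinel, which is why two Sentinels add little over one.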
Solution 4 – Master/Slave with Three Sentinels (final architecture)
Introduce a third server and a third Sentinel, giving three Sentinel processes that manage two Redis instances. Any single node or Sentinel can fail; the remaining two Sentinels still form a majority, allowing automatic master promotion.
Optionally, a second slave can be added on the third server for extra redundancy, at the cost of additional replication traffic from the master.
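As a rough sketch of what each of the three Sentinels would be configured with, the helper below renders a minimal sentinel.conf. The directives are standard Sentinel configuration; the master name, IP address, port numbers, and timeout values are placeholders chosen for illustration, not values taken from this article.

```python
def render_sentinel_conf(
    master_name: str,
    master_ip: str,
    master_port: int = 6379,
    quorum: int = 2,
    down_after_ms: int = 5000,
    failover_timeout_ms: int = 60000,
) -> str:
    """Render a minimal sentinel.conf; the same file is used on all three hosts."""
    lines = [
        "port 26379",
        # quorum = number of Sentinels that must agree the master is down
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values: a master named "mymaster" at 192.168.1.10.
print(render_sentinel_conf("mymaster", "192.168.1.10"))
```

With three Sentinels, a quorum of 2 means two of them must agree that the master is down before a failover is attempted.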
Usability – Providing a Single IP/Port via Virtual IP
Clients normally need a Sentinel‑aware library to discover the master. To keep the simple "IP + Port" model, a virtual IP (VIP) is assigned to the current master. A failover script (triggered by Sentinel) moves the VIP to the new master, so clients always connect to the same address without changing their configuration.
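One way to wire this up is Sentinel's `client-reconfig-script` hook, which calls a script with the arguments `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>` when the master changes. The sketch below decides whether the local host should take or release the VIP. The VIP, interface name, and local address are hypothetical, and the script assumes that it runs with enough privileges to change addresses, that each Sentinel shares a host with a Redis instance, and that the `ip` and iputils `arping` tools are installed.

```python
import subprocess
import sys

VIP = "192.168.1.100/24"   # hypothetical virtual IP with prefix length
IFACE = "eth0"             # hypothetical network interface
LOCAL_IP = "192.168.1.10"  # hypothetical real address of this host


def plan_vip_commands(to_ip, local_ip=LOCAL_IP, vip=VIP, iface=IFACE):
    """Return the commands to run on this host after a failover."""
    address = vip.split("/")[0]
    if to_ip == local_ip:
        # This host holds the new master: claim the VIP and announce it via ARP.
        return [
            ["ip", "addr", "add", vip, "dev", iface],
            ["arping", "-q", "-c", "3", "-A", "-I", iface, address],
        ]
    # Another host holds the new master: make sure the VIP is not bound here.
    return [["ip", "addr", "del", vip, "dev", iface]]


def main(argv):
    # argv: script, master-name, role, state, from-ip, from-port, to-ip, to-port
    to_ip = argv[6]
    for command in plan_vip_commands(to_ip):
        subprocess.run(command, check=False)  # ignore "address not present" errors


# Act only when invoked with the seven arguments Sentinel passes.
if __name__ == "__main__" and len(sys.argv) == 8:
    main(sys.argv)
```

Keeping the decision in `plan_vip_commands` separate from the execution makes it possible to check the logic without touching the network configuration.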
Additional Operational Tips
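Configure min-slaves-to-write and min-slaves-max-lag (named min-replicas-to-write and min-replicas-max-lag since Redis 5) on the master so that it rejects writes when fewer than the required number of slaves are reachable or their lag exceeds the threshold. This limits data divergence if a partition isolates the old master.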
Use a process supervisor (e.g., supervisor or systemd) to automatically restart Redis and Sentinel processes after crashes.
If machine resources are limited, the third Sentinel can be run on a client host instead of a dedicated server.
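The effect of the first tip can be modelled in a few lines. The function below is an illustrative approximation of the rule that min-slaves-to-write and min-slaves-max-lag express, not Redis source code; the function name and the list of lag values (in seconds) are assumptions for the example.

```python
def master_accepts_writes(replica_lags_s, min_slaves_to_write, min_slaves_max_lag):
    """Model of the write guard: enough replicas must have a small enough lag."""
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True  # setting either option to 0 disables the check
    good = sum(1 for lag in replica_lags_s if lag <= min_slaves_max_lag)
    return good >= min_slaves_to_write


# One replica acknowledged 1 second ago: writes are accepted.
print(master_accepts_writes([1], min_slaves_to_write=1, min_slaves_max_lag=10))
# The master is isolated and sees no replicas: writes are rejected.
print(master_accepts_writes([], min_slaves_to_write=1, min_slaves_max_lag=10))
```

In the two-instance layout described here, a setting of one required slave means an isolated old master stops accepting writes once its only slave has been out of contact for longer than the lag threshold.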
Conclusion
Achieving HA for Redis with Sentinel requires at least three machines (the third may be a client host that runs only a Sentinel), three Sentinel processes, and optionally a virtual IP to hide the failover from clients. This design tolerates process crashes, whole‑node failures, and network partitions while keeping client connections simple.
dbaplus Community