Building a Highly Available Redis Service with Sentinel and Virtual IP
This article explains how to design a highly available Redis service with Sentinel. It compares four deployment architectures, discusses failure scenarios and majority requirements, and shows how a virtual IP gives clients a single access point while the service keeps running through a failover.
Redis is a widely used in‑memory key‑value database for web applications. Typical uses are session storage, caching hot data, simple message queues, and pub/sub systems. Large internet companies often expose Redis as a foundational service.
When providing such a service, callers inevitably ask whether it is highly available.
Here, high availability means that the service keeps serving under various abnormal conditions, or is restored within a short time. The typical abnormal scenarios are:
Exception 1: A Redis process on a node crashes.
Exception 2: An entire node goes down (power loss, hardware failure).
Exception 3: Network communication between two nodes is broken.
The guiding principle is that several independent low‑probability failures are very unlikely to occur at the same time, so a system that tolerates any single failure can be considered highly available.
Common solutions include Keepalived, Codis, Twemproxy, and Redis Sentinel. For modest data volumes, the official Redis Sentinel is the preferred choice.
Scheme 1: Single‑instance Redis (no Sentinel)
A single Redis server provides no redundancy. If the process crashes or the host shuts down, the service becomes unavailable and any non‑persistent data is lost.
Scheme 2: Master‑Slave with a Single Sentinel
Adding a slave provides data redundancy, and a single Sentinel monitors the master. If the master fails, Sentinel promotes the slave. However, the Sentinel itself is a single point of failure—if it crashes, clients cannot discover the current master.
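Clients find the current master by sending a Sentinel the `SENTINEL get-master-addr-by-name` command. The following is a minimal sketch of that exchange using only the Python standard library. The master name `mymaster`, the example addresses, and all helper names are assumptions for illustration; a real application would more likely rely on an existing Redis client library.

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def parse_master_reply(reply: bytes):
    """Parse the reply to get-master-addr-by-name.

    Returns (host, port), or None when the Sentinel answers with a null
    reply because it does not know the requested master.
    """
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    # Layout: [b"*2", b"$<len>", host, b"$<len>", port, b""]
    return lines[2].decode(), int(lines[4])


def query_master(sentinel_host: str, sentinel_port: int, master_name: str):
    """Ask one Sentinel for the master address (requires a live Sentinel)."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=1.0) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        # Simplification: assumes the short reply arrives in a single read.
        return parse_master_reply(conn.recv(4096))
```

With only one Sentinel, `query_master` has nowhere else to turn when that Sentinel is down, which is the single point of failure described above.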
Scheme 3: Master‑Slave with Two Sentinels
Running two Sentinel instances allows clients to query either one. However, Sentinel requires a majority (more than 50%) of the Sentinel processes to authorize a failover. If one node (and its Sentinel) goes down, only 50% remain, which is not enough for automatic promotion. If the lost node hosted the master, the service stays unavailable.
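The majority rule can be checked with a few lines of arithmetic. This is only a sketch of the counting argument, with hypothetical function names, and not Sentinel's actual election logic. Note that the `quorum` value in the Sentinel configuration only governs agreement that the master is down; authorizing the failover itself still needs a majority of all Sentinels.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that is strictly more than half."""
    return total_sentinels // 2 + 1


def can_authorize_failover(total_sentinels: int, reachable: int) -> bool:
    """True if the reachable Sentinels form a majority of all Sentinels."""
    return reachable >= majority(total_sentinels)


for total in (2, 3):
    survivors = total - 1  # one machine, and the Sentinel on it, is lost
    print(total, "Sentinels,", survivors, "left:",
          can_authorize_failover(total, survivors))
```

With two Sentinels the single survivor is not a majority, while with three Sentinels the two survivors are.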
Scenario: Network Partition (Exception 3)
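Even if both machines are healthy, a network break between them is a problem. If failover were permitted with only one of the two Sentinels, the side holding the slave would promote it while the old master kept accepting writes. The result is split‑brain: both instances act as master, and their data diverges in ways that cannot be reconciled automatically when the network heals.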
Scheme 4: Master‑Slave with Three Sentinels (Recommended)
Introducing a third Sentinel on a third server brings the total to three, so a majority of two survives any single failure. The system can tolerate any single process failure, single‑machine failure, or a network break between two machines while still providing service.
Optionally, a Redis instance can also be added on the third server, forming a 1‑master + 2‑slave topology for extra redundancy, though more slaves increase synchronization overhead.
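The single-failure tolerance of the three-machine layout can be checked by enumerating the failures. The sketch below is a deliberately simplified model: the machine names `A`, `B`, `C` are hypothetical, one Sentinel is assumed per machine, and only direct reachability between Sentinels is counted.

```python
from itertools import combinations

# Assumed layout: one Sentinel per machine (names are illustrative).
MACHINES = ("A", "B", "C")


def sentinels_with_majority(down=frozenset(), broken_links=frozenset()):
    """Return the machines whose Sentinel can still reach a majority.

    down: machines that are powered off.
    broken_links: set of frozenset({x, y}) pairs that cannot talk directly.
    """
    needed = len(MACHINES) // 2 + 1
    result = []
    for me in MACHINES:
        if me in down:
            continue
        reachable = 1  # a Sentinel always counts itself
        for other in MACHINES:
            if other == me or other in down:
                continue
            if frozenset({me, other}) not in broken_links:
                reachable += 1
        if reachable >= needed:
            result.append(me)
    return result


# Each single machine failure: list the Sentinels that still see a majority.
for machine in MACHINES:
    print("down", machine, "->", sentinels_with_majority(down={machine}))

# Each single broken link between two machines.
for a, b in combinations(MACHINES, 2):
    link = frozenset({a, b})
    print("cut", a, b, "->", sentinels_with_majority(broken_links={link}))
```

In every single-failure case at least two Sentinels still reach a majority, so a failover can be authorized. Only when two machines are lost at once does the list become empty.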
Making the Service Easy to Use: Virtual IP (VIP)
Clients prefer a single IP and port, as with a standalone Redis server. By assigning a virtual IP to the current master and moving it to the slave during a failover (via a callback script), the client continues to connect to the same address, perceiving the service as a single‑node high‑availability Redis.
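One way to implement the callback is Sentinel's `client-reconfig-script` hook, which calls a script with the arguments `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>`. The sketch below only builds and prints the `ip addr` command rather than running it. The VIP, interface name, and local address are assumptions for illustration, and the original article does not show its own script.

```python
import sys

# All values below are assumptions for illustration; adjust to the real network.
VIP = "192.0.2.100/24"
INTERFACE = "eth0"
LOCAL_IP = "192.0.2.10"  # address of the machine this script runs on


def vip_command(args, local_ip=LOCAL_IP):
    """Build the `ip` command for one client-reconfig-script invocation.

    args holds the seven values Sentinel passes: master name, role, state,
    old master IP and port, new master IP and port.
    """
    master_name, role, state, from_ip, from_port, to_ip, to_port = args
    # Take the VIP if this machine is the new master, otherwise release it.
    action = "add" if to_ip == local_ip else "del"
    return ["ip", "addr", action, VIP, "dev", INTERFACE]


if __name__ == "__main__" and len(sys.argv) == 8:
    # Dry run: print the command instead of executing it.
    print(" ".join(vip_command(sys.argv[1:])))
```

The script would be registered in each Sentinel's configuration with the `sentinel client-reconfig-script <master-name> <script-path>` directive. A production version would also need to execute the command with sufficient privileges and refresh neighbours' ARP caches after moving the address.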
In practice, the deployment also uses a process supervisor (e.g., supervisor) to automatically restart crashed Redis or Sentinel processes, further improving reliability.
Overall, achieving true high availability for Redis requires more than just adding a slave; it demands multiple Sentinels, a quorum‑based failover, and optional VIP handling to hide the complexity from clients.
