How to Build a Highly Available Redis Service with Sentinel and Virtual IP
This article explains why Redis is a popular in-memory key-value store, defines high availability, enumerates failure scenarios, and walks through four incremental architectures (single instance, master-slave with one Sentinel, master-slave with two Sentinels, and master-slave with three Sentinels plus a VIP) to arrive at a robust, production-grade Redis deployment.
In‑memory Redis has become the de‑facto key‑value database for many web applications, used for session storage, caching hot data, simple message queues (LPUSH/BRPOP), and pub/sub systems. Large internet companies often expose Redis as a foundational service to internal teams.
Any provider of such a service must answer the question of high availability: can the service continue to operate, or recover quickly, when various failures occur? The article defines three typical failure types: (1) a single Redis process crashes, (2) an entire node goes down, and (3) network communication between two nodes is broken. High availability here means designing the system to tolerate any one of these failures at a time, on the assumption that two independent failures occurring simultaneously is extremely unlikely.
Several HA solutions exist (Keepalived, Codis, Twemproxy, Redis Sentinel). For modest data volumes, the author chose the official Redis Sentinel over cluster‑oriented tools.
Solution 1: Single‑node Redis (no Sentinel)
This setup works for personal projects but suffers from a single point of failure: if the Redis process or its host crashes, the service and any in‑memory data become unavailable.
Solution 2: Master‑Slave with a single Sentinel
Adding a slave and a Sentinel process allows automatic promotion of the slave when the master fails, so a failed master no longer takes the service down. Clients query Sentinel to discover the address of the current master. However, the Sentinel itself is now a single point of failure, so this architecture is not fully HA.
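The discovery step can be illustrated with a minimal sketch that speaks the Redis wire protocol (RESP) directly and sends the `SENTINEL get-master-addr-by-name` command. The master name `mymaster`, the Sentinel addresses, and the helper names are assumptions for illustration; a production client would more likely use a Sentinel-aware client library, and this sketch assumes the short reply arrives in a single `recv` call.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_master_addr(reply: bytes):
    """Parse a two-element array reply (ip, port); return None otherwise."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2" or len(lines) < 5:
        return None  # null reply, error reply, or truncated data
    return lines[2].decode(), int(lines[4])


def discover_master(sentinels, master_name="mymaster", timeout=0.5):
    """Ask each Sentinel in turn; return (ip, port) of the master or None."""
    for host, port in sentinels:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(encode_command(
                    "SENTINEL", "get-master-addr-by-name", master_name))
                addr = parse_master_addr(conn.recv(4096))
                if addr:
                    return addr
        except OSError:
            continue  # this Sentinel is unreachable; try the next one
    return None
```

With a single Sentinel the list passed to `discover_master` has one entry, so an unreachable Sentinel leaves the client with no way to locate the master, which is exactly the weakness described above.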
Solution 3: Master‑Slave with dual Sentinels
Running two Sentinel instances lets clients fall back to the other Sentinel if one fails. Nevertheless, when an entire node goes down, only one Sentinel remains. A failover must be authorized by a majority of all known Sentinels (more than 50%, which with two Sentinels means both of them), so the surviving Sentinel cannot promote the slave on its own and the service can still become unavailable.
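The majority rule behind this limitation is simple arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it models just the majority check, not Sentinel's full leader election.

```python
def can_authorize_failover(total_sentinels: int, reachable_sentinels: int) -> bool:
    """Return True if the reachable Sentinels form a strict majority."""
    majority = total_sentinels // 2 + 1  # more than 50% of all known Sentinels
    return reachable_sentinels >= majority


# Two Sentinels, one lost with its node: no majority, no failover.
print(can_authorize_failover(2, 1))
# Three Sentinels, one lost: the remaining two are a majority.
print(can_authorize_failover(3, 2))
```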
Solution 4: Master‑Slave with three Sentinels
Adding a third server with its own Sentinel gives three Sentinels overseeing two Redis instances. This configuration tolerates any single-process failure, any single-machine failure, or a network break between any two machines, because in each case at least two Sentinels can still reach each other and authorize a failover.
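As a rough illustration, each of the three Sentinels would carry a configuration along the following lines. The sketch renders it from Python so the values are easy to vary. The directives are standard Sentinel options, but the master name, IP address, and timeout values are assumptions chosen for the example, not settings taken from the article.

```python
def render_sentinel_conf(master_name: str, master_ip: str, master_port: int = 6379,
                         quorum: int = 2, down_after_ms: int = 5000,
                         failover_timeout_ms: int = 60000) -> str:
    """Render a minimal sentinel.conf for one Sentinel process."""
    lines = [
        "port 26379",
        # quorum = how many Sentinels must agree that the master is down
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical master address; the same file content is used on all three servers.
print(render_sentinel_conf("mymaster", "10.0.0.1"))
```

A quorum of 2 out of 3 means two Sentinels must agree that the master is unreachable before a failover is attempted.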
Optionally, a third Redis instance can be started on the third server to form a 1-master + 2-slave topology, improving data redundancy at the cost of additional replication overhead.
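When the node hosting the master loses all network connectivity, the remaining Sentinels promote the surviving slave to master, but the isolated old master keeps running, so two masters may briefly accept writes, risking data inconsistency. To mitigate this, Redis's min-slaves-to-write and min-slaves-max-lag settings (named min-replicas-to-write and min-replicas-max-lag since Redis 5) can be tuned so that a master stops accepting writes when it can no longer see enough sufficiently up-to-date slaves.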
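The effect of these two settings can be modeled with a short sketch. This is a simplified model of the decision, not Redis source code; the function name and the lag values are hypothetical. A lag is the number of seconds since a slave last acknowledged the master.

```python
def master_accepts_writes(replica_lags, min_slaves_to_write: int,
                          min_slaves_max_lag: int) -> bool:
    """Decide whether a master should accept writes given its slaves' lags."""
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True  # setting either option to 0 disables the check
    # Count slaves whose last acknowledgement is recent enough.
    good = sum(1 for lag in replica_lags if lag <= min_slaves_max_lag)
    return good >= min_slaves_to_write


# Healthy: one slave acknowledged 1 second ago.
print(master_accepts_writes([1], min_slaves_to_write=1, min_slaves_max_lag=10))
# Isolated master: its only slave has been silent for 30 seconds.
print(master_accepts_writes([30], min_slaves_to_write=1, min_slaves_max_lag=10))
```

With one slave required and a maximum lag of 10 seconds, an isolated old master would start rejecting writes roughly 10 seconds after losing contact, which bounds the window in which two masters can diverge.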
For a seamless client experience, a virtual IP (VIP) can be assigned to the current master. A failover script moves the VIP to the new master, allowing clients to continue connecting to a single IP and port as if they were using a standalone Redis instance.
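One possible shape for such a failover script is sketched below, assuming it is registered through Sentinel's `client-reconfig-script` option, which invokes the script with the arguments `<master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>`. The VIP, interface name, and local IP are placeholders, and the use of `ip addr` plus iputils `arping` is an assumption about the host environment rather than the article's exact script.

```python
import subprocess
import sys

LOCAL_IP = "10.0.0.2"      # assumption: this machine's real IP
VIP_CIDR = "10.0.0.100/24" # assumption: the virtual IP clients connect to
DEVICE = "eth0"            # assumption: interface that carries the VIP


def plan_vip_commands(args, local_ip, vip_cidr, dev):
    """Return the shell commands needed on this machine after a failover."""
    master_name, role, state, from_ip, from_port, to_ip, to_port = args
    vip = vip_cidr.split("/")[0]
    if to_ip == local_ip:
        # This machine hosts the new master: claim the VIP and announce it
        # with unsolicited ARP replies so neighbors update their caches.
        return [
            ["ip", "addr", "add", vip_cidr, "dev", dev],
            ["arping", "-q", "-c", "3", "-A", "-I", dev, vip],
        ]
    # Another machine hosts the new master: release the VIP if we hold it.
    return [["ip", "addr", "del", vip_cidr, "dev", dev]]


if __name__ == "__main__" and len(sys.argv) == 8:
    for cmd in plan_vip_commands(sys.argv[1:], LOCAL_IP, VIP_CIDR, DEVICE):
        # check=False: deleting a VIP this machine does not hold is harmless.
        subprocess.run(cmd, check=False)
```

Separating the planning function from the command execution keeps the decision logic testable without root privileges or a real network interface.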
In summary, building a highly available Redis service involves moving from a single instance to a multi-node master-slave architecture with three Sentinels (and optionally a VIP), and using a process supervisor such as Supervisor to automatically restart crashed processes.
