How to Build a Highly Available Redis Service with Sentinel – A Practical Guide
This article explains why Redis needs high availability, defines common failure scenarios, compares several HA architectures (a single instance, master‑slave with one, two, or three Sentinel processes, and a VIP‑based front end), and walks through the reasoning behind a robust Redis Sentinel deployment.
Why High Availability Matters for Redis
Redis is widely used for session storage, caching hot data, simple message queues, and Pub/Sub. While a single Redis instance works for development or small projects, it creates a single point of failure: if the Redis process or the host machine crashes, the service becomes unavailable and any in‑memory data may be lost.
Defining High Availability
High availability (HA) means the service continues to operate despite certain failures, or recovers within a very short time. The article identifies three typical failure types:
Failure 1: A Redis process on a node is killed.
Failure 2: An entire node goes down (power loss, hardware fault).
Failure 3: Network communication between two nodes is broken.
The HA design aims to tolerate any single one of these events, on the assumption that multiple simultaneous failures are extremely unlikely.
Evaluating HA Solutions
Several open‑source options exist, such as Keepalived, Codis, Twemproxy, and Redis Sentinel. For modest data volumes, the author chose Redis Sentinel, the official solution, over cluster‑oriented tools.
Solution 1 – Single‑Instance Redis
A lone Redis server is simple but suffers from a single point of failure; if the process or host stops, the service is lost.
Solution 2 – Master‑Slave with One Sentinel
Two Redis servers (master and slave) are deployed, plus a single Sentinel process that monitors them. Clients query Sentinel to discover the current master. However, the Sentinel itself becomes a single point of failure.
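The lookup a client performs against Sentinel can be sketched as follows. This is a minimal illustration, not the author's client code: it sends the SENTINEL get-master-addr-by-name command over a plain socket and tries each configured Sentinel in turn. The master name "mymaster" and all addresses are assumptions; 26379 is Sentinel's default port. Production clients would normally rely on a Sentinel-aware library instead.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def parse_master_addr(reply: bytes) -> Optional[Address]:
    """Parse the reply to SENTINEL get-master-addr-by-name.

    A known master yields a two-element array (ip, port). Anything else,
    such as a null reply for an unknown master name, maps to None.
    """
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2" or len(lines) < 5:
        return None
    return lines[2].decode(), int(lines[4])


def ask_sentinel(addr: Address, master_name: str, timeout: float = 0.5) -> bytes:
    """Query one Sentinel over TCP and return the raw reply.

    Simplification: a single recv() is assumed to hold the whole reply.
    """
    with socket.create_connection(addr, timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        return conn.recv(4096)


def discover_master(
    sentinels: Iterable[Address],
    master_name: str,
    ask: Callable[[Address, str], bytes] = ask_sentinel,
) -> Optional[Address]:
    """Return the current master address from the first Sentinel that answers."""
    for addr in sentinels:
        try:
            master = parse_master_addr(ask(addr, master_name))
        except OSError:
            continue  # this Sentinel is unreachable; try the next one
        if master is not None:
            return master
    return None
```

With only one Sentinel in the list, the loop has nothing to fall back to, which is exactly the single point of failure described above.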
Solution 3 – Master‑Slave with Two Sentinels
Running two Sentinel instances on separate machines allows clients to fall back to the other Sentinel if one fails. Yet when a whole node fails, only one Sentinel remains reachable. That is insufficient, because a failover must be authorized by a majority (more than 50%) of the known Sentinels before a slave is promoted to master, and one out of two is not a majority.
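The arithmetic behind this limitation is small enough to write down. The helper names below are hypothetical and only illustrate the majority rule; they are not part of Redis.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that is more than half of the total."""
    return total_sentinels // 2 + 1


def can_authorize_failover(total_sentinels: int, reachable: int) -> bool:
    """True if the reachable Sentinels form a majority of all known Sentinels."""
    return reachable >= majority(total_sentinels)


# Two Sentinels, one lost together with its node: no majority, no failover.
print(can_authorize_failover(total_sentinels=2, reachable=1))  # False
# Three Sentinels, one lost: the remaining two still form a majority.
print(can_authorize_failover(total_sentinels=3, reachable=2))  # True
```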
Solution 4 – Master‑Slave with Three Sentinels (Final Architecture)
Adding a third node with an additional Sentinel yields three Sentinels overseeing two Redis servers. This configuration tolerates any single‑node failure, any single Sentinel failure, or a network split between two nodes, while still providing a majority for failover decisions.
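As a rough illustration of what each of the three Sentinels might be configured with, the sketch below renders a minimal sentinel.conf. The directives are standard Sentinel options, but the master name "mymaster", the address 10.0.0.1, and the timeout values are assumptions rather than values from the original deployment.

```python
def render_sentinel_conf(
    master_name: str,
    master_host: str,
    master_port: int,
    quorum: int,
    down_after_ms: int = 5000,         # assumed value
    failover_timeout_ms: int = 60000,  # assumed value
) -> str:
    """Render a minimal sentinel.conf for one Sentinel process."""
    lines = [
        "port 26379",  # Sentinel's default port
        # quorum: how many Sentinels must agree that the master is down
        f"sentinel monitor {master_name} {master_host} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


# With three Sentinels, a quorum of 2 matches the majority needed for failover.
print(render_sentinel_conf("mymaster", "10.0.0.1", 6379, quorum=2))
```

The same file would be placed on all three nodes; each Sentinel discovers its peers and the slave through the monitored master.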
The author also notes that if the three machines have spare capacity, a third Redis instance can be added to form a 1‑master + 2‑slave setup, improving data redundancy at the cost of additional replication overhead.
Handling Split‑Brain Scenarios
When a node loses network connectivity but remains alive, two masters could appear simultaneously, causing data inconsistency. To mitigate this, the author recommends configuring min-slaves-to-write and min-slaves-max-lag (named min-replicas-to-write and min-replicas-max-lag since Redis 5) so that a master stops accepting writes when it cannot confirm enough healthy slaves within the allowed lag.
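The rule these two settings express can be modeled in a few lines. This is a simplified sketch of the decision, not Redis source code: the ReplicaStatus type and function name are hypothetical, and the values 1 and 10 are example settings, not ones taken from the original deployment.

```python
from dataclasses import dataclass
from typing import Iterable

# Example redis.conf settings this sketch mirrors (assumed values):
#   min-slaves-to-write 1
#   min-slaves-max-lag 10


@dataclass
class ReplicaStatus:
    name: str
    seconds_since_last_ack: int  # time since the master last heard from this slave


def master_accepts_writes(
    replicas: Iterable[ReplicaStatus],
    min_slaves_to_write: int,
    min_slaves_max_lag: int,
) -> bool:
    """Accept writes only if enough slaves acknowledged recently enough."""
    if min_slaves_to_write == 0 or min_slaves_max_lag == 0:
        return True  # setting either option to 0 disables the check
    healthy = sum(
        1 for r in replicas if r.seconds_since_last_ack <= min_slaves_max_lag
    )
    return healthy >= min_slaves_to_write


# An isolated old master stops hearing from its slave and refuses writes.
isolated = [ReplicaStatus("slave-1", seconds_since_last_ack=30)]
print(master_accepts_writes(isolated, 1, 10))  # False
```

Because the isolated master rejects writes once the lag limit is exceeded, the window in which two masters both accept data is bounded by roughly that lag.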
Providing a Seamless Client Experience
Clients typically need to know the Sentinel addresses and use libraries that support Sentinel discovery (e.g., ioredis for Node.js, go‑redis for Go). To make the service appear like a single‑node Redis, a Virtual IP (VIP) can be assigned to the current master; a failover script moves the VIP to the new master, allowing clients to connect to a fixed IP and port.
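One possible shape for such a failover script is sketched below, assuming it is registered through Sentinel's client-reconfig-script option, which invokes the script with the arguments master-name, role, state, from-ip, from-port, to-ip, and to-port. The VIP, interface name, and local address are hypothetical placeholders, and the sketch assumes a copy of the script runs on each host with that host's own address.

```python
import subprocess
import sys
from typing import Callable, List, Optional

VIP_CIDR = "10.0.0.100/24"  # hypothetical virtual IP
INTERFACE = "eth0"          # hypothetical network interface
LOCAL_IP = "10.0.0.2"       # hypothetical address of the host running this script


def vip_command(to_ip: str, local_ip: str = LOCAL_IP) -> List[str]:
    """Build the `ip addr` command: add the VIP on the new master, remove it elsewhere."""
    action = "add" if to_ip == local_ip else "del"
    return ["ip", "addr", action, VIP_CIDR, "dev", INTERFACE]


def main(argv: List[str], run: Optional[Callable[[List[str]], object]] = None) -> int:
    """Handle one Sentinel notification.

    argv layout: script, master-name, role, state, from-ip, from-port, to-ip, to-port
    """
    if len(argv) != 8:
        return 2  # not called with the expected argument list
    command = vip_command(argv[6])  # argv[6] is the new master's IP
    if run is None:
        # check=False: removing a VIP this host never held is not an error here
        subprocess.run(command, check=False)
    else:
        run(command)  # injectable runner, useful for dry runs and tests
    return 0


if __name__ == "__main__" and len(sys.argv) == 8:
    main(sys.argv)
```

A real script would usually also announce the move to the network (for example with a gratuitous ARP) so that clients and switches update their caches promptly; that step is omitted here.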
Operational Details
The deployment uses Supervisor to monitor the Redis and Sentinel processes and restart them automatically if they crash. The author emphasizes that achieving a “usable” service is easy, but building true high availability adds significant complexity.
Conclusion
By combining three Sentinel processes with two Redis instances (master‑slave) and optionally a VIP, the author achieved a robust, highly available Redis service suitable for production workloads.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
