Designing a Highly Available Redis Service Using Sentinel
This article explains how to build a highly available Redis deployment by defining HA requirements, analyzing failure scenarios, and progressively implementing solutions from a single instance to a three‑sentinel architecture with virtual IP failover for seamless client access.
Redis, as an in‑memory key‑value store, is widely used for session storage, caching, simple message queues, and pub/sub, but providing it as a highly available service is a common challenge for backend teams.
High availability (HA) is defined here as the ability to keep serving requests despite failures such as a crashed process, the outage of an entire node, or a network partition between nodes. The design goal is to tolerate any single failure; multiple simultaneous failures are treated as unlikely enough to be out of scope.
Several HA architectures are compared:
Solution 1 – Single‑node Redis (no Sentinel): Simple, but the node is a single point of failure. If the Redis process or its server stops, the service becomes unavailable, and without persistence (RDB or AOF) the in‑memory data is lost as well.
Solution 2 – Master‑Slave with a single Sentinel: Adds a replica (slave) and a Sentinel process that monitors the master and promotes the replica if the master fails; clients ask the Sentinel for the current master address. The Sentinel itself, however, becomes a new single point of failure.
Solution 3 – Master‑Slave with two Sentinels: Deploys two Sentinel instances, one on each server, so the client can query either one. However, a failover must be authorized by a majority of all known Sentinels, which with two Sentinels means both of them. If the master's server goes down, the single remaining Sentinel cannot form that majority, no replica is promoted, and the service becomes unavailable.
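Solution 4 – Master‑Slave with three Sentinels (final architecture): Introduces a third server running an additional Sentinel. With the quorum set to 2, any two of the three Sentinels form a majority that can declare the master down and authorize a failover. This configuration tolerates the loss of any single node, the crash of any single Sentinel, or a network partition that isolates one node from the other two, so the service continues. (During such a partition, clients that can still reach an isolated old master may keep writing to it until the partition heals; Redis's `min-replicas-to-write` setting can limit this.)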
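The difference between Solutions 3 and 4 comes down to vote counting. Sentinel uses `quorum` only to agree that the master is down; the Sentinel that then performs the failover must be authorized by a majority of all known Sentinels (or by `quorum` of them, if that is larger). The Python sketch below applies that rule to whole‑server crashes in both layouts. The server names A, B and C, the role placement, and the quorum of 1 for the two‑Sentinel layout are assumptions, and network partitions are deliberately not modeled.

```python
from __future__ import annotations


def failover_possible(total_sentinels: int, alive_sentinels: int, quorum: int) -> bool:
    """Can the reachable Sentinels declare the master down and elect a failover leader?"""
    majority = total_sentinels // 2 + 1
    # The leader needs votes from a majority of ALL known Sentinels (dead ones still count
    # toward the total), and from at least `quorum` Sentinels if quorum is larger.
    return alive_sentinels >= max(majority, quorum)


def survives_server_loss(layout: dict[str, set[str]], failed: str, quorum: int) -> bool:
    """True if a writable master exists after server `failed` crashes."""
    total = sum("sentinel" in roles for roles in layout.values())
    survivors = [roles for name, roles in layout.items() if name != failed]
    if any("master" in roles for roles in survivors):
        return True  # the master is untouched, so no failover is needed
    alive = sum("sentinel" in roles for roles in survivors)
    has_replica = any("replica" in roles for roles in survivors)
    return has_replica and failover_possible(total, alive, quorum)


# Hypothetical placement of roles on servers, plus the configured quorum.
SOLUTIONS = {
    "Solution 3": ({"A": {"master", "sentinel"}, "B": {"replica", "sentinel"}}, 1),
    "Solution 4": ({"A": {"master", "sentinel"}, "B": {"replica", "sentinel"},
                    "C": {"sentinel"}}, 2),
}

if __name__ == "__main__":
    for label, (layout, quorum) in SOLUTIONS.items():
        for server in layout:
            ok = survives_server_loss(layout, server, quorum)
            print(f"{label}: lose server {server} -> {'OK' if ok else 'UNAVAILABLE'}")
```

Running it reports that losing server A (the master) leaves Solution 3 unavailable, because one surviving Sentinel out of two is not a majority, while Solution 4 survives the loss of any one of its three servers.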
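Each of the three Sentinels in Solution 4 can start from essentially the same configuration file. The hypothetical Python helper below renders one using standard Sentinel directives (`sentinel monitor`, `sentinel down-after-milliseconds`, `sentinel failover-timeout`, `sentinel parallel-syncs`); the master name `mymaster`, the address `10.0.0.1`, and the timeout values are illustrative assumptions, not values from the article.

```python
def render_sentinel_conf(master_name: str, master_ip: str, master_port: int,
                         quorum: int, sentinel_count: int,
                         down_after_ms: int = 5000,
                         failover_timeout_ms: int = 60000) -> str:
    """Render a minimal sentinel.conf for one monitored master (illustrative values)."""
    if not 1 <= quorum <= sentinel_count:
        raise ValueError("quorum must be between 1 and the number of Sentinels")
    lines = [
        "port 26379",  # default Sentinel port
        # `quorum` Sentinels must agree the master is unreachable before a failover starts.
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        # How long the master may be unresponsive before a Sentinel considers it down.
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        # Time limit used by several failover steps (see the Sentinel documentation).
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        # How many replicas may resync with a newly promoted master at the same time.
        f"sentinel parallel-syncs {master_name} 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sentinel_conf("mymaster", "10.0.0.1", 6379, quorum=2, sentinel_count=3))
```

Sentinels discover each other through the monitored master, so the file does not need to list the other Sentinels. Sentinel also rewrites its configuration file at runtime to persist state, so the rendered file must be writable by the Sentinel process.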
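Because clients in Solutions 2 to 4 ask a Sentinel, rather than a fixed Redis address, where the master is, they should try each Sentinel in turn. Production clients normally delegate this to their library (redis-py, for example, provides a `redis.sentinel.Sentinel` class), but the stdlib‑only sketch below shows the mechanism: it sends `SENTINEL get-master-addr-by-name` over the RESP protocol to each configured Sentinel and returns the first address it receives. The function names and the `mymaster` name are illustrative.

```python
import io
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def read_reply(f):
    """Read one RESP2 reply from a binary file-like object."""
    line = f.readline()
    if not line.endswith(b"\r\n"):
        raise ConnectionError("connection closed mid-reply")
    kind, body = line[:1], line[1:-2]
    if kind == b"+":  # simple string
        return body.decode()
    if kind == b"-":  # error reply
        raise RuntimeError(body.decode())
    if kind == b":":  # integer
        return int(body)
    if kind == b"$":  # bulk string; $-1 is null
        n = int(body)
        return None if n == -1 else f.read(n + 2)[:-2].decode()
    if kind == b"*":  # array; *-1 is null
        n = int(body)
        return None if n == -1 else [read_reply(f) for _ in range(n)]
    raise ValueError(f"unexpected RESP type byte {kind!r}")


def parse_reply(data: bytes):
    """Parse a complete RESP reply held in memory."""
    return read_reply(io.BytesIO(data))


def discover_master(sentinels, master_name, timeout=0.5):
    """Ask each Sentinel in turn for the master address; return (host, port)."""
    last_error = None
    for s_host, s_port in sentinels:
        try:
            with socket.create_connection((s_host, s_port), timeout=timeout) as sock:
                sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
                with sock.makefile("rb") as f:
                    reply = read_reply(f)
        except (OSError, RuntimeError) as exc:  # unreachable Sentinel or error reply
            last_error = exc
            continue
        if reply:  # None means this Sentinel does not know the master name
            host, port = reply
            return host, int(port)
    raise ConnectionError(f"no Sentinel returned an address for {master_name!r} "
                          f"(last error: {last_error})")
```

A client would call it as `discover_master([("10.0.0.1", 26379), ("10.0.0.2", 26379), ("10.0.0.3", 26379)], "mymaster")`, where the three addresses are hypothetical, and then connect to the returned master. If any one Sentinel is down, the next one answers.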
For client convenience, a virtual IP (VIP) can present a single address to applications. When a master‑slave switch occurs, a callback (for example, a script registered with Sentinel's `client-reconfig-script` directive) moves the VIP to the new master, so clients keep using the same endpoint as if it were a single‑node Redis and need no Sentinel support of their own.
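One way to implement the VIP callback is a script registered in each Sentinel's configuration, for example `sentinel client-reconfig-script mymaster /usr/local/bin/vip_failover.py` (the path is hypothetical). After a failover, Sentinel runs it with seven arguments: master name, role, state, from‑ip, from‑port, to‑ip and to‑port. The Python sketch below assumes that all Redis nodes share one layer‑2 subnet, that the VIP is `10.0.0.100/24` on interface `eth0`, and that each host has its own `LOCAL_IP` set; all of these values are hypothetical. It relies on the Linux `ip` command and the iputils `arping` tool.

```python
import subprocess
import sys

# Hypothetical values for one host; every host sets its own LOCAL_IP.
VIP = "10.0.0.100/24"
INTERFACE = "eth0"
LOCAL_IP = "10.0.0.2"


def vip_commands(args, local_ip=LOCAL_IP, vip=VIP, interface=INTERFACE):
    """Return the commands this host should run after a failover.

    `args` are the seven arguments Sentinel passes to client-reconfig-script:
    <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
    """
    if len(args) != 7:
        raise ValueError(f"expected 7 arguments, got {len(args)}")
    to_ip = args[5]
    address = vip.split("/")[0]
    if to_ip == local_ip:
        # This host runs the new master: take the VIP and refresh neighbours' ARP caches.
        return [
            ["ip", "addr", "add", vip, "dev", interface],
            ["arping", "-U", "-c", "3", "-I", interface, address],
        ]
    # Every other host drops the VIP in case it still holds it.
    return [["ip", "addr", "del", vip, "dev", interface]]


def main(args):
    status = 0
    for cmd in vip_commands(args):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except OSError as exc:  # e.g. the command is not installed
            print(f"{cmd[0]}: {exc}", file=sys.stderr)
            status = 1
            continue
        if result.returncode != 0:
            # Deleting an address this host does not hold fails harmlessly; just report it.
            print(f"{' '.join(cmd)}: {result.stderr.strip()}", file=sys.stderr)
    return status


if __name__ == "__main__":
    if len(sys.argv) == 8:  # script name plus the seven Sentinel arguments
        sys.exit(main(sys.argv[1:]))
    print("usage: vip_failover.py <master-name> <role> <state> "
          "<from-ip> <from-port> <to-ip> <to-port>", file=sys.stderr)
```

Every Sentinel that observes the switch runs the script, so the host whose address matches `to-ip` claims the VIP while the others release it. Keeping the decision in the pure `vip_commands` function makes it easy to test without touching the network.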
The article concludes that while deploying a basic Redis service is straightforward, true high availability requires additional servers, multiple Sentinel processes, and a supervision tool (e.g., Supervisor) that automatically restarts crashed Redis and Sentinel processes.
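Supervisor itself is configured declaratively (a `[program:...]` section with `autorestart=true`), so using it requires no code. Purely to illustrate the restart‑on‑exit behavior it provides, the sketch below is a bounded restart loop in Python; the function name, restart limit and back‑off are assumptions, and it lacks the logging, signal handling and start‑failure detection of a real process manager.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, backoff_s=1.0):
    """Run `cmd` and restart it each time it exits, at most `max_restarts` times.

    Returns the exit code of every run, in order.
    """
    if max_restarts < 0:
        raise ValueError("max_restarts must be non-negative")
    exit_codes = []
    for run in range(max_restarts + 1):
        code = subprocess.run(cmd).returncode  # blocks until the process exits
        exit_codes.append(code)
        print(f"{cmd[0]} exited with code {code} (run {run + 1})", file=sys.stderr)
        if run < max_restarts:
            time.sleep(backoff_s)  # pause so a crash loop does not spin the CPU
    return exit_codes
```

For example, `supervise(["redis-sentinel", "/etc/redis/sentinel.conf"])` (the path is hypothetical) would keep a Sentinel running. As with Supervisor, the managed process must stay in the foreground (`daemonize no`); otherwise the loop sees it exit immediately and keeps restarting it.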
Architecture Digest