Mastering Redis Disaster Recovery: Sentinel and Manual Failover Strategies
This article explains how to protect Redis deployments from single‑point failures by using master‑slave replication, manual failover procedures, and the automated high‑availability Sentinel solution, providing practical guidance for reliable disaster recovery in non‑clustered environments.
Redis is a high‑performance in‑memory key‑value database widely used in internet architectures. This article introduces disaster‑recovery solutions for Redis in non‑clustered distributed scenarios, covering common use cases, single‑point‑failure risks, master‑slave replication architectures, their pros and cons, manual failover steps, and the high‑availability Sentinel solution.
1. Brief Introduction to Redis
Redis is a high‑performance key‑value NoSQL database offering persistence, high availability, various data structures, and clustering. Common use cases include session cache, full‑page cache (e.g., WordPress wp‑redis plugin), queues, ranking, and publish/subscribe.
Session Cache: Provides persistent sessions for long‑lived scenarios such as shopping carts, improving user experience.
Full‑Page Cache: WordPress plugins such as wp-redis load previously visited pages very quickly.
Queue: Redis list and set operations make it a good message‑queue platform; a common use is limiting purchases during promotions.
Ranking: Sorted sets support atomic in‑memory score increments and decrements (ZINCRBY), enabling fast rankings for content recommendation.
Publish/Subscribe: Supports chat systems and other real‑time notification scenarios.
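As a sketch of the queue-based purchase limit, assuming the common pattern of pre-loading one list element per unit of promotional stock: each buyer pops one element, and once the list is empty the promotion is sold out. The key name stock:<item> and the InMemoryList stand-in are hypothetical and exist only so the example runs without a Redis server; a redis-py client exposes rpush and lpop with the same shape.

```python
from collections import deque


class InMemoryList:
    """Hypothetical stand-in for a Redis client (RPUSH/LPOP on lists only).

    With redis-py you would pass a real redis.Redis instance instead.
    """

    def __init__(self):
        self._lists = {}

    def rpush(self, name, *values):
        lst = self._lists.setdefault(name, deque())
        lst.extend(values)
        return len(lst)

    def lpop(self, name):
        lst = self._lists.get(name)
        if not lst:
            return None  # Redis replies nil for an empty or missing list
        return lst.popleft()


def load_stock(client, item, quantity):
    """Pre-load one token per unit of stock (hypothetical key naming)."""
    client.rpush(f"stock:{item}", *range(quantity))


def try_purchase(client, item):
    """Claim one unit; LPOP is atomic on a real server, so no unit is sold twice."""
    return client.lpop(f"stock:{item}") is not None


if __name__ == "__main__":
    store = InMemoryList()
    load_stock(store, "promo-42", 3)
    print([try_purchase(store, "promo-42") for _ in range(5)])
    # [True, True, True, False, False]
```

Because the check and the decrement happen in a single atomic command, concurrent buyers cannot oversell; this is exactly the protection that disappears if the only Redis instance crashes.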
2. Single‑Point‑Failure Issues in Redis Deployments
Many companies still use single‑node Redis deployments, which can cause severe outages. The author recounts a 2015 incident where a single Redis instance crashed, leading to uncontrolled discount purchases and significant business loss.
3. Backup and Disaster Recovery for Non‑Clustered Redis
Master‑slave replication is the usual way to protect a non‑clustered Redis deployment. Two typical architectures are:
Common Master‑Slave Replication
Scheme 1: One master and two slaves; writes go to the master, and reads can be spread across the slaves for load balancing.
Scheme 2: One master and two slaves, plus keepalived managing a virtual IP (VIP) for the master, so clients always connect to the VIP and are unaffected when the underlying master IP changes.
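Scheme 1's read/write split can be sketched as a small router that sends writes to the master and rotates reads across the slaves. The node names and the READ_ONLY command set below are hypothetical and deliberately incomplete; a real application would hold one client connection per node.

```python
from itertools import cycle


class ReadWriteRouter:
    """Route writes to the master and spread reads over the slaves (Scheme 1)."""

    # Hypothetical, deliberately incomplete set of read-only commands.
    READ_ONLY = {"GET", "MGET", "HGET", "LRANGE", "ZRANGE", "EXISTS"}

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._reads = cycle(slaves or [master])

    def node_for(self, command):
        """Return the node that should serve this command."""
        if command.upper() in self.READ_ONLY:
            return next(self._reads)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master:6379", ["slave1:6379", "slave2:6379"])
    for cmd in ["SET", "GET", "GET", "GET", "INCR"]:
        print(cmd, "->", router.node_for(cmd))
```

Redis replication is asynchronous, so a read served by a slave may briefly return data that is older than what was just written to the master.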
Advantages and Disadvantages
Advantages
Provides backup of master data; slaves can be promoted if the master fails.
Enables read scaling by offloading reads to slaves.
Disadvantages
In Scheme 1, if the master fails, clients can no longer write and the slaves have no master to replicate from, so a slave must be promoted manually. In Scheme 2, after failover the remaining slave becomes a single point of failure, and manual intervention is still required.
Manual failover steps (example for Scheme 1):
1) On Slave1, run slaveof no one to promote it to master (Redis 5.0 and later also accept the alias replicaof no one).
2) Configure Slave1 as writable (slaves are read‑only by default).
3) Notify client applications of the new master address.
4) On Slave2, run slaveof <new-master-ip> <port> so that it replicates from the new master.
Scheme 2 requires similar steps after promoting the slave.
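The Scheme 1 steps can be scripted, as a minimal sketch assuming redis-py-style client objects: with no arguments, redis-py's slaveof() sends SLAVEOF NO ONE, and config_set()/config_rewrite() map to CONFIG SET and CONFIG REWRITE. The RecordingClient class, the notify_clients callback, and the 10.0.0.2 address are hypothetical; the stand-in records calls so the sequence can be checked without live servers.

```python
class RecordingClient:
    """Hypothetical stand-in that records calls instead of talking to Redis.

    A real redis.Redis instance exposes methods with the same names.
    """

    def __init__(self, name):
        self.name = name
        self.calls = []

    def slaveof(self, host=None, port=None):
        self.calls.append(("slaveof", host, port))

    def config_set(self, name, value):
        self.calls.append(("config_set", name, value))

    def config_rewrite(self):
        self.calls.append(("config_rewrite",))


def manual_failover(new_master, new_master_addr, other_slaves, notify_clients):
    host, port = new_master_addr
    # 1) Promote: with no arguments, slaveof() sends SLAVEOF NO ONE.
    new_master.slaveof()
    # 2) Make the promoted node writable and persist its new role to redis.conf.
    new_master.config_set("slave-read-only", "no")
    new_master.config_rewrite()
    # 3) Tell applications where the master now lives (mechanism is site-specific).
    notify_clients(host, port)
    # 4) Re-point every remaining slave at the new master and persist that too.
    for slave in other_slaves:
        slave.slaveof(host, port)
        slave.config_rewrite()


if __name__ == "__main__":
    slave1, slave2 = RecordingClient("slave1"), RecordingClient("slave2")
    notified = []
    manual_failover(slave1, ("10.0.0.2", 6379), [slave2],
                    lambda h, p: notified.append((h, p)))
    print(slave1.calls)
    print(slave2.calls)
```

Step 3 is the one a script cannot fully automate: every application must learn the new address, which is the gap Sentinel closes.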
4. Introduction to Redis Sentinel
Sentinel provides an automated high‑availability solution for Redis, performing monitoring, notification, automatic failover, and configuration provisioning without human intervention.
5. Sentinel Functions
Monitoring: Continuously checks whether the master and slaves are working as expected.
Notification: Alerts administrators or monitoring systems via API when a failure occurs.
Automatic Failover: Promotes a slave to master and reconfigures the other slaves to replicate from it.
Configuration Provider: Clients can query Sentinel to obtain the current master address.
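To illustrate the Configuration Provider role, here is a minimal sketch of client-side discovery: ask each Sentinel in turn for the current master and use the first answer. Each entry in sentinels is a hypothetical callable standing in for sending SENTINEL get-master-addr-by-name <name> to one Sentinel, and the master name mymaster is likewise an assumption. With real servers, redis-py's redis.sentinel.Sentinel class provides discover_master() and master_for() for this lookup.

```python
from typing import Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]


def discover_master(
    sentinels: Sequence[Callable[[str], Optional[Address]]], master_name: str
) -> Address:
    """Ask each Sentinel in turn for the current address of master_name.

    Each callable returns (host, port), returns None if that Sentinel does
    not know the master, or raises ConnectionError if it is unreachable.
    """
    for ask in sentinels:
        try:
            addr = ask(master_name)
        except ConnectionError:
            continue  # this Sentinel is down; try the next one
        if addr is not None:
            return addr
    raise LookupError(f"no Sentinel could resolve master {master_name!r}")


if __name__ == "__main__":
    def down(_name):
        raise ConnectionError("sentinel unreachable")

    def healthy(_name):
        return ("10.0.0.2", 6379)

    print(discover_master([down, healthy], "mymaster"))
```

Because clients resolve the master through Sentinel rather than a hard-coded address, the "notify client applications" step of manual failover is no longer needed.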
6. Sentinel Architecture
7. Sentinel Failover Process
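Once enough Sentinels (at least the configured quorum) agree that the master is down, the Sentinel instances elect a leader to perform the failover.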
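The leader selects a slave to promote based on the following criteria:
Disconnection time from the master: slaves that have been disconnected for too long are excluded.
Slave priority (replica-priority, formerly slave-priority, in redis.conf): a lower value is preferred, and 0 means the slave is never promoted.
Replication offset (larger offset means more up‑to‑date data).
Run ID: if all else is equal, the lexicographically smallest run ID wins.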
The leader runs slaveof no one on the chosen slave to make it the new master.
The leader reconfigures the remaining slaves to replicate from the new master.
The old master is recorded as a slave of the new master; when it comes back online, Sentinel reconfigures it to replicate from the new master.
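The slave-selection criteria above can be modelled as a filter followed by a sort key. This is a simplified sketch, not Sentinel's source: max_disconnect_ms is a hypothetical stand-in for Sentinel's real cutoff, which the Redis documentation derives from down-after-milliseconds (ten times that value plus the time the master has been in the down state). The run IDs in the usage example are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    run_id: str
    priority: int          # replica-priority from redis.conf; 0 = never promote
    repl_offset: int       # replication offset processed from the old master
    disconnected_ms: int   # how long the link to the master has been down


def pick_new_master(replicas: List[Replica], max_disconnect_ms: int) -> Optional[Replica]:
    """Choose which slave the leader Sentinel promotes (simplified model)."""
    candidates = [
        r for r in replicas
        if r.priority != 0 and r.disconnected_ms <= max_disconnect_ms
    ]
    if not candidates:
        return None
    # Lower priority wins, then larger offset (more data), then smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    replicas = [
        Replica("c3f1", priority=100, repl_offset=5000, disconnected_ms=200),
        Replica("a9e2", priority=100, repl_offset=5000, disconnected_ms=150),
        Replica("b7d4", priority=100, repl_offset=4800, disconnected_ms=100),
        Replica("0e55", priority=0, repl_offset=9000, disconnected_ms=50),
        Replica("f001", priority=10, repl_offset=1000, disconnected_ms=999999),
    ]
    print(pick_new_master(replicas, max_disconnect_ms=60000).run_id)  # a9e2
```

Here 0e55 is skipped despite having the most data because its priority is 0, f001 is skipped as too stale, and the tie between c3f1 and a9e2 on priority and offset is broken by run ID.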
Conclusion
Sentinel gives non‑clustered Redis high availability through fully automatic failover, greatly reducing both business impact and operational effort. Deploy an odd number of Sentinel instances, at least three, so that a majority can still agree on a failover if one Sentinel fails.
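The odd-number advice follows from simple arithmetic. According to the Redis Sentinel documentation, a failover must be authorized by a majority of all Sentinels, while the quorum setting only decides when the master is flagged as down. The sketch below computes the majority and the number of Sentinels that can be lost for a few deployment sizes.

```python
def majority(n: int) -> int:
    """Smallest number of Sentinels that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Sentinels that can fail while a majority can still authorize failover."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} sentinels: majority {majority(n)}, tolerates {tolerated_failures(n)}")
```

Three Sentinels tolerate one failure and four still tolerate only one, so an even count adds a process without adding resilience; five are needed to survive two failures.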
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.