
Mastering Redis Disaster Recovery: Sentinel and Manual Failover Strategies

This article explains how to protect Redis deployments from single‑point failures by using master‑slave replication, manual failover procedures, and the automated high‑availability Sentinel solution, providing practical guidance for reliable disaster recovery in non‑clustered environments.

Efficient Ops

Redis is a high‑performance in‑memory key‑value database widely used in internet architectures. This article introduces disaster‑recovery solutions for Redis in non‑clustered distributed scenarios, covering common use cases, single‑point‑failure risks, master‑slave replication architectures, their pros and cons, manual failover steps, and the high‑availability Sentinel solution.

1. Brief Introduction to Redis

Redis is a high‑performance key‑value NoSQL database offering persistence, high availability, various data structures, and clustering. Common use cases include session cache, full‑page cache (e.g., WordPress wp‑redis plugin), queues, ranking, and publish/subscribe.

Session cache: Provides persistent sessions for long‑lived scenarios such as shopping carts, improving the user experience.

Full‑page cache: WordPress plugins such as wp-redis serve previously visited pages from Redis at high speed.

Queue: Redis list and set operations make it a good message‑queue platform. It is often used to throttle purchase requests during promotions.

Ranking: Fast in‑memory increment/decrement operations (for example ZINCRBY on sorted sets) make rankings cheap to maintain, e.g., for content recommendation.

Publish/subscribe: Supports chat systems and other real‑time notification scenarios.
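The queue use case (throttling purchases during a promotion) depends on one property: RPUSH appends to a list and returns the new length atomically, so every concurrent buyer gets a distinct position. The sketch below assumes that pattern and models a single Redis list key with a plain Python list, so it runs without a server. `ListQueueModel` and `try_purchase` are hypothetical names, and this is not a Redis client.

```python
from typing import List, Optional


class ListQueueModel:
    """In-memory stand-in for one Redis list key (not a Redis client)."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def rpush(self, value: str) -> int:
        # Like RPUSH: append to the tail and return the new list length.
        self._items.append(value)
        return len(self._items)

    def lpop(self) -> Optional[str]:
        # Like LPOP: remove and return the head, or None if the list is empty.
        return self._items.pop(0) if self._items else None


def try_purchase(queue: ListQueueModel, user_id: str, stock: int) -> bool:
    """Hypothetical limiter: accept a request only if it landed within the first `stock` slots."""
    position = queue.rpush(user_id)
    return position <= stock


queue = ListQueueModel()
results = [try_purchase(queue, f"user{i}", stock=3) for i in range(5)]
print(results)       # [True, True, True, False, False]
print(queue.lpop())  # user0 (an order worker would consume from the head)
```

If this Redis instance dies and nothing takes its place, the limiter disappears along with it. The next section describes exactly that failure mode.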

2. Single‑Point‑Failure Issues in Redis Deployments

Many companies still use single‑node Redis deployments, which can cause severe outages. The author recounts a 2015 incident where a single Redis instance crashed, leading to uncontrolled discount purchases and significant business loss.

3. Backup and Disaster Recovery for Non‑Clustered Redis

Redis master‑slave replication is common. Two typical architectures are described:

Common Master‑Slave Replication

Scheme 1: One master and two slaves. Writes go to the master, and the slaves can serve reads for load balancing.

Scheme 2: One master and two slaves, with keepalived holding a virtual IP (VIP) for the master. Clients connect through the VIP, so they do not have to change addresses when the master moves.
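In Scheme 1, the client has to split traffic itself: writes go to the master and reads rotate across the slaves. The router below is a minimal sketch of that logic. The class name, the read-command subset, and the 10.0.0.x addresses are hypothetical. Because Redis replication is asynchronous, a read served by a slave may return slightly stale data. In Scheme 2, the `master` address would simply be the VIP.

```python
from itertools import cycle
from typing import Iterator, List, Tuple

Address = Tuple[str, int]

# Illustrative subset of read-only commands; extend as needed.
READ_COMMANDS = {"GET", "MGET", "EXISTS", "HGET", "HGETALL", "LRANGE", "SMEMBERS", "ZRANGE"}


class ReadWriteRouter:
    """Hypothetical client-side router for Scheme 1."""

    def __init__(self, master: Address, slaves: List[Address]) -> None:
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._read_targets: Iterator[Address] = cycle(slaves or [master])

    def route(self, command: str) -> Address:
        name = command.split()[0].upper()
        if name in READ_COMMANDS:
            return next(self._read_targets)
        return self.master  # writes and unknown commands go to the master


router = ReadWriteRouter(("10.0.0.1", 6379), [("10.0.0.2", 6379), ("10.0.0.3", 6379)])
print(router.route("SET cart:42 3"))  # ('10.0.0.1', 6379)
print(router.route("GET cart:42"))    # ('10.0.0.2', 6379)
print(router.route("GET cart:42"))    # ('10.0.0.3', 6379)
```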

Advantages and Disadvantages

Advantages

Provides backup of master data; slaves can be promoted if the master fails.

Enables read scaling by offloading reads to slaves.

Disadvantages

In Scheme 1, if the master fails, clients can no longer write and the slaves have no master to replicate from, so someone must promote a slave by hand. In Scheme 2, the remaining slave becomes a single point of failure after failover, and manual intervention is still required.

Manual failover steps (example for Scheme 1):

1) On Slave1, execute slaveof no one to promote it to master.
2) Configure Slave1 as writable (slaves are read‑only by default).
3) Notify client applications of the new master address.
4) Configure Slave2 to replicate from the new master.

Scheme 2 requires similar steps after promoting the slave.
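To reduce typos during an outage, the manual steps can be scripted. The sketch below only builds the redis-cli command lines for steps 1, 2 and 4, so an operator can review them before running them. Step 3, updating the clients, remains manual. The hostnames slave1 and slave2 are hypothetical. Adding CONFIG REWRITE is our assumption: it persists each change to redis.conf so that a restart does not revert the topology. On Redis 5.0 and later, REPLICAOF and replica-read-only are the preferred names, and SLAVEOF and slave-read-only remain accepted as aliases.

```python
from typing import List, Tuple

Address = Tuple[str, int]


def manual_failover_plan(new_master: Address, other_slaves: List[Address]) -> List[str]:
    """Build redis-cli commands for steps 1, 2 and 4; step 3 (updating clients) stays manual."""
    host, port = new_master
    cli = f"redis-cli -h {host} -p {port}"
    plan = [
        f"{cli} SLAVEOF NO ONE",                 # step 1: promote Slave1
        f"{cli} CONFIG SET slave-read-only no",  # step 2: allow writes
        f"{cli} CONFIG REWRITE",                 # optional: persist to redis.conf
    ]
    for s_host, s_port in other_slaves:
        s_cli = f"redis-cli -h {s_host} -p {s_port}"
        plan.append(f"{s_cli} SLAVEOF {host} {port}")  # step 4: follow the new master
        plan.append(f"{s_cli} CONFIG REWRITE")         # optional: persist to redis.conf
    return plan


for line in manual_failover_plan(("slave1", 6379), [("slave2", 6379)]):
    print(line)
```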

4. Introduction to Redis Sentinel

Sentinel provides an automated high‑availability solution for Redis, performing monitoring, notification, automatic failover, and configuration provisioning without human intervention.

5. Sentinel Functions

Monitoring: Continuously checks master and slaves for expected operation.

Notification: Alerts administrators or monitoring systems via API when a failure occurs.

Automatic Failover: Promotes a slave to master and reconfigures other slaves automatically.

Configuration Provider: Clients can query Sentinel to obtain the current master address.
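As a configuration provider, Sentinel answers the command SENTINEL get-master-addr-by-name <master-name> with the current master's host and port. Clients should ask several Sentinels, because any single Sentinel may be down. The sketch below shows that discovery loop. It assumes a hypothetical `query` callable that sends the command to one Sentinel. In practice, client libraries such as redis-py provide this through `redis.sentinel.Sentinel`. The names mymaster and 10.0.0.x are placeholders.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]
# Hypothetical transport: sends "SENTINEL get-master-addr-by-name <name>" to one
# Sentinel and returns (host, port), or None if that Sentinel does not know the master.
QueryFn = Callable[[Address, str], Optional[Address]]


def discover_master(sentinels: List[Address], master_name: str, query: QueryFn) -> Address:
    for sentinel in sentinels:
        try:
            addr = query(sentinel, master_name)
        except ConnectionError:
            continue  # this Sentinel is unreachable; ask the next one
        if addr is not None:
            return addr
    raise RuntimeError(f"no Sentinel could resolve master {master_name!r}")


def fake_query(sentinel: Address, name: str) -> Optional[Address]:
    # Stand-in for a real network call: the first Sentinel is down.
    if sentinel == ("10.0.0.11", 26379):
        raise ConnectionError("sentinel down")
    return ("10.0.0.2", 6379)


print(discover_master([("10.0.0.11", 26379), ("10.0.0.12", 26379)], "mymaster", fake_query))
```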

6. Sentinel Architecture

Figure: Redis Sentinel architecture diagram

7. Sentinel Failover Process

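When enough Sentinels (the configured quorum) agree that the master is unreachable, the Sentinel instances elect a leader to carry out the failover.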

The leader selects a slave to promote based on the following criteria, in order:

Duration of disconnection from the master (slaves that have been disconnected for too long are excluded).

Slave priority configured in redis.conf (slave-priority: a lower value is preferred, and 0 means the slave is never promoted).

Replication offset (larger offset means more up‑to‑date data).

Run ID (the lexicographically smallest run ID wins if everything else is equal).

The leader runs slaveof no one on the chosen slave to make it the new master.

The leader reconfigures the remaining slaves to replicate from the new master.

The old master is recorded as a slave of the new master. When it comes back online, Sentinel reconfigures it to replicate from the new master.
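The selection rules and the reconfiguration step can be modeled in a few lines. The sketch below is a simplified model, not Sentinel's source code. In real Sentinel, the disconnection cut-off is derived from down-after-milliseconds. Here it is a plain `max_disconnected_ms` parameter, which is our assumption. The run IDs and node names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    run_id: str
    priority: int         # slave-priority from redis.conf: lower wins, 0 = never promote
    offset: int           # replication offset: larger means more up-to-date data
    disconnected_ms: int  # how long this replica has been cut off from the master


def select_replica(replicas: List[Replica], max_disconnected_ms: int) -> Optional[Replica]:
    """Pick the slave to promote, following the criteria listed above."""
    candidates = [
        r for r in replicas
        if r.priority > 0 and r.disconnected_ms <= max_disconnected_ms
    ]
    if not candidates:
        return None
    # Lowest priority value, then largest offset, then smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.run_id))


def fail_over(topology: Dict[str, Optional[str]], new_master: str) -> Dict[str, Optional[str]]:
    """topology maps each node to the node it replicates from (None for the master)."""
    # The promoted node stops replicating ("slaveof no one"); every other node,
    # including the old master once it returns, replicates from the new master.
    return {node: (None if node == new_master else new_master) for node in topology}


replicas = [
    Replica("c3", priority=100, offset=5000, disconnected_ms=0),
    Replica("a1", priority=100, offset=5000, disconnected_ms=0),
    Replica("b2", priority=10, offset=4000, disconnected_ms=0),
    Replica("d4", priority=1, offset=9000, disconnected_ms=60_000),  # cut off too long
]
chosen = select_replica(replicas, max_disconnected_ms=30_000)
print(chosen.run_id)  # b2
print(fail_over({"m": None, "s1": "m", "s2": "m"}, "s1"))
# {'m': 's1', 's1': None, 's2': 's1'}
```

Replica d4 has the best priority and the largest offset, but it is excluded by the disconnection filter. This is why that check comes first: a stale replica should never be promoted.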

Conclusion

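Sentinel gives Redis high availability through fully automatic failover, which greatly reduces business impact and operational effort. Failover is not instantaneous, however. Because replication is asynchronous, any writes the promoted slave had not yet received can be lost. Deploy an odd number of Sentinel instances, at least three, so that a majority can still be reached if one of them fails.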
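The "odd number, at least three" advice comes down to majority arithmetic. Electing a leader and authorizing a failover require a majority of the Sentinels, independently of the quorum used to detect the failure. The short calculation below shows how many Sentinel failures each deployment size can survive. Two Sentinels tolerate none, and a fourth Sentinel adds nothing over three.

```python
def majority(n: int) -> int:
    """Smallest number of Sentinels that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many Sentinels can fail while the rest still form a majority."""
    return n - majority(n)


for n in range(1, 7):
    print(f"{n} Sentinels: majority {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```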

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
