Mastering Redis Sentinel: Build High‑Availability Clusters Step‑by‑Step
This article explains Redis Sentinel’s role in achieving high availability, details its core functions, underlying Raft‑based algorithm, configuration parameters, practical setup steps, fault‑tolerance mechanisms, quorum and majority calculations, and demonstrates failover and recovery scenarios with real command‑line examples.
Preface
Continuing from the previous article: because master‑slave replication alone cannot provide high availability, Redis layers Sentinel on top of the master‑slave architecture to build a highly available Redis deployment.
Redis Sentinel
Sentinel is a crucial component in Redis cluster architecture; it removes the need for manual intervention when the master in a master‑slave setup fails.
Key functions of Redis Sentinel:
Cluster monitoring: monitors whether Redis master and slaves are working correctly.
Message notification: sends alerts to administrators when a Redis instance fails.
Failover: when the master node fails, automatically elects a new master, enabling self‑healing.
Configuration center: after a failure, notifies clients and other slaves of the new master address.
Principle
Redis Sentinel’s core algorithm is based on Raft, used for distributed system fault tolerance and leader election. The process includes:
Each Sentinel automatically discovers other Sentinels and slaves, sending a PING to known masters, slaves, and Sentinels once per second.
If an instance does not reply to PING within the down-after-milliseconds threshold, it is marked subjectively down (SDOWN).
When a master is marked SDOWN, the Sentinel asks the other Sentinels monitoring that master, once per second, whether they also consider it down.
If at least quorum Sentinels agree, the master is marked objectively down (ODOWN).
Sentinels then increase the INFO command frequency to the downed master's slaves from once every 10 seconds to once per second.
The ODOWN status is cleared as soon as the master starts replying to PINGs again.
Detailed steps can be observed in the Sentinel logs.
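The detection steps above can be sketched in a few lines of Python. This is an illustrative model, not Redis's actual implementation; the function names and the fixed timestamps are assumptions:

```python
# Sketch of Sentinel's two-stage failure detection (illustrative only).
DOWN_AFTER_MS = 30000  # down-after-milliseconds default

def is_sdown(last_pong_ms, now_ms, down_after_ms=DOWN_AFTER_MS):
    """Subjectively down: no PING reply within the threshold."""
    return (now_ms - last_pong_ms) > down_after_ms

def is_odown(sdown_votes, quorum):
    """Objectively down: at least `quorum` Sentinels report SDOWN."""
    return sdown_votes >= quorum

now = 100000
print(is_sdown(last_pong_ms=now - 31000, now_ms=now))  # True: 31 s of silence
print(is_odown(sdown_votes=2, quorum=2))               # True: quorum reached
```

Note that SDOWN is a local opinion of one Sentinel, while ODOWN is only ever reached by exchanging votes with the other Sentinels.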
Sentinel Deployment Practice
Assuming the master‑slave setup is already done, configure Sentinel via sentinel.conf:
# Sentinel instance port, default 26379
port 26379
dir ./
protected-mode no
daemonize yes
logfile ./sentinel.log
# Monitor master
sentinel monitor mymaster 127.0.0.1 6379 2
# Authentication (if master requires a password)
sentinel auth-pass mymaster 123456
# Down‑after timeout (default 30 s)
sentinel down-after-milliseconds mymaster 30000
# Number of slaves that can sync simultaneously during failover
sentinel parallel-syncs mymaster 1
# Failover timeout (default 180 000 ms)
sentinel failover-timeout mymaster 180000
# Notification scripts
sentinel notification-script mymaster /var/redis/notify.sh
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
Start order: first start the Redis master, then the slaves, and finally the Sentinel instances.
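For a multi‑Sentinel deployment, the configuration above can be stamped out per instance with a small helper script. This is a sketch; the file names, ports, and values simply mirror the example configuration and are not required settings:

```python
# Generate a sentinel.conf like the one above for each Sentinel instance.
def write_sentinel_conf(path, port, master_name, master_ip, master_port, quorum):
    lines = [
        f"port {port}",
        "dir ./",
        "protected-mode no",
        "logfile ./sentinel.log",
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} 30000",
        f"sentinel parallel-syncs {master_name} 1",
        f"sentinel failover-timeout {master_name} 180000",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Three Sentinels on ports 26379-26381, all watching the same master.
for port in (26379, 26380, 26381):
    write_sentinel_conf(f"sentinel-{port}.conf", port, "mymaster",
                        "127.0.0.1", 6379, quorum=2)
```

Each generated file would then be started with `redis-sentinel sentinel-<port>.conf`.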
Verify replication on the master:
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.22.29.101,port=6379,state=online,offset=4448,lag=1
slave1:ip=172.22.29.100,port=6379,state=online,offset=4448,lag=1
Check Sentinel status:
127.0.0.1:26379> info sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
master0:name=mymaster,status=ok,address=172.22.29.99:6379,slaves=2,sentinels=3
Simulating Failover
Stop the master Redis service:
systemctl stop redis
Sentinel logs show the failover process, including SDOWN, ODOWN, leader election, slave promotion, and the master switch.
Simulating Node Recovery
Restart the previously stopped Redis node:
systemctl start redis
Sentinel logs indicate the node is converted back to a slave and resynchronizes with the new master.
Sentinel Node Calculations
Two important parameters:
quorum : minimum number of Sentinels that must agree a master is ODOWN.
majority : minimum number of Sentinels required to authorize a failover.
If quorum < majority, a master can be marked ODOWN by fewer Sentinels, but the failover itself is still authorized only by a majority; if quorum > majority, a failover requires agreement from at least quorum Sentinels, which is stricter than a plain majority.
int sentinelIsQuorumReachable(sentinelRedisInstance *master, int *usableptr) {
    int result = SENTINEL_ISQR_OK;
    int usable = 1;                               /* count self */
    int voters = dictSize(master->sentinels) + 1; /* known Sentinels + self */
    /* iterate over known Sentinels and count the usable ones */
    /* ... */
    if (usable < (int)master->quorum) result |= SENTINEL_ISQR_NOQUORUM;
    if (usable < voters/2+1) result |= SENTINEL_ISQR_NOAUTH;
    if (usableptr) *usableptr = usable;
    return result;
}
majority = voters/2 + 1;
Why at Least Three Sentinels?
With only two Sentinels, the majority is 2; if one fails, the remaining Sentinel cannot meet the majority requirement, preventing failover when the master crashes.
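The arithmetic is easy to verify directly. A minimal sketch of the voters/2 + 1 rule from the C snippet above (function names here are illustrative):

```python
def majority(total_sentinels):
    """Minimum votes needed to authorize a failover: voters/2 + 1."""
    return total_sentinels // 2 + 1

def can_failover(alive, total):
    """A failover needs a majority of ALL known Sentinels, not just the alive ones."""
    return alive >= majority(total)

print(majority(2))         # 2: both Sentinels must be up
print(majority(3))         # 2: one Sentinel may be down
print(can_failover(1, 2))  # False: with 2 Sentinels, losing one blocks failover
print(can_failover(2, 3))  # True: with 3 Sentinels, one failure is tolerated
```

This is exactly why three Sentinels is the practical minimum: it is the smallest deployment that can still reach a majority after losing one node.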
Split‑Brain Scenario
A split‑brain occurs when network partitions isolate the master from slaves and Sentinels, causing slaves to be promoted to masters independently. This can lead to data loss if the original master continues to accept writes.
Two configuration parameters help mitigate split‑brain:
min-replicas-to-write 3 – the master accepts writes only while at least three slaves are connected.
min-replicas-max-lag 10 – a slave only counts as connected if its replication lag is at most 10 seconds.
With these settings, an isolated master stops accepting writes during a split‑brain, limiting the writes that would be discarded when it is later demoted to a slave.
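The combined effect of the two settings can be sketched as the check a master would run before accepting a write. This is an illustrative model, not Redis's actual code, and the lag values are made-up examples:

```python
def write_allowed(replica_lags_s, min_to_write=3, max_lag_s=10):
    """A replica counts as 'good' only if its lag is within max_lag_s;
    the master accepts writes only with at least min_to_write good replicas."""
    good = sum(1 for lag in replica_lags_s if lag <= max_lag_s)
    return good >= min_to_write

print(write_allowed([1, 2, 3]))   # True: three in-sync replicas
print(write_allowed([1, 2, 15]))  # False: only two within the lag limit
print(write_allowed([]))          # False: an isolated master rejects writes
```

The last case is the split‑brain scenario: a master cut off from all its replicas sees zero good replicas and refuses writes until the partition heals.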
Summary
Redis Sentinel provides high availability for Redis by monitoring, notifying, and automatically failing over master nodes, while Redis Cluster addresses scalability and throughput.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.