Redis Sentinel Mode Explained: Automatic Failure Detection and Master‑Slave Switching in Practice
This guide walks through Redis Sentinel’s architecture, explains subjective and objective down states, details the leader election and failover workflow, shows step‑by‑step configuration of a three‑node Sentinel cluster, client integration in Python and Java, and provides best‑practice recommendations, monitoring metrics, and troubleshooting tips.
Overview
Redis replication provides read scaling and data redundancy, but manual failover is required when the master crashes. Redis Sentinel adds automatic health monitoring, leader election, and client redirection, becoming the standard high‑availability solution for Redis.
Technical Features
Automatic failover : Sentinel promotes a replica within ~30 seconds without human intervention.
Service discovery : Clients query Sentinel for the current master address, hiding IP changes.
Configuration propagation : After failover, Sentinel updates all replicas' replication settings.
Multi‑Sentinel cooperation : A quorum mechanism prevents false positives caused by network jitter.
Applicable Scenarios
Data size fits a single node (typically < 100 GB) – no sharding needed.
High write‑availability requirements, tolerating ~30 seconds of write interruption.
Read‑heavy workloads that benefit from read‑only replicas.
When horizontal scaling (Cluster) is unnecessary but higher availability than a single master is required.
Environment Requirements
Redis : 8.x (master, replicas and Sentinel must share the same version).
OS : Linux kernel 4.x+ with a stable network stack.
Sentinel nodes : Minimum 3 (odd‑number quorum).
Network latency : < 10 ms between Sentinel nodes; higher latency degrades detection accuracy.
Detailed Steps
Sentinel Working Principle
Subjective Down (SDOWN) vs Objective Down (ODOWN)
SDOWN is a single Sentinel’s judgment that a node is unreachable after the down-after-milliseconds timeout; it does not trigger failover. ODOWN occurs when a quorum of Sentinels agree a node is down, which is the prerequisite for failover.
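The distinction can be sketched in a few lines of Python. This is a simplified model, not Sentinel's actual implementation: the `MasterView` class and the `last_pong_age_ms` field are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class MasterView:
    """One Sentinel's local opinion about the monitored master (hypothetical model)."""
    sentinel_id: str
    last_pong_age_ms: int  # milliseconds since the last valid reply to PING


def is_sdown(view: MasterView, down_after_ms: int) -> bool:
    # Subjective down: this Sentinel alone has seen no valid reply for too long.
    return view.last_pong_age_ms > down_after_ms


def is_odown(views: list[MasterView], down_after_ms: int, quorum: int) -> bool:
    # Objective down: at least `quorum` Sentinels independently report SDOWN.
    agreeing = sum(is_sdown(v, down_after_ms) for v in views)
    return agreeing >= quorum


views = [MasterView("A", 6200), MasterView("B", 5900), MasterView("C", 1200)]
print(is_sdown(views[2], 5000))         # False: Sentinel C still gets replies
print(is_odown(views, 5000, quorum=2))  # True: A and B agree
```

With down-after-milliseconds at 5000 and quorum at 2, two of the three views are enough to reach ODOWN even though the third Sentinel still sees the master.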
Sentinel A: PING master → no response → mark master SDOWN
Sentinel A asks B, C: "Do you also think master is down?"
Sentinel B: "Yes, no response"
Sentinel C: "Yes"
Quorum=2 → Sentinel A marks master ODOWN → start leader election
Failover Process
Leader election : Sentinels elect a leader through a Raft‑like, epoch‑based majority vote; the leader coordinates the failover.
New master selection : The leader picks a replica based on (a) lowest replica-priority (0 = never master), (b) highest replication offset, (c) smallest Run ID.
Execute switch : Sends REPLICAOF NO ONE to the chosen replica, making it the new master.
Reconfigure : Updates other replicas to follow the new master and writes the new configuration to Sentinel files.
Notify clients : Publishes a +switch-master event via Pub/Sub.
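The selection order in the "New master selection" step can be expressed as a sort key. The sketch below is a simplified model under stated assumptions: the `Replica` class is hypothetical, and the real Sentinel also skips replicas that are disconnected or have been out of contact with the master for too long.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    run_id: str
    priority: int     # replica-priority; 0 means "never promote"
    repl_offset: int  # amount of the master's replication stream processed


def pick_new_master(replicas: list[Replica]) -> Optional[Replica]:
    # Replicas with priority 0 are never eligible.
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    # Lowest priority first, then highest offset, then smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("b2", 100, 5000),
    Replica("a1", 100, 5000),
    Replica("c3", 100, 4000),
    Replica("d4", 0, 9000),
]
print(pick_new_master(replicas).run_id)  # a1
```

Here `d4` is excluded despite having the highest offset, `c3` loses on offset, and the tie between `a1` and `b2` is broken by run ID.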
Three‑Node Sentinel Cluster Configuration
Node Planning
Node Role        IP           Port
Redis master     10.0.1.10    6379
Redis slave1     10.0.1.11    6379
Redis slave2     10.0.1.12    6379
Sentinel 1       10.0.1.10    26379
Sentinel 2       10.0.1.11    26379
Sentinel 3       10.0.1.12    26379
Sentinel and Redis can share a machine, as they do here, but a host failure then takes down both processes; for independent failure domains, run the Sentinels on separate hosts.
Redis Master Configuration
# /etc/redis/redis.conf (master)
bind 10.0.1.10 127.0.0.1
port 6379
daemonize yes
logfile /var/log/redis/redis-server.log
dir /var/lib/redis
# Persistence
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
# Authentication (must match replicas and Sentinel)
requirepass "Redis@Secure2024!"
masterauth "Redis@Secure2024!"
# Memory
maxmemory 8gb
maxmemory-policy allkeys-lru
# Split‑brain protection
min-replicas-to-write 1
min-replicas-max-lag 10
Redis Replica Configuration
# /etc/redis/redis.conf (replicas)
# Use 10.0.1.12 in the bind line on the second replica
bind 10.0.1.11 127.0.0.1
port 6379
daemonize yes
logfile /var/log/redis/redis-server.log
dir /var/lib/redis
replicaof 10.0.1.10 6379
requirepass "Redis@Secure2024!"
masterauth "Redis@Secure2024!"
replica-read-only yes
# Lower value = higher promotion priority; 0 = never promote
replica-priority 100
appendonly yes
appendfsync everysec
Redis Replica configuration notes: comments must sit on their own line, because Redis parses trailing text as extra arguments.
Sentinel Configuration (common for all three)
# /etc/redis/sentinel.conf
# Change the first bind address per node
bind 10.0.1.10 127.0.0.1
port 26379
daemonize yes
logfile /var/log/redis/sentinel.log
dir /var/lib/redis
# Monitor the master; the last argument is the quorum (2)
sentinel monitor mymaster 10.0.1.10 6379 2
sentinel auth-pass mymaster Redis@Secure2024!
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
requirepass "Sentinel@Secure2024!"
sentinel sentinel-pass Sentinel@Secure2024!
Startup Order
# 1. Start master
systemctl start redis-server # 10.0.1.10
# 2. Start replicas
systemctl start redis-server # 10.0.1.11 & 10.0.1.12
# 3. Verify replication
redis-cli -h 10.0.1.10 -p 6379 -a 'Redis@Secure2024!' info replication
# Expected: role:master, connected_slaves:2
# 4. Start all Sentinels
redis-sentinel /etc/redis/sentinel.conf # or systemctl start redis-sentinel
# 5. Verify Sentinel status
redis-cli -h 10.0.1.10 -p 26379 -a 'Sentinel@Secure2024!' sentinel masters
Verifying Failover
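# Simulate master failure by blocking the master for 30 seconds.
# Since Redis 7.0, DEBUG is rejected unless enable-debug-command is set in redis.conf;
# alternatively, stop the master with: systemctl stop redis-server
redis-cli -h 10.0.1.10 -p 6379 -a 'Redis@Secure2024!' DEBUG sleep 30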
# Observe Sentinel logs (expected sequence)
# +sdown master mymaster 10.0.1.10 6379
# +odown master mymaster 10.0.1.10 6379 quorum 2/2
# +try-failover master mymaster 10.0.1.10 6379
# +elected-leader master mymaster 10.0.1.10 6379
# +selected-slave slave 10.0.1.11:6379 mymaster 10.0.1.10 6379
# +promoted-slave slave 10.0.1.11:6379 mymaster 10.0.1.10 6379
# +switch-master mymaster 10.0.1.10 6379 10.0.1.11 6379
# Query new master
redis-cli -h 10.0.1.10 -p 26379 -a 'Sentinel@Secure2024!' sentinel get-master-addr-by-name mymaster
Client Integration
Python (redis‑py)
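from redis.sentinel import Sentinel

# List every Sentinel so the client can still resolve the master if one is down.
sentinel = Sentinel(
    [
        ('10.0.1.10', 26379),
        ('10.0.1.11', 26379),
        ('10.0.1.12', 26379),
    ],
    socket_timeout=0.5,
    sentinel_kwargs={'password': 'Sentinel@Secure2024!'},
)
# master_for / slave_for return clients that look up the current address through Sentinel.
master = sentinel.master_for('mymaster', socket_timeout=0.5, password='Redis@Secure2024!', db=0, retry_on_timeout=True)
slave = sentinel.slave_for('mymaster', socket_timeout=0.5, password='Redis@Secure2024!', db=0)
master.set('key', 'value', ex=3600)
# Replication is asynchronous, so a replica read right after a write may still return None.
value = slave.get('key')
Java (Jedis)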
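import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.JedisSentinelPool;
import java.util.HashSet;
import java.util.Set;

public class SentinelExample {
    public static void main(String[] args) {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxTotal(50);
        poolConfig.setMaxIdle(10);
        poolConfig.setMinIdle(5);
        // Validate connections on borrow so stale links to the old master are discarded.
        poolConfig.setTestOnBorrow(true);

        Set<String> sentinels = new HashSet<>();
        sentinels.add("10.0.1.10:26379");
        sentinels.add("10.0.1.11:26379");
        sentinels.add("10.0.1.12:26379");

        // Arguments: master name, Sentinel addresses, pool config,
        // connection and socket timeouts (ms), Redis password, database, client name,
        // Sentinel connection and socket timeouts (ms), Sentinel password, Sentinel client name.
        try (JedisSentinelPool sentinelPool = new JedisSentinelPool("mymaster", sentinels, poolConfig, 2000, 2000, "Redis@Secure2024!", 0, null, 0, 0, "Sentinel@Secure2024!", null);
             Jedis jedis = sentinelPool.getResource()) {
            jedis.set("key", "value");
            String val = jedis.get("key");
            System.out.println(val);
        }
    }
}
Sentinel Monitoring Script (Bash)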
#!/bin/bash
# Print the master address and health fields as seen by each Sentinel.
SENTINEL_HOSTS=("10.0.1.10" "10.0.1.11" "10.0.1.12")
SENTINEL_PORT=26379
SENTINEL_PASS="Sentinel@Secure2024!"
MASTER_NAME="mymaster"

# Wrapper so the connection options are written once; $1 is the Sentinel host.
sentinel_cli() {
    redis-cli -h "$1" -p "$SENTINEL_PORT" -a "$SENTINEL_PASS" --no-auth-warning "${@:2}" 2>/dev/null
}

for host in "${SENTINEL_HOSTS[@]}"; do
    echo "--- Sentinel: $host:$SENTINEL_PORT ---"
    # The reply is two lines: IP, then port. Treat a failed call or an empty reply as unreachable.
    if ! master_info=$(sentinel_cli "$host" sentinel get-master-addr-by-name "$MASTER_NAME") || [ -z "$master_info" ]; then
        echo "  [ERROR] Cannot connect to Sentinel"
        continue
    fi
    master_ip=$(echo "$master_info" | head -1)
    master_port=$(echo "$master_info" | tail -1)
    echo "  Current master: $master_ip:$master_port"

    # SENTINEL MASTER prints alternating field-name / value lines for this master only.
    sentinel_info=$(sentinel_cli "$host" sentinel master "$MASTER_NAME")
    num_slaves=$(echo "$sentinel_info" | grep -A1 "^num-slaves$" | tail -1)
    num_sentinels=$(echo "$sentinel_info" | grep -A1 "^num-other-sentinels$" | tail -1)
    quorum=$(echo "$sentinel_info" | grep -A1 "^quorum$" | tail -1)
    flags=$(echo "$sentinel_info" | grep -A1 "^flags$" | tail -1)
    echo "  Slaves: $num_slaves"
    echo "  Other Sentinels: $num_sentinels"
    echo "  Quorum: $quorum"
    echo "  Master flags: $flags"
    echo ""
done
Split‑Brain Protection
Configure the master with:
min-replicas-to-write 1
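min-replicas-max-lag 10
This makes the master reject writes unless at least one replica has acknowledged replication within the last 10 seconds. A master isolated by a partition therefore stops accepting writes after about 10 seconds, which bounds the data lost when Sentinel promotes a replica on the other side. The trade‑off is that writes also fail whenever all replicas are down or lagging.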
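The write-acceptance rule behind these two settings can be modelled as a small predicate. This is an illustrative sketch, not Redis source code; the function name and the `replica_lags_s` parameter (seconds since each replica's last acknowledgement) are assumptions made for the example.

```python
def master_accepts_writes(replica_lags_s, min_replicas_to_write=1, min_replicas_max_lag=10):
    """Return True if enough replicas have acknowledged recently enough."""
    # A replica counts as "good" when its last acknowledgement is at most
    # min_replicas_max_lag seconds old.
    good = sum(1 for lag in replica_lags_s if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


print(master_accepts_writes([0, 1]))    # True: both replicas are in sync
print(master_accepts_writes([12, 15]))  # False: every replica is too far behind
print(master_accepts_writes([]))        # False: no replicas connected
```

The last two cases are the ones that matter during a partition: an isolated master sees its replicas' lag grow past the limit and begins refusing writes.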
Best Practices & Caveats
Sentinel Deployment
Odd‑node quorum : Use 3, 5, or 7 Sentinels. For 3 nodes, set quorum=2; this tolerates one Sentinel failure.
Cross‑rack placement : Distribute Sentinels across different physical racks or AZs.
Separate machines : Deploy Sentinel on machines distinct from Redis when resources allow, ensuring heartbeat responsiveness.
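The "tolerates one Sentinel failure" figure follows from two conditions: the quorum must be met to declare ODOWN, and a majority of all Sentinels must authorize the failover leader. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def majority(total_sentinels: int) -> int:
    # Smallest number of Sentinels that forms a strict majority.
    return total_sentinels // 2 + 1


def tolerated_sentinel_failures(total_sentinels: int, quorum: int) -> int:
    # A failover needs both the quorum (for ODOWN) and a majority (for the leader vote),
    # so the larger of the two determines how many Sentinels must stay up.
    needed = max(quorum, majority(total_sentinels))
    return max(total_sentinels - needed, 0)


print(tolerated_sentinel_failures(3, 2))  # 1
print(tolerated_sentinel_failures(5, 3))  # 2
print(tolerated_sentinel_failures(5, 2))  # 2
```

Lowering the quorum below the majority makes ODOWN easier to reach but does not let a failover proceed with fewer Sentinels, which is why the 5-node result is the same for quorum 2 and quorum 3.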
Failover Parameter Tuning
down-after-milliseconds balances detection speed vs false positives:
Too low (< 3000 ms) → network jitter may trigger unnecessary failovers.
Too high (> 30000 ms) → real failures take longer to detect.
Recommended 5000‑10000 ms, adjusted to business tolerance.
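These thresholds are easy to check automatically before a configuration is rolled out. The sketch below reads sentinel.conf text and classifies each down-after-milliseconds value against the ranges listed above; the function names and the "acceptable" label for values between the ranges are assumptions for illustration.

```python
def down_after_settings(conf_text: str) -> dict:
    """Return {master_name: milliseconds} for each down-after-milliseconds line."""
    settings = {}
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[:2] == ["sentinel", "down-after-milliseconds"]:
            settings[parts[2]] = int(parts[3])
    return settings


def classify_down_after(ms: int) -> str:
    if ms < 3000:
        return "too low"       # jitter may trigger unnecessary failovers
    if ms > 30000:
        return "too high"      # real failures take long to detect
    if 5000 <= ms <= 10000:
        return "recommended"
    return "acceptable"        # label assumed for the gaps between the stated ranges


conf = """
sentinel monitor mymaster 10.0.1.10 6379 2
sentinel down-after-milliseconds mymaster 5000
"""
for name, ms in down_after_settings(conf).items():
    print(name, ms, classify_down_after(ms))  # mymaster 5000 recommended
```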
Persistence Settings
Enable AOF on both master and replicas with appendfsync everysec. Disabling persistence on the master can cause total data loss after a restart because replicas will sync an empty dataset.
Configuration Warnings
Do not edit sentinel.conf while Sentinel is running; configuration‑management tools must exclude fields that Sentinel rewrites.
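Ensure requirepass and masterauth are identical across master and replicas, and that sentinel auth-pass matches that password; Sentinel's own requirepass can differ, as in the configuration above.
After a successful failover, the old master is automatically reconfigured as a replica once Sentinel can reach it again – no manual action required.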
Common Errors
Sentinel keeps emitting +sdown but no failover – Cause: Quorum not reached; some Sentinels cannot communicate. Solution: Check network connectivity and firewall rules between Sentinels.
Clients still connect to old master after failover – Cause: Client hard‑coded master IP. Solution: Switch to Sentinel‑aware client libraries or connection pools.
Replica replication lag grows continuously – Cause: Master write load exceeds replica sync capacity. Solution: Verify network bandwidth and replica load, and increase repl-backlog-size so that brief disconnects resume with a partial rather than a full resync.
Failover timeout, failover fails – Cause: failover-timeout set too low. Solution: Increase failover-timeout and ensure stable Sentinel network.
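Even with a Sentinel-aware client, commands issued while the failover is in progress will fail, so applications usually wrap writes in a bounded retry. The helper below is a minimal sketch with hypothetical names; it uses Python's built-in exceptions so it runs without a Redis server. With redis-py one would presumably pass that library's own exception classes (for example `redis.exceptions.ConnectionError`) in `retry_on`.

```python
import time


def call_with_retry(operation, retry_on=(ConnectionError, TimeoutError),
                    attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Call operation(); on a retryable error, back off exponentially and try again."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            # No sleep after the final attempt; the error is re-raised instead.
            if attempt < attempts - 1:
                sleep(base_delay_s * (2 ** attempt))
    raise last_error


# Demo: a write that fails twice (as if the master were being switched), then succeeds.
calls = {"n": 0}

def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("master unreachable")
    return "OK"

delays = []
result = call_with_retry(flaky_write, sleep=delays.append)
print(result, delays)  # OK [0.5, 1.0]
```

The total retry budget should comfortably exceed the expected switchover time; with the defaults shown here the delays add up to 7.5 seconds, so a real deployment would likely raise `attempts` or `base_delay_s`.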
Sentinel vs. Redis Cluster
Choose Sentinel when data fits a single node (< 100 GB), write QPS < 100 k, and multi‑key Lua scripts or transactions must run without Cluster's same‑hash‑slot restriction.
Choose Cluster when data exceeds single‑node memory, write QPS exceeds a single master’s capacity, and linear write scaling is required.
Troubleshooting & Monitoring
Fault Diagnosis
Log inspection : tail -f /var/log/redis/sentinel.log and filter for switch-master, failover, odown.
Replication status :
redis-cli -h 10.0.1.10 -p 6379 -a 'Redis@Secure2024!' info replication
Sentinel cannot discover replicas : Verify each replica’s replicaof setting.
Old master does not rejoin as replica : Manually run replicaof 10.0.1.11 6379 on the old master.
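When filtering the Sentinel log for switch-master, the event payload can be parsed into the old and new master addresses. The sketch below assumes the payload layout shown in the expected log sequence earlier (master name, old IP, old port, new IP, new port); the `SwitchMaster` type and function name are hypothetical.

```python
from typing import NamedTuple, Optional


class SwitchMaster(NamedTuple):
    name: str
    old_addr: tuple
    new_addr: tuple


def parse_switch_master(line: str) -> Optional[SwitchMaster]:
    """Extract the old and new master address from a +switch-master log line."""
    marker = "+switch-master "
    idx = line.find(marker)
    if idx == -1:
        return None
    fields = line[idx + len(marker):].split()
    if len(fields) != 5:
        return None
    name, old_ip, old_port, new_ip, new_port = fields
    return SwitchMaster(name, (old_ip, int(old_port)), (new_ip, int(new_port)))


event = parse_switch_master("# +switch-master mymaster 10.0.1.10 6379 10.0.1.11 6379")
print(event.new_addr)  # ('10.0.1.11', 6379)
```

Any prefix before the marker, such as a process ID or timestamp, is ignored, so the same function works on raw log lines and on Pub/Sub payloads that carry the marker.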
Performance Monitoring
Replication lag (lag) : Normal 0‑1 s, alert > 10 s.
connected_slaves : Should equal replica count; alert if lower.
Sentinel response time : Normal < 5 ms, alert > 100 ms.
master_last_io_seconds_ago : Normal < 5 s, alert > 30 s.
repl_backlog_active : 1 = active, 0 = inactive.
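Several of these metrics come straight from INFO replication, so an alert check can be built by parsing that output. The sketch below works on a hypothetical sample of a master's INFO replication text rather than a live connection; the function names and the sample values are assumptions, and the 10-second lag threshold is the one listed above.

```python
def parse_info(text: str) -> dict:
    """Turn 'key:value' lines from INFO output into a dict, skipping section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def replication_alerts(info: dict, expected_replicas: int, lag_alert_s: int = 10) -> list:
    alerts = []
    connected = int(info.get("connected_slaves", 0))
    if connected < expected_replicas:
        alerts.append(f"connected_slaves={connected}, expected {expected_replicas}")
    for i in range(connected):
        # Each slaveN value is a comma-separated list of field=value pairs.
        raw = info.get(f"slave{i}", "")
        fields = dict(kv.split("=", 1) for kv in raw.split(",") if "=" in kv)
        lag = int(fields.get("lag", -1))
        if lag > lag_alert_s:
            alerts.append(f"slave{i} lag={lag}s")
    return alerts


SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.1.11,port=6379,state=online,offset=1000,lag=0
slave1:ip=10.0.1.12,port=6379,state=online,offset=400,lag=14
"""
print(replication_alerts(parse_info(SAMPLE), expected_replicas=2))  # ['slave1 lag=14s']
```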
Backup & Recovery
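#!/bin/bash
# Run this on the replica host: the script copies the replica's local dump.rdb.
SLAVE_HOST="10.0.1.11"
REDIS_PASS="Redis@Secure2024!"
BACKUP_DIR="/opt/redis/backups"
DATE=$(date +%Y%m%d_%H%M%S)
CLI=(redis-cli -h "$SLAVE_HOST" -p 6379 -a "$REDIS_PASS" --no-auth-warning)
mkdir -p "$BACKUP_DIR"
# Trigger BGSAVE on the replica and wait until LASTSAVE advances instead of sleeping a fixed time.
last_save=$("${CLI[@]}" LASTSAVE)
"${CLI[@]}" BGSAVE
waited=0
until [ "$("${CLI[@]}" LASTSAVE)" != "$last_save" ]; do
    sleep 1
    waited=$((waited + 1))
    if [ "$waited" -ge 600 ]; then
        echo "[ERROR] BGSAVE did not finish within 600 s" >&2
        exit 1
    fi
done
cp /var/lib/redis/dump.rdb "$BACKUP_DIR/dump_$DATE.rdb"
echo "Backup completed: $BACKUP_DIR/dump_$DATE.rdb"
# Keep last 7 days
find "$BACKUP_DIR" -name "dump_*.rdb" -mtime +7 -delete
Summary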
Key Takeaways
SDOWN is a single Sentinel’s subjective view; ODOWN requires quorum and triggers failover.
Quorum of 2 in a 3‑node Sentinel cluster balances availability and false‑positive avoidance.
Split‑brain protection via min-replicas-to-write limits data loss during network partitions.
Clients must obtain the master address from Sentinel; hard‑coded IPs break automatic failover.
Replica priority (replica-priority) controls which replica becomes the new master.
Further Learning
Redis Cluster for horizontal scaling beyond a single node.
Combining Sentinel with proxy solutions (Twemproxy, Codis) to hide Sentinel logic from applications.
Deep dive into Redis persistence (AOF rewrite strategies, mixed RDB+AOF) and their impact on recovery time.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
