
Redis Sentinel Mode Explained: Automatic Failure Detection and Master‑Slave Switching in Practice

This guide walks through Redis Sentinel’s architecture, explains subjective and objective down states, details the leader election and failover workflow, shows step‑by‑step configuration of a three‑node Sentinel cluster, client integration in Python and Java, and provides best‑practice recommendations, monitoring metrics, and troubleshooting tips.

Raymond Ops

Jun 17, 2026

Overview

Redis replication provides read scaling and data redundancy, but manual failover is required when the master crashes. Redis Sentinel adds automatic health monitoring, leader election, and client redirection, becoming the standard high‑availability solution for Redis.

Technical Features

Automatic failover : Sentinel promotes a replica within ~30 seconds without human intervention.

Service discovery : Clients query Sentinel for the current master address, hiding IP changes.

Configuration propagation : After failover, Sentinel updates all replicas' replication settings.

Multi‑Sentinel cooperation : A quorum mechanism prevents false positives caused by network jitter.

Applicable Scenarios

Data size fits a single node (typically < 100 GB) – no sharding needed.

High write‑availability requirements, tolerating ~30 seconds of write interruption.

Read‑heavy workloads that benefit from read‑only replicas.

When horizontal scaling (Cluster) is unnecessary but higher availability than a single master is required.

Environment Requirements

Redis : 8.x (master, replicas and Sentinel must share the same version).

OS : Linux kernel 4.x+ with a stable network stack.

Sentinel nodes : Minimum 3; use an odd count so that a majority can still be formed after one node fails.

Network latency : < 10 ms between Sentinel nodes; higher latency degrades detection accuracy.

Detailed Steps

Sentinel Working Principle

Subjective Down (SDOWN) vs Objective Down (ODOWN)

SDOWN is a single Sentinel’s judgment that a node is unreachable after the down-after-milliseconds timeout; it does not trigger failover. ODOWN occurs when a quorum of Sentinels agree a node is down, which is the prerequisite for failover.

Sentinel A: PING master → no response → mark master SDOWN
Sentinel A asks B, C: "Do you also think master is down?"
Sentinel B: "Yes, no response"
Sentinel C: "Yes"
Quorum=2 → Sentinel A marks master ODOWN → start leader election
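The exchange above reduces to a simple counting rule. The following is a minimal sketch of that rule only, not Sentinel's actual implementation; the function name `is_objectively_down` and the vote dictionary are hypothetical.

```python
def is_objectively_down(sdown_votes, quorum):
    """Return True when at least `quorum` Sentinels report the master as SDOWN."""
    agreeing = sum(1 for thinks_down in sdown_votes.values() if thinks_down)
    return agreeing >= quorum


# Sentinels A and B cannot reach the master; C still can.
votes = {"sentinel-a": True, "sentinel-b": True, "sentinel-c": False}
print(is_objectively_down(votes, quorum=2))  # True

# Only one Sentinel lost contact: the master stays SDOWN for that Sentinel alone.
print(is_objectively_down({"sentinel-a": True, "sentinel-b": False, "sentinel-c": False}, quorum=2))  # False
```

A single Sentinel behind a flaky link therefore cannot start a failover on its own.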

Failover Process

Leader election : Sentinels use a Raft-like, epoch-based majority vote to elect a leader that coordinates the failover.

New master selection : The leader picks a replica based on (a) lowest replica-priority (0 = never master), (b) highest replication offset, (c) smallest Run ID.

Execute switch : Sends REPLICAOF NO ONE to the chosen replica, making it the new master.

Reconfigure : Updates other replicas to follow the new master and writes the new configuration to Sentinel files.

Notify clients : Publishes a +switch-master event via Pub/Sub.
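The three ordering rules used for new master selection can be written as a single sort key. This is an illustrative sketch under simplifying assumptions: the `Replica` class and `pick_new_master` function are hypothetical, and the health checks Sentinel applies before ranking (for example, skipping disconnected replicas) are left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    run_id: str
    priority: int      # replica-priority; 0 means "never promote"
    repl_offset: int   # amount of the master's replication stream processed


def pick_new_master(replicas):
    """Apply the three ordering rules; return None if no replica is eligible."""
    eligible = [r for r in replicas if r.priority != 0]
    if not eligible:
        return None
    # Lower priority value first, then higher offset, then smaller run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


candidates = [
    Replica(run_id="b2f0", priority=100, repl_offset=5000),
    Replica(run_id="a91c", priority=100, repl_offset=7200),
    Replica(run_id="c7d4", priority=0, repl_offset=9999),  # excluded: priority 0
]
print(pick_new_master(candidates).run_id)  # a91c
```

The replica with the largest offset loses here because its priority of 0 removes it from consideration before any comparison happens.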

Three‑Node Sentinel Cluster Configuration

Node Planning

Node Role    IP          Port
Redis master 10.0.1.10   6379
Redis slave1 10.0.1.11   6379
Redis slave2 10.0.1.12   6379
Sentinel 1   10.0.1.10   26379
Sentinel 2   10.0.1.11   26379
Sentinel 3   10.0.1.12   26379

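Sentinel and Redis can share a machine, as in this layout, but a host failure then takes down a Redis node and a Sentinel together. Place the three hosts in independent failure domains so that no single outage removes a majority of Sentinels.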

Redis Master Configuration

# /etc/redis/redis.conf (master)
bind 10.0.1.10 127.0.0.1
port 6379
daemonize yes
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

# Persistence
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

# Authentication (must match replicas and Sentinel)
requirepass "Redis@Secure2024!"
masterauth "Redis@Secure2024!"

# Memory
maxmemory 8gb
maxmemory-policy allkeys-lru

# Split‑brain protection
min-replicas-to-write 1
min-replicas-max-lag 10

Redis Replica Configuration

# /etc/redis/redis.conf (replicas)
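# Change the IP for the second replica. Keep comments on their own line:
# Redis parses a trailing "# ..." as extra arguments and refuses to start.
bind 10.0.1.11 127.0.0.1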
port 6379
daemonize yes
logfile /var/log/redis/redis-server.log
dir /var/lib/redis

replicaof 10.0.1.10 6379
requirepass "Redis@Secure2024!"
masterauth "Redis@Secure2024!"
replica-read-only yes
# Lower value = higher priority; 0 = never promote
replica-priority 100
appendonly yes
appendfsync everysec

Sentinel Configuration (common for all three)

# /etc/redis/sentinel.conf
# Change the IP per node
bind 10.0.1.10 127.0.0.1
port 26379
daemonize yes
logfile /var/log/redis/sentinel.log
dir /var/lib/redis

# Last argument is the quorum (2)
sentinel monitor mymaster 10.0.1.10 6379 2
sentinel auth-pass mymaster Redis@Secure2024!
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
requirepass "Sentinel@Secure2024!"
sentinel sentinel-pass Sentinel@Secure2024!
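Since only the bind address differs between the three nodes, the file can be rendered from a template at provisioning time. The sketch below shows one possible approach; `SENTINEL_TEMPLATE` and `render_sentinel_conf` are hypothetical names, and the password directives are omitted for brevity.

```python
SENTINEL_TEMPLATE = """\
bind {ip} 127.0.0.1
port 26379
sentinel monitor mymaster {master_ip} 6379 {quorum}
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
"""


def render_sentinel_conf(ip, master_ip="10.0.1.10", quorum=2):
    """Return the sentinel.conf text for the Sentinel that binds to `ip`."""
    return SENTINEL_TEMPLATE.format(ip=ip, master_ip=master_ip, quorum=quorum)


for node_ip in ("10.0.1.10", "10.0.1.11", "10.0.1.12"):
    conf = render_sentinel_conf(node_ip)
    print(conf.splitlines()[0])  # first line is the per-node bind directive
```

Sentinel rewrites its configuration file at runtime, so a rendered file like this should be written once during initial setup and not pushed again over a running Sentinel.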

Startup Order

# 1. Start master
systemctl start redis-server   # 10.0.1.10

# 2. Start replicas
systemctl start redis-server   # 10.0.1.11 & 10.0.1.12

# 3. Verify replication
redis-cli -h 10.0.1.10 -p 6379 -a 'Redis@Secure2024!' info replication
# Expected: role:master, connected_slaves:2

# 4. Start all Sentinels
redis-sentinel /etc/redis/sentinel.conf   # or systemctl start redis-sentinel

# 5. Verify Sentinel status
redis-cli -h 10.0.1.10 -p 26379 -a 'Sentinel@Secure2024!' sentinel masters

Verifying Failover

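# Simulate master failure by blocking the master for 30 s.
# Redis 7.0+ disables DEBUG by default; set enable-debug-command yes in redis.conf
# (or local, when connecting over loopback on the master host) before running this.
redis-cli -h 10.0.1.10 -p 6379 -a 'Redis@Secure2024!' DEBUG sleep 30

# Observe Sentinel logs (expected sequence, abridged)
# +sdown master mymaster 10.0.1.10 6379
# +odown master mymaster 10.0.1.10 6379 #quorum 2/2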
# +try-failover master mymaster 10.0.1.10 6379
# +elected-leader master mymaster 10.0.1.10 6379
# +selected-slave slave 10.0.1.11:6379 mymaster 10.0.1.10 6379
# +promoted-slave slave 10.0.1.11:6379 mymaster 10.0.1.10 6379
# +switch-master mymaster 10.0.1.10 6379 10.0.1.11 6379

# Query new master
redis-cli -h 10.0.1.10 -p 26379 -a 'Sentinel@Secure2024!' sentinel get-master-addr-by-name mymaster
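The +switch-master line in the log above carries the master name followed by the old and new addresses, and the same five fields arrive as the payload of the Pub/Sub event. A small parser makes that payload easier to act on. This is a sketch; `SwitchMaster` and `parse_switch_master` are hypothetical names.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_host: str
    old_port: int
    new_host: str
    new_port: int


def parse_switch_master(payload):
    """Parse '<name> <old-ip> <old-port> <new-ip> <new-port>'."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected +switch-master payload: {payload!r}")
    name, old_host, old_port, new_host, new_port = parts
    return SwitchMaster(name, old_host, int(old_port), new_host, int(new_port))


event = parse_switch_master("mymaster 10.0.1.10 6379 10.0.1.11 6379")
print(f"{event.master_name}: {event.old_host}:{event.old_port} -> {event.new_host}:{event.new_port}")
```

With redis-py, the payload would typically come from a Pub/Sub subscription to the +switch-master channel on a Sentinel connection; Sentinel-aware clients such as the ones in the next section handle this internally.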

Client Integration

Python (redis‑py)

import redis
from redis.sentinel import Sentinel

sentinel = Sentinel([
    ('10.0.1.10', 26379),
    ('10.0.1.11', 26379),
    ('10.0.1.12', 26379)
], socket_timeout=0.5, sentinel_kwargs={'password':'Sentinel@Secure2024!'})

master = sentinel.master_for('mymaster', socket_timeout=0.5, password='Redis@Secure2024!', db=0, retry_on_timeout=True)
slave  = sentinel.slave_for('mymaster', socket_timeout=0.5, password='Redis@Secure2024!', db=0)

master.set('key', 'value', ex=3600)
value = slave.get('key')
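During the failover window, writes through the master connection fail until a new master is promoted, so callers usually wrap them in a bounded retry. The helper below is a generic sketch, not part of redis-py; `call_with_retry` is a hypothetical name, and the built-in `ConnectionError` stands in for the client library's own exception classes so the example runs without a Redis server.

```python
import time


def call_with_retry(operation, retry_on, attempts=5, base_delay=0.2):
    """Run operation(); on a listed exception, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))


# Stand-in for a write that fails twice while the failover is in progress.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("master unreachable")
    return "OK"


print(call_with_retry(flaky_write, retry_on=(ConnectionError,), base_delay=0.01))  # OK
```

In the redis-py example above, the call would look something like `call_with_retry(lambda: master.set('key', 'value'), retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError))`, with the total retry budget chosen to cover the expected failover time.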

Java (Jedis)

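import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.JedisSentinelPool;
import java.util.HashSet;
import java.util.Set;

public class SentinelExample {
    public static void main(String[] args) {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxTotal(50);
        poolConfig.setMaxIdle(10);
        poolConfig.setMinIdle(5);
        poolConfig.setTestOnBorrow(true);

        Set<String> sentinels = new HashSet<>();
        sentinels.add("10.0.1.10:26379");
        sentinels.add("10.0.1.11:26379");
        sentinels.add("10.0.1.12:26379");

        // Arguments: master name, Sentinel addresses, pool config,
        // connection timeout (ms), socket timeout (ms), Redis password, database, client name,
        // Sentinel connection timeout (ms), Sentinel socket timeout (ms), Sentinel password, Sentinel client name.
        // The Sentinel timeouts are set to 2000 ms here; a value of 0 would mean no timeout.
        try (JedisSentinelPool sentinelPool = new JedisSentinelPool("mymaster", sentinels, poolConfig,
                     2000, 2000, "Redis@Secure2024!", 0, null, 2000, 2000, "Sentinel@Secure2024!", null);
             Jedis jedis = sentinelPool.getResource()) {
            jedis.set("key", "value");
            String val = jedis.get("key");
        }
    }
}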

Sentinel Monitoring Script (Bash)

#!/bin/bash
SENTINEL_HOSTS=("10.0.1.10" "10.0.1.11" "10.0.1.12")
SENTINEL_PORT=26379
SENTINEL_PASS="Sentinel@Secure2024!"
MASTER_NAME="mymaster"

for host in "${SENTINEL_HOSTS[@]}"; do
    echo "--- Sentinel: $host:$SENTINEL_PORT ---"
    master_info=$(redis-cli -h "$host" -p $SENTINEL_PORT -a "$SENTINEL_PASS" --no-auth-warning sentinel get-master-addr-by-name "$MASTER_NAME" 2>/dev/null)
    if [ $? -eq 0 ]; then
        master_ip=$(echo "$master_info" | head -1)
        master_port=$(echo "$master_info" | tail -1)
        echo "  Current master: $master_ip:$master_port"
    else
        echo "  [ERROR] Cannot connect to Sentinel"
        continue
    fi
    sentinel_info=$(redis-cli -h "$host" -p $SENTINEL_PORT -a "$SENTINEL_PASS" --no-auth-warning sentinel masters 2>/dev/null)
    num_slaves=$(echo "$sentinel_info" | grep -A1 "num-slaves" | tail -1)
    num_sentinels=$(echo "$sentinel_info" | grep -A1 "num-other-sentinels" | tail -1)
    quorum=$(echo "$sentinel_info" | grep -A1 "quorum" | tail -1)
    flags=$(echo "$sentinel_info" | grep -A1 "^flags$" | tail -1)
    echo "  Slaves: $num_slaves"
    echo "  Other Sentinels: $num_sentinels"
    echo "  Quorum: $quorum"
    echo "  Master flags: $flags"
    echo ""
done

Split‑Brain Protection

Configure the master with:

min-replicas-to-write 1
min-replicas-max-lag 10

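With these settings the master rejects writes unless at least one replica has acknowledged replication within the last 10 seconds. A master isolated by a partition therefore stops accepting writes after roughly 10 seconds, which bounds (but does not eliminate) the writes lost when Sentinel promotes a replica on the other side. The trade‑off is that writes also fail whenever all replicas are down or lagging, even without a partition.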
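The check the master performs can be pictured as counting the replicas whose last acknowledgement is recent enough. The function below is a simplified model of that rule, not Redis source code; `master_accepts_writes` is a hypothetical name, and the defaults mirror the two settings above.

```python
def master_accepts_writes(replica_lags_s, min_replicas=1, max_lag_s=10):
    """True if at least `min_replicas` replicas acknowledged within `max_lag_s` seconds."""
    good = sum(1 for lag in replica_lags_s if lag <= max_lag_s)
    return good >= min_replicas


print(master_accepts_writes([0, 1]))    # True: both replicas are current
print(master_accepts_writes([15, 1]))   # True: one healthy replica is enough
print(master_accepts_writes([15, 42]))  # False: every replica is too far behind
print(master_accepts_writes([]))        # False: no replicas connected
```

Raising `min-replicas-to-write` to 2 in this three-node layout would make the master stricter, at the cost of refusing writes whenever a single replica is offline.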

Best Practices & Caveats

Sentinel Deployment

Odd‑node quorum : Use 3, 5, 7 Sentinels. For 3 nodes, set quorum=2; this tolerates one Sentinel failure.

Cross‑rack placement : Distribute Sentinels across different physical racks or AZs.

Separate machines : Deploy Sentinel on machines distinct from Redis when resources allow, ensuring heartbeat responsiveness.
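The preference for odd Sentinel counts follows from the majority rule: besides reaching the quorum for ODOWN, a failover must be authorized by a majority of all known Sentinels. A quick calculation shows how many Sentinel failures each cluster size survives; the function names are illustrative.

```python
def sentinel_majority(total):
    """Smallest number of Sentinels that forms a strict majority."""
    return total // 2 + 1


def tolerated_failures(total):
    """How many Sentinels can fail while a majority is still reachable."""
    return total - sentinel_majority(total)


for n in (3, 4, 5, 7):
    print(f"{n} Sentinels: majority={sentinel_majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Four Sentinels tolerate only one failure, the same as three, which is why adding a single node to an odd-sized deployment buys no extra resilience.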

Failover Parameter Tuning

down-after-milliseconds balances detection speed against false positives:

Too low (< 3000 ms) → network jitter may trigger unnecessary failovers.

Too high (> 30000 ms) → real failures take longer to detect.

Recommended 5000‑10000 ms, adjusted to business tolerance.

Persistence Settings

Enable AOF on both master and replicas with appendfsync everysec. Disabling persistence on the master can cause total data loss after a restart because replicas will sync an empty dataset.

Configuration Warnings

Do not edit sentinel.conf while Sentinel is running; configuration‑management tools must exclude fields that Sentinel rewrites.

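Ensure requirepass and masterauth are identical across the master and all replicas, and that sentinel auth-pass matches them. The Sentinels' own requirepass and sentinel sentinel-pass form a separate credential that only needs to match among the Sentinels.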

After a successful failover, Sentinel reconfigures the old master as a replica once it becomes reachable again; normally no manual action is required.

Common Errors

Sentinel keeps emitting +sdown but no failover – Cause: Quorum not reached; some Sentinels cannot communicate. Solution: Check network connectivity and firewall rules between Sentinels.

Clients still connect to old master after failover – Cause: Client hard‑coded master IP. Solution: Switch to Sentinel‑aware client libraries or connection pools.

Replica replication lag grows continuously – Cause: Master write load exceeds replica sync capacity. Solution: Increase repl-backlog-size so that brief disconnections do not force a full resync, and verify network bandwidth.

Failover timeout, failover fails – Cause: failover-timeout set too low. Solution: Increase failover-timeout and ensure stable Sentinel network.

Sentinel vs. Redis Cluster

Choose Sentinel when data fits a single node (< 100 GB), write QPS < 100 k, and Lua scripts or transactions need to touch arbitrary keys without Cluster's same‑hash‑slot restriction.

Choose Cluster when data exceeds single‑node memory, write QPS exceeds a single master’s capacity, and linear write scaling is required.

Troubleshooting & Monitoring

Fault Diagnosis

Log inspection : tail -f /var/log/redis/sentinel.log and filter for switch-master, failover, odown.

Replication status :

redis-cli -h 10.0.1.10 -p 6379 -a 'Redis@Secure2024!' info replication

Sentinel cannot discover replicas : Verify each replica’s replicaof setting.

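Old master does not rejoin as replica : Look up the current master with sentinel get-master-addr-by-name mymaster, then manually run replicaof with that address on the old master (for example, replicaof 10.0.1.11 6379 if 10.0.1.11 was promoted).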

Performance Monitoring

Replication lag (lag) : Normal 0‑1 s, alert > 10 s.

connected_slaves : Should equal replica count; alert if lower.

Sentinel response time : Normal < 5 ms, alert > 100 ms.

master_last_io_seconds_ago : Normal < 5 s, alert > 30 s.

repl_backlog_active : 1 = active, 0 = inactive.
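Two of these thresholds, replica count and per-replica lag, can be checked by parsing the text that INFO replication returns on the master. The sketch below works on a captured string so it runs without a server; `SAMPLE_INFO` is made-up illustrative output, and `replication_alerts` is a hypothetical helper.

```python
import re

# Illustrative INFO replication output from a master (values are invented).
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.1.11,port=6379,state=online,offset=84210,lag=0
slave1:ip=10.0.1.12,port=6379,state=online,offset=80112,lag=12
"""


def replication_alerts(info_text, expected_replicas=2, max_lag_s=10):
    """Return human-readable alerts derived from INFO replication output."""
    alerts = []
    connected = 0
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("connected_slaves:"):
            connected = int(line.split(":", 1)[1])
        elif re.match(r"slave\d+:", line):
            name, fields = line.split(":", 1)
            attrs = dict(pair.split("=", 1) for pair in fields.split(","))
            lag = int(attrs["lag"])
            if lag > max_lag_s:
                alerts.append(f"{name} lag {lag}s exceeds {max_lag_s}s")
    if connected < expected_replicas:
        alerts.append(f"connected_slaves={connected}, expected {expected_replicas}")
    return alerts


print(replication_alerts(SAMPLE_INFO))  # ['slave1 lag 12s exceeds 10s']
```

In a real check, the input string would come from running info replication against the current master, whose address should itself be resolved through Sentinel.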

Backup & Recovery

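#!/bin/bash
# Run this script on the replica host itself: dump.rdb is copied from the local disk.
SLAVE_HOST="10.0.1.11"
REDIS_PASS="Redis@Secure2024!"
BACKUP_DIR="/opt/redis/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Trigger BGSAVE on replica
redis-cli -h "$SLAVE_HOST" -p 6379 -a "$REDIS_PASS" --no-auth-warning BGSAVE
# Wait until the background save has finished instead of sleeping for a fixed time
while redis-cli -h "$SLAVE_HOST" -p 6379 -a "$REDIS_PASS" --no-auth-warning info persistence | grep -q 'rdb_bgsave_in_progress:1'; do
    sleep 1
done
cp /var/lib/redis/dump.rdb "$BACKUP_DIR/dump_$DATE.rdb"
echo "Backup completed: $BACKUP_DIR/dump_$DATE.rdb"
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "dump_*.rdb" -mtime +7 -delete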

Summary

Key Takeaways

SDOWN is a single Sentinel’s subjective view; ODOWN requires quorum and triggers failover.

Quorum of 2 in a 3‑node Sentinel cluster balances availability and false‑positive avoidance.

Split‑brain protection via min-replicas-to-write limits data loss during network partitions.

Clients must obtain the master address from Sentinel; hard‑coded IPs break automatic failover.

Replica priority (replica-priority) controls which replica becomes the new master.

Further Learning

Redis Cluster for horizontal scaling beyond a single node.

Combining Sentinel with proxy solutions (Twemproxy, Codis) to hide Sentinel logic from applications.

Deep dive into Redis persistence (AOF rewrite strategies, mixed RDB+AOF) and their impact on recovery time.
