Redis Sentinel Deep Dive: High‑Availability Architecture & Automatic Failover
This article explains Redis Sentinel’s role as the official high‑availability solution, detailing its monitoring, notification, automatic failover mechanisms, discovery processes, connection types, down‑state classifications, failover steps, leader election, master selection rules, and data consistency guarantees.
What is Sentinel
Redis Sentinel is the official high-availability (HA) solution recommended by Redis. In a plain master-slave setup, Redis does not switch roles on its own when the master fails. Sentinel runs as an independent process, can monitor multiple master-slave groups, and performs automatic failover when a master goes down.
Sentinel Architecture
Sentinel Functions
1) Monitoring: Sentinel continuously checks whether your master and slave servers are operating correctly.
2) Notification: When a monitored Redis server encounters a problem, Sentinel can send notifications to administrators or other applications via its API.
3) Automatic Failover: If a master server stops working, Sentinel initiates an automatic failover, promoting one of its slaves to become the new master and reconfiguring the remaining slaves to replicate from the new master. Clients that query Sentinel for the master's address then receive the address of the new master.
Sentinel Discovery and Connections
Sentinel discovers master servers using a user‑provided configuration file.
Sentinel creates two network connections to each monitored master:
Command connection for sending commands to the master.
Subscription connection for subscribing to a channel to discover other Sentinels monitoring the same master.
How Sentinel Discovers Slaves
Sentinel sends the INFO command to the master to automatically obtain the addresses of all slaves.
For each discovered slave, Sentinel creates a command connection and a subscription connection, similar to the master.
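As a rough illustration of this discovery step, the sketch below parses the replication section of an INFO reply and extracts the slave addresses. The function name parse_slaves and the sample addresses are hypothetical; only the slaveN:ip=...,port=... line layout comes from Redis itself.

```python
def parse_slaves(info_text):
    """Extract (ip, port) pairs from the slaveN lines of INFO replication output."""
    slaves = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Slave entries look like "slave0:ip=...,port=...,state=...".
        # Keys such as "connected_slaves" or "slave_repl_offset" are skipped.
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(",") if "=" in item)
        slaves.append((fields["ip"], int(fields["port"])))
    return slaves


# Illustrative sample only; the addresses are made up for this example.
SAMPLE_INFO = """# Replication
role:master
connected_slaves:2
slave0:ip=172.16.1.52,port=6379,state=online,offset=1234,lag=0
slave1:ip=172.16.1.53,port=6379,state=online,offset=1234,lag=1
"""

print(parse_slaves(SAMPLE_INFO))
```

A real Sentinel would then open a command connection and a subscription connection to each address returned.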
How Sentinel Discovers Other Sentinels
Sentinel announces its presence by sending a “HELLO” message (containing its IP, port, and ID) over the command connection to monitored masters and slaves. It also receives “HELLO” messages from other Sentinels via the subscription connection, thereby discovering peers monitoring the same master.
Key points:
A Sentinel can connect to multiple other Sentinels, checking each other's availability and exchanging information.
Sentinels automatically discover peers via the __sentinel__:hello channel without manual configuration.
Each Sentinel publishes its own state (IP, port, run‑id) every two seconds.
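The peer discovery described above can be sketched as a small parser that records every Sentinel seen on the hello channel. The eight-field, comma-separated payload layout used here is an assumption based on the Redis Sentinel source and should be checked against the Redis version in use; the names Hello, parse_hello, and remember_peer are hypothetical.

```python
from collections import namedtuple

# Assumed payload layout (comma separated):
# ip,port,run_id,current_epoch,master_name,master_ip,master_port,master_config_epoch
Hello = namedtuple(
    "Hello",
    "ip port run_id current_epoch master_name master_ip master_port master_config_epoch",
)


def parse_hello(payload):
    """Parse one message received on the __sentinel__:hello channel."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError("unexpected hello payload: %r" % payload)
    ip, port, run_id, epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), run_id, int(epoch), m_name, m_ip, int(m_port), int(m_epoch))


def remember_peer(known_peers, my_run_id, payload):
    """Record the sender as a peer unless the message is our own."""
    hello = parse_hello(payload)
    if hello.run_id != my_run_id:
        known_peers[hello.run_id] = (hello.ip, hello.port)
    return known_peers


peers = {}
remember_peer(peers, "me", "172.16.1.61,26380,abc123,4,zls,172.16.1.51,6379,4")
print(peers)
```

Keying peers by run ID rather than by address means a Sentinel that restarts on a new port replaces its old entry instead of appearing twice.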
Connections Between Sentinels
Sentinels only create command connections with each other for communication; subscription connections are not needed because the master‑slave servers act as intermediaries for HELLO messages.
Subjective and Objective Down States
Subjectively Down (SDOWN): a single Sentinel’s own judgment that a server is down.
Objectively Down (ODOWN): at least the configured quorum of Sentinels have marked the master SDOWN, as confirmed by exchanging SENTINEL is-master-down-by-addr requests. ODOWN applies only to masters.
Sentinel Down Detection
Sentinel uses PING to check server status. Valid PING replies are +PONG, -LOADING, and -MASTERDOWN; any other reply, or no reply at all, counts as invalid.
A server is marked SDOWN only when it has returned no valid reply for the entire down-after-milliseconds period configured for its master.
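The detection rule can be sketched as a small state tracker. This is a simplified model, not Sentinel's actual code: the class MonitoredInstance, the function is_odown, and the millisecond timestamps passed in by the caller are all assumptions made for illustration.

```python
# Replies that count as valid answers to PING (matched by prefix).
VALID_PREFIXES = ("+PONG", "-LOADING", "-MASTERDOWN")


class MonitoredInstance:
    """Tracks the last valid PING reply for one monitored server."""

    def __init__(self, down_after_ms, now_ms=0):
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = now_ms

    def record_reply(self, reply, now_ms):
        # Only valid replies refresh the timer; errors and garbage do not.
        if reply.startswith(VALID_PREFIXES):
            self.last_valid_reply_ms = now_ms

    def is_sdown(self, now_ms):
        # SDOWN once the whole down-after window has passed without a valid reply.
        return now_ms - self.last_valid_reply_ms > self.down_after_ms


def is_odown(sdown_votes, quorum):
    """A master is ODOWN when at least `quorum` Sentinels report it as SDOWN."""
    return sdown_votes >= quorum


master = MonitoredInstance(down_after_ms=5000)
master.record_reply("+PONG", now_ms=1000)
master.record_reply("-ERR unexpected", now_ms=2000)  # invalid, timer not refreshed
print(master.is_sdown(now_ms=6000), master.is_sdown(now_ms=6001))
print(is_odown(sdown_votes=2, quorum=2))
```

With the quorum of 1 used in the configuration later in this article, a single Sentinel's SDOWN judgment is already enough to reach ODOWN.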
Sentinel Failover Procedure
Detect that the master has entered ODOWN.
Conduct a leader election among the Sentinels, using a voting scheme similar to Raft's leader election.
If election fails, retry after twice the failover timeout.
Select a slave and promote it to master.
Send SLAVEOF NO ONE to the promoted slave.
Publish the new configuration to all other Sentinels via Pub/Sub.
Send SLAVEOF to the former master’s slaves so they replicate from the new master.
When all slaves start replicating the new master, the leader Sentinel ends the failover process.
After reconfiguration, Sentinel sends a CONFIG REWRITE command to the affected instance to persist the new settings.
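The election step in the procedure above can be sketched as a vote count for a single epoch. The rule assumed here, taken from the Redis Sentinel documentation as I understand it, is that the winner needs votes from a majority of all known Sentinels and from at least the configured quorum; the function name elect_leader and the run IDs are hypothetical.

```python
from collections import Counter


def elect_leader(votes, total_sentinels, quorum):
    """Return the winning candidate's run ID for one epoch, or None.

    `votes` maps each voter's run ID to the candidate it voted for, so a
    voter can cast at most one vote per epoch.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = total_sentinels // 2 + 1
    # Assumed rule: both an absolute majority and the quorum must be reached.
    if count >= majority and count >= quorum:
        return candidate
    return None


# Three Sentinels, two vote for s1: s1 leads the failover.
print(elect_leader({"s1": "s1", "s2": "s1", "s3": "s3"}, total_sentinels=3, quorum=2))
# Split vote: nobody wins, and the election is retried later.
print(elect_leader({"s1": "s1", "s2": "s2", "s3": "s3"}, total_sentinels=3, quorum=2))
```

Because two candidates cannot both hold a majority in the same epoch, at most one leader is elected per epoch.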
Sentinel Master Selection Rules
Sentinel selects a new master using the following criteria:
Discard slaves that are SDOWN, disconnected, or whose last PING reply is older than five seconds.
Discard slaves whose connection to the failed master has been down for longer than ten times the down-after-milliseconds value.
From the remaining slaves, prefer the one with the lowest slave-priority value (a priority of 0 means the slave is never promoted). If priorities are equal, choose the one with the highest replication offset; if offsets are also equal or unavailable, select the slave with the lexicographically smallest run-id.
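The selection rules can be sketched as a filter followed by a sort. This is a simplified model under stated assumptions: the Slave dataclass and its field names are hypothetical, and the priority step (lower slave-priority wins, 0 means never promote) follows the Redis Sentinel documentation.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    run_id: str
    priority: int              # slave-priority; 0 means "never promote"
    repl_offset: int
    sdown: bool = False
    disconnected: bool = False
    last_ping_reply_age_ms: int = 0
    master_link_down_ms: int = 0


def select_new_master(slaves, down_after_ms):
    """Return the best promotion candidate, or None if no slave qualifies."""
    candidates = [
        s for s in slaves
        if not s.sdown
        and not s.disconnected
        and s.last_ping_reply_age_ms <= 5000
        and s.master_link_down_ms <= 10 * down_after_ms
        and s.priority != 0
    ]
    if not candidates:
        return None
    # Lowest priority first, then highest offset, then smallest run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


slaves = [
    Slave("aaa", 100, 500),
    Slave("bbb", 100, 900),
    Slave("ccc", 100, 900, sdown=True),   # filtered out: SDOWN
    Slave("ddd", 0, 2000),                # filtered out: priority 0
]
print(select_new_master(slaves, down_after_ms=5000).run_id)
```

Negating the offset inside the sort key lets a single min call express "lowest priority, highest offset, smallest run ID" in one pass.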
Sentinel Data Consistency
Sentinel’s automatic failover elects a leader with a Raft-like vote: each Sentinel votes at most once per epoch, so at most one leader can be elected in any given epoch. Configuration updates follow a “last failover wins” rule: the configuration with the highest epoch replaces older ones and propagates to all Sentinels.
During network partitions, a Sentinel with an older configuration will update itself when it receives a newer version from peers. To bound the writes that can be lost under a partition, set min-slaves-to-write together with min-slaves-max-lag (named min-replicas-to-write and min-replicas-max-lag since Redis 5) so the master stops accepting writes when the number of connected, sufficiently up-to-date slaves falls below a threshold, and run Sentinel on every Redis node.
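The effect of the write threshold can be sketched as follows. This is a simplified model of the server-side check, assuming the option is paired with min-slaves-max-lag (lag measured in seconds) and that a threshold of 0 disables the check; the function name master_accepts_writes is hypothetical.

```python
def master_accepts_writes(replica_lags_s, min_slaves_to_write, min_slaves_max_lag_s):
    """Return True if the master would still accept writes.

    `replica_lags_s` holds, for each connected slave, the seconds since its
    last acknowledgement was received by the master.
    """
    if min_slaves_to_write == 0:
        return True  # feature disabled
    good = sum(1 for lag in replica_lags_s if lag <= min_slaves_max_lag_s)
    return good >= min_slaves_to_write


# Healthy: one slave is within the 10-second lag limit.
print(master_accepts_writes([1, 30], min_slaves_to_write=1, min_slaves_max_lag_s=10))
# Partitioned master: no slave has acknowledged recently, so writes are refused.
print(master_accepts_writes([30, 45], min_slaves_to_write=1, min_slaves_max_lag_s=10))
```

An isolated old master therefore stops taking writes after roughly the configured lag, which limits how much data is discarded when it later rejoins as a slave.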
Sentinel Practical Usage
Environment Preparation
# 1. Create the Sentinel configuration directory
mkdir -p /etc/redis-sentinel/26380
# 2. Create the configuration file with the contents shown below
vim /etc/redis-sentinel/26380/sentinel.conf
port 26380
dir "/etc/redis-sentinel/26380"
# Monitor the master named "zls"; the trailing 1 is the quorum
sentinel monitor zls 172.16.1.52 6379 1
# Mark the master SDOWN after 5000 ms without a valid reply
sentinel down-after-milliseconds zls 5000
# 3. Start Sentinel in the background
redis-sentinel /etc/redis-sentinel/26380/sentinel.conf &
Sentinel Related Commands
# Check that this Sentinel is responding
127.0.0.1:26380> ping
PONG
# List monitored masters
127.0.0.1:26380> SENTINEL MASTERS
# List slaves of a master
127.0.0.1:26380> SENTINEL slaves zls
# Get master address by name
127.0.0.1:26380> SENTINEL get-master-addr-by-name zls
1) "172.16.1.51"
2) "6379"
# Manual failover
127.0.0.1:26380> SENTINEL FAILOVER zls
# Reset Sentinel's state for the named master
127.0.0.1:26380> SENTINEL reset zls
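A client can issue the same get-master-addr-by-name query programmatically. The sketch below uses only the Python standard library and speaks the Redis wire protocol directly; the helper names are hypothetical, the reply parser handles only the two-element array this command returns, and a production client would normally rely on a Redis client library with Sentinel support instead.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_master_addr(reply):
    """Parse the reply to get-master-addr-by-name into (ip, port), or None."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None  # for example a null reply when the master name is unknown
    return lines[2].decode(), int(lines[4])


def get_master_addr(host, port, name, timeout=1.0):
    """Ask one Sentinel for the current master address (needs a live Sentinel)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", name))
        # Simplification: assumes the short reply arrives in a single recv call.
        return parse_master_addr(conn.recv(4096))


# Offline demonstration using the reply shown in the transcript above.
print(parse_master_addr(b"*2\r\n$11\r\n172.16.1.51\r\n$4\r\n6379\r\n"))
```

Against the setup in this article, the call would be get_master_addr("127.0.0.1", 26380, "zls"); clients should repeat the query after a connection error, since the master may have changed.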
