
Mastering Redis Replication and Sentinel: Solving Failover Challenges

This article examines the limitations of Redis master‑slave replication, explains how Redis Sentinel addresses those issues with monitoring, notification, and automatic failover, and provides detailed configuration commands, discovery mechanisms, and step‑by‑step failover procedures for building a highly available Redis deployment.

IT Architects Alliance

Problems with Redis Master‑Slave Replication

When the master crashes, a replica must be promoted manually and all client applications need to update the master address.

The master’s write throughput is limited to a single instance.

The master’s storage capacity is bounded by a single machine.

Older versions of native replication can trigger a full-sync pause: if the partial resynchronization (PSYNC) fails, Redis falls back to a full resynchronization, and the master must fork to produce a complete RDB snapshot. On a large dataset this can stall the master for milliseconds to seconds.

Redis Sentinel Overview

Core Functions

Sentinel provides three essential capabilities for high‑availability Redis clusters:

Monitoring : continuously pings masters, replicas and other Sentinels to detect failures.

Notification : emits events (via Pub/Sub or external scripts) when a server is considered down.

Automatic failover : promotes a suitable replica to master and re‑configures the remaining replicas without manual intervention.

Minimal sentinel.conf Example

sentinel monitor mymaster 192.168.10.202 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 2
sentinel can-failover mymaster yes
sentinel auth-pass mymaster 20180408
sentinel failover-timeout mymaster 180000
sentinel config-epoch mymaster 0
sentinel notification-script mymaster /var/redis/notify.sh
sentinel leader-epoch mymaster 0

Explanation of each directive:

sentinel monitor : registers the master named mymaster at IP 192.168.10.202, port 6379. The last parameter (2) is the quorum: the number of Sentinels that must agree the master is unreachable before it is marked objectively down. Carrying out the failover additionally requires a leader Sentinel that is authorized by a majority of all Sentinels.

sentinel down-after-milliseconds : marks the master as subjectively down (SDOWN) if no valid reply is received for 30 000 ms.

sentinel parallel-syncs : limits the number of replicas that may sync from the new master simultaneously during failover. The example allows two at a time; setting it to 1 ensures only one replica is unavailable at a time, at the cost of a longer failover.

sentinel can-failover : enables automatic failover for the specified master. This directive comes from the earliest Sentinel implementation; current Sentinel documentation no longer lists it, so omit the line on recent versions.

sentinel auth-pass : password used by Sentinel to authenticate with the master and its replicas (must match requirepass in redis.conf).

sentinel failover-timeout : maximum time (in ms) a failover may take before being considered failed (default 180 000 ms).

sentinel config-epoch : state that Sentinel itself writes into the file to record the configuration version (epoch) of this master. It is not a tuning parameter and normally does not need to be set by hand.

sentinel notification-script : optional script that Sentinel runs for warning-level events such as SDOWN and ODOWN. A script that runs longer than 60 s is terminated.

sentinel leader-epoch : likewise state maintained by Sentinel: the epoch in which this Sentinel last voted for a failover leader. It normally does not need to be set by hand.
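To make the shape of these directives concrete, here is a minimal Python sketch that groups `sentinel <directive> <master-name> <args...>` lines by master name. The function name `parse_sentinel_conf` and the `SAMPLE` text are illustrative assumptions, not part of Redis, and the sketch assumes every line follows that three-part shape (some real Sentinel directives carry no master name).

```python
from typing import Dict, List


def parse_sentinel_conf(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Group `sentinel <directive> <master-name> <args...>` lines by master name."""
    masters: Dict[str, Dict[str, List[str]]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] != "sentinel" or len(parts) < 3:
            continue  # this sketch ignores anything that is not a per-master directive
        directive, name, args = parts[1], parts[2], parts[3:]
        masters.setdefault(name, {})[directive] = args
    return masters


# Hypothetical sample, reusing values from the configuration above.
SAMPLE = """\
sentinel monitor mymaster 192.168.10.202 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 2
sentinel failover-timeout mymaster 180000
"""

conf = parse_sentinel_conf(SAMPLE)
host, port, quorum = conf["mymaster"]["monitor"]
print(f"master={host}:{port} quorum={quorum}")
```

A check like this can be useful in deployment scripts, for example to confirm that the quorum is not larger than the number of Sentinels being rolled out.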

Subjective vs. Objective Down

Subjective down (SDOWN) : a single Sentinel decides that a server is unreachable.

Objective down (ODOWN) : a quorum of Sentinels agree on the same server’s failure, triggering the failover process.
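The two states can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Sentinel's actual code: the helper names `is_sdown` and `is_odown` are hypothetical, and timestamps are plain millisecond integers.

```python
def is_sdown(last_valid_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """One Sentinel's local view: no valid reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(sdown_votes: int, quorum: int) -> bool:
    """Shared view: at least `quorum` Sentinels report the server as down."""
    return sdown_votes >= quorum


# Three hypothetical Sentinels and the time each last heard from the master.
now = 100_000
last_replies = {"s1": 60_000, "s2": 65_000, "s3": 95_000}
votes = sum(is_sdown(t, now, down_after_ms=30_000) for t in last_replies.values())
print(votes, is_odown(votes, quorum=2))
```

Here s1 and s2 have not heard from the master for more than 30 000 ms while s3 has, so two SDOWN votes meet a quorum of 2.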

Sentinel Working Principle

Each Sentinel sends a PING to all known masters, replicas and other Sentinels once per second.

If a server does not reply within the configured timeout, the Sentinel marks it as SDOWN.

The Sentinel asks the other Sentinels monitoring that master whether they also consider it down.

When the configured quorum agrees, the master is marked ODOWN.

Once a master is ODOWN, Sentinels send INFO to its replicas every second instead of every 10 seconds.

Sentinels elect a leader among themselves; with the master ODOWN, the leader Sentinel selects a replica to promote.

The elected replica receives SLAVEOF NO ONE and becomes the new master.

The updated configuration is propagated to all Sentinels via Pub/Sub.

All remaining replicas are instructed to execute SLAVEOF <new‑master‑IP> <port> and start syncing.

When every replica has begun syncing, the leading Sentinel ends the failover.
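The voting in the sequence above is, as far as the Redis documentation describes it, an election of a leader Sentinel that then drives the failover. The sketch below models only the counting rule: a candidate wins when it holds votes from a majority of all known Sentinels and from at least the configured quorum. The function name `elect_leader` and the Sentinel ids are hypothetical, and epochs and vote timing are left out.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], total_sentinels: int, quorum: int) -> Optional[str]:
    """Return the winning Sentinel id, or None if nobody is authorized.

    `votes` maps voter id -> candidate id for a single epoch.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = total_sentinels // 2 + 1
    if count >= majority and count >= quorum:
        return candidate
    return None


# Three Sentinels, quorum 2: s1 collects two votes and is authorized.
print(elect_leader({"s1": "s1", "s2": "s1", "s3": "s3"}, total_sentinels=3, quorum=2))
# Only two of three Sentinels reachable and split: no leader this round.
print(elect_leader({"s1": "s1", "s2": "s2"}, total_sentinels=3, quorum=2))
```

Requiring a majority of all Sentinels, not only of the reachable ones, is what prevents a minority partition from promoting its own master.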

Automatic Discovery of Sentinels and Replicas

Sentinels discover each other through Redis Pub/Sub. Each Sentinel periodically publishes its IP, port and run-id to the __sentinel__:hello channel of every master and replica it monitors, and subscribes to the same channel. Newly seen Sentinels are added to its internal list after deduplication, so the addresses of all Sentinels need not be configured statically. Replicas are discovered from the master's INFO output, so they do not need to be listed either.
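The discovery step can be sketched as a small registry that parses announcements and deduplicates by run id. The eight comma-separated fields below follow the hello payload as I understand it from the Sentinel source (Sentinel address, run id and epoch, then the master's name, address and config epoch); treat that layout as an assumption to verify against your Redis version. `Hello`, `parse_hello` and `SentinelRegistry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Hello:
    ip: str
    port: int
    run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Parse one announcement; the 8-field layout is an assumption of this sketch."""
    f = payload.split(",")
    if len(f) != 8:
        raise ValueError(f"expected 8 fields, got {len(f)}")
    return Hello(f[0], int(f[1]), f[2], int(f[3]), f[4], f[5], int(f[6]), int(f[7]))


class SentinelRegistry:
    """Tracks peer Sentinels, keyed by run id so repeats are not added twice."""

    def __init__(self, own_run_id: str) -> None:
        self.own_run_id = own_run_id
        self.peers: Dict[str, Tuple[str, int]] = {}

    def observe(self, hello: Hello) -> bool:
        """Record a peer; return True only when it was not known before."""
        if hello.run_id == self.own_run_id:
            return False  # our own announcement echoed back on the channel
        is_new = hello.run_id not in self.peers
        self.peers[hello.run_id] = (hello.ip, hello.port)
        return is_new


registry = SentinelRegistry(own_run_id="self-id")
msg = parse_hello("10.0.0.2,26379,abc123,7,mymaster,192.168.10.202,6379,7")
print(registry.observe(msg), registry.observe(msg), registry.peers)
```

Keying by run id means a Sentinel that restarts on a new address updates its existing entry instead of appearing as a second peer.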

Failover Process

Detect that the master is ODOWN.

Increment the current configuration epoch and attempt to become the leader Sentinel.

If election fails, retry after twice the failover-timeout.

If elected leader, select a suitable replica to promote.

Send SLAVEOF NO ONE to the chosen replica.

Publish the new configuration to all other Sentinels via Pub/Sub.

Instruct every remaining replica to execute SLAVEOF <new‑master‑IP> <port> and start syncing.

When all replicas are syncing, the leading Sentinel terminates the failover.
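The article does not spell out what makes a replica "suitable". As a hedged sketch, the ordering below follows the rule described in the Redis Sentinel documentation: prefer the lowest non-zero replica priority, then the largest replication offset, then the lexicographically smallest run id. The `Replica` class and `select_replica` function are hypothetical, and the documentation's additional filter on how long a replica has been disconnected from the master is reduced here to a simple `reachable` flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    run_id: str
    priority: int        # replica-priority; 0 means "never promote"
    repl_offset: int     # how much of the master's stream was processed
    reachable: bool = True


def select_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the promotion candidate, or None if no replica qualifies."""
    candidates = [r for r in replicas if r.reachable and r.priority != 0]
    if not candidates:
        return None
    # Lower priority first, then higher offset, then smaller run id.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("bbb", priority=100, repl_offset=5000),
    Replica("aaa", priority=100, repl_offset=5000),
    Replica("ccc", priority=100, repl_offset=4000),
    Replica("ddd", priority=0, repl_offset=9000),    # excluded by priority 0
    Replica("eee", priority=10, repl_offset=9000, reachable=False),
]
chosen = select_replica(replicas)
print(chosen.run_id if chosen else None)
```

In this example "aaa" and "bbb" tie on priority and offset, so the smaller run id decides.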


Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
