Master‑Slave Replication Pitfalls and Deep Dive into Redis Sentinel
This article examines the limitations of Redis master‑slave replication, such as manual failover and single‑node bottlenecks, and provides an in‑depth exploration of Redis Sentinel’s architecture, configuration parameters, detection mechanisms, automatic failover process, and best‑practice recommendations for achieving high availability.
Redis Master‑Slave Replication Limitations
Redis master‑slave replication copies data from a single master node to one or more slave nodes. The slaves provide (1) a hot standby that can be promoted when the master fails and (2) read scaling by offloading read traffic from the master. On its own, however, replication has several limitations:
Failover requires manual promotion of a slave and reconfiguration of client applications.
The master’s write throughput is bounded by a single machine.
The master’s storage capacity is limited to the resources of a single node.
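When a partial resynchronization (PSYNC) is not possible, the slave falls back to a full resynchronization; before Redis 2.8 every reconnection did so. To produce the snapshot for a full sync the master has to fork, and on large datasets the fork can stall it for milliseconds to seconds.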
Redis Sentinel Overview
Sentinel adds automatic monitoring, notification and failover to achieve high availability for Redis master‑slave deployments.
Sentinel Architecture
Monitoring – Sentinel continuously pings masters and slaves to verify they are alive.
Notification – When a problem is detected, Sentinel can invoke a script or API to alert operators.
Automatic failover – If a master is deemed down, Sentinel promotes a slave to master and reconfigures the remaining slaves.
Essential sentinel.conf Directives
sentinel monitor mymaster 192.168.10.202 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 2
sentinel can-failover mymaster yes
sentinel auth-pass mymaster 20180408
sentinel failover-timeout mymaster 180000
sentinel config-epoch mymaster 0
sentinel notification-script mymaster /var/redis/notify.sh
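sentinel leader-epoch mymaster 0
sentinel monitor – defines the master name, IP, port and the quorum, i.e. the number of Sentinels that must agree the master is unreachable before it is marked objectively down and a failover can start.
sentinel down-after-milliseconds – time without a valid reply after which a Sentinel considers a server subjectively down (SDOWN).
sentinel parallel-syncs – maximum number of slaves that may synchronize with the new master simultaneously during failover; a lower value lengthens failover time but keeps more slaves available to serve reads in the meantime.
sentinel can-failover – enables or disables automatic failover for the specified master. This directive dates from the legacy Sentinel implementation; current Sentinel versions no longer support it, so omit it there.
sentinel auth-pass – password used by Sentinel to authenticate with the master and its slaves.
sentinel failover-timeout – maximum duration for a failover attempt before it is considered failed; it also governs how long a Sentinel waits before retrying a failover of the same master.
sentinel config-epoch – the epoch (version number) of the current master configuration. Sentinel writes this line itself when it rewrites sentinel.conf; it is state, not a tuning parameter.
sentinel notification-script – script executed whenever Sentinel generates a warning‑level event, such as a server changing SDOWN or ODOWN state (e.g., to send alerts), not only when a failover occurs.
sentinel leader-epoch – the epoch of the most recent leader election this Sentinel took part in. Like config-epoch, it is maintained by Sentinel and should not be set by hand.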
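To make the shape of these directives concrete, here is a minimal Python sketch that reads sentinel.conf‑style text and groups the settings by master name. The function name parse_sentinel_conf and the returned dictionary layout are illustrative assumptions, not part of Redis or any client library, and the parser only understands lines of the form "sentinel <directive> <master-name> <args...>".

```python
def parse_sentinel_conf(text):
    """Group `sentinel <directive> <master-name> <args...>` lines by master.

    Hypothetical helper for illustration; real Sentinel accepts more
    directive shapes than this sketch handles.
    """
    masters = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) < 3 or parts[0] != "sentinel":
            continue  # not a per-master sentinel directive
        directive, name, args = parts[1], parts[2], parts[3:]
        entry = masters.setdefault(name, {})
        if directive == "monitor":
            host, port, quorum = args
            entry["monitor"] = {"host": host, "port": int(port), "quorum": int(quorum)}
        else:
            # Keep other directives as raw strings (or a list of strings).
            entry[directive] = args[0] if len(args) == 1 else args
    return masters


SAMPLE = """\
sentinel monitor mymaster 192.168.10.202 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 2
sentinel failover-timeout mymaster 180000
"""

conf = parse_sentinel_conf(SAMPLE)
print(conf["mymaster"]["monitor"]["quorum"])  # 2
```

A check like this can be handy in deployment scripts, for example to confirm that the quorum does not exceed the number of Sentinels actually deployed.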
Subjective vs. Objective Down
Subjective down (SDOWN) – a single Sentinel’s judgment that a server is unreachable.
Objective down (ODOWN) – consensus among a quorum of Sentinels that the same server is down.
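The difference between the two states can be sketched in a few lines of Python. This is a simplified model, not Sentinel's actual code: the function names, the millisecond timestamps and the dictionary of per‑Sentinel reports are all assumptions made for illustration.

```python
def is_sdown(last_valid_reply_ms, now_ms, down_after_ms):
    """One Sentinel's local view: no valid reply for longer than the limit."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_odown(sdown_reports, quorum):
    """Cluster-wide view: at least `quorum` Sentinels report SDOWN.

    `sdown_reports` maps a Sentinel id (hypothetical names) to its SDOWN flag.
    """
    return sum(1 for flagged in sdown_reports.values() if flagged) >= quorum


# Three Sentinels, down-after-milliseconds = 30000, quorum = 2.
now = 100_000
last_reply = {"s1": 60_000, "s2": 65_000, "s3": 95_000}
reports = {sid: is_sdown(ts, now, 30_000) for sid, ts in last_reply.items()}

print(reports)               # {'s1': True, 's2': True, 's3': False}
print(is_odown(reports, 2))  # True
```

Here s3 still heard from the master recently, so it does not consider it down, but the other two reports are enough to reach the quorum of 2.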
Sentinel Work Flow
Each Sentinel sends a PING to known masters, slaves and other Sentinels once per second.
If a server does not reply within down-after-milliseconds, it is marked SDOWN.
The Sentinel that marked a master SDOWN asks the other Sentinels monitoring that master whether they also see it as down (SENTINEL is-master-down-by-addr).
When the configured quorum agrees, the master is marked ODOWN.
Sentinels increase the INFO‑command frequency for the downed master’s slaves from every 10 seconds to every second.
If the master remains ODOWN, the Sentinels hold an election; the elected leader selects a slave and promotes it to master using SLAVEOF NO ONE, then publishes the new configuration via Pub/Sub.
Remaining slaves are instructed to replicate from the new master with SLAVEOF commands.
When all slaves have started replication, the leader Sentinel ends the failover process.
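The last three steps amount to a small reconfiguration plan: one promotion followed by SLAVEOF commands for everyone else. The Python sketch below builds such a plan and groups the remaining slaves into batches of parallel-syncs. It is a simplification under stated assumptions: failover_plan is a hypothetical helper, the addresses are made up, and real Sentinel limits how many slaves are resyncing at once rather than working in strict batches.

```python
def failover_plan(new_master, slaves, parallel_syncs):
    """Return a list of steps; each step is a list of (target, command) pairs.

    `new_master` and each entry of `slaves` are (host, port) tuples.
    """
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    host, port = new_master
    # Step 1: promote the chosen slave.
    steps = [[(new_master, "SLAVEOF NO ONE")]]
    # Following steps: repoint the remaining slaves, a few at a time.
    remaining = [s for s in slaves if s != new_master]
    for i in range(0, len(remaining), parallel_syncs):
        batch = remaining[i:i + parallel_syncs]
        steps.append([(s, f"SLAVEOF {host} {port}") for s in batch])
    return steps


slaves = [("10.0.0.2", 6379), ("10.0.0.3", 6379), ("10.0.0.4", 6379)]
for step in failover_plan(("10.0.0.2", 6379), slaves, parallel_syncs=1):
    print(step)
```

With parallel_syncs=1 the two remaining slaves are repointed one after the other, which takes longer but leaves one of them able to serve reads while the other resynchronizes.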
Automatic Discovery of Sentinels and Slaves
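Sentinels use Redis’s publish/subscribe mechanism to announce themselves: each one periodically publishes its IP, port and run‑id on the __sentinel__:hello channel of every master and slave it monitors. Other Sentinels subscribed to that channel automatically add newly discovered Sentinels to their monitoring list, so Sentinel addresses do not have to be configured statically. Slaves are discovered from the replication information the master returns to the periodic INFO command.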
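As a rough illustration of the discovery step, the Python sketch below parses an announcement and records the sender. It assumes the payload is a comma‑separated string of eight fields (Sentinel IP, port, run‑id, current epoch, then master name, IP, port and config epoch); treat that layout, the Hello type and the register helper as assumptions for this example and verify the format against the Redis version you run.

```python
from typing import Dict, NamedTuple


class Hello(NamedTuple):
    ip: str
    port: int
    run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split an announcement into typed fields (assumed 8-field layout)."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    ip, port, run_id, epoch, name, m_ip, m_port, m_epoch = fields
    return Hello(ip, int(port), run_id, int(epoch), name, m_ip, int(m_port), int(m_epoch))


def register(known: Dict[str, Hello], payload: str, own_run_id: str) -> bool:
    """Record the announcing Sentinel; return True if it was not known before."""
    hello = parse_hello(payload)
    if hello.run_id == own_run_id:
        return False  # ignore our own announcement
    is_new = hello.run_id not in known
    known[hello.run_id] = hello  # refresh address and epoch either way
    return is_new


known: Dict[str, Hello] = {}
msg = "10.0.0.11,26379,runid-b,3,mymaster,192.168.10.202,6379,3"  # made-up values
print(register(known, msg, own_run_id="runid-a"))  # True
print(register(known, msg, own_run_id="runid-a"))  # False
```

Keying the table by run‑id rather than by address means a Sentinel that restarts on a new port updates its existing entry instead of appearing twice.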
Failover Procedure
Detect that the master has entered ODOWN.
Increment the current epoch and attempt to become the leader.
If the election fails, retry after twice the failover-timeout.
Once elected, select a slave and promote it with SLAVEOF NO ONE.
Publish the new configuration to all Sentinels via Pub/Sub.
Instruct the former master’s slaves to replicate from the new master using SLAVEOF.
When every slave has begun replication, the leading Sentinel terminates the failover.
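The election step can be pictured as a vote count per epoch. The sketch below assumes the usual Sentinel rule that a candidate needs votes from a majority of all known Sentinels and from at least quorum of them; elect_leader and the Sentinel ids are hypothetical names, and the real protocol also deals with epochs, timeouts and vote persistence, which are left out here.

```python
from collections import Counter


def elect_leader(votes, total_sentinels, quorum):
    """Return the winning Sentinel id, or None if nobody has enough votes.

    `votes` maps each voting Sentinel id to the candidate it voted for
    in the current epoch.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    # Winner needs both an absolute majority and at least `quorum` votes.
    needed = max(total_sentinels // 2 + 1, quorum)
    return candidate if count >= needed else None


# Three Sentinels, quorum 2: two votes are a majority.
print(elect_leader({"s1": "s1", "s2": "s1", "s3": "s3"}, 3, 2))  # s1

# Five Sentinels, quorum 2: two votes meet the quorum but not the majority.
print(elect_leader({"s1": "s1", "s2": "s1"}, 5, 2))  # None
```

The second case shows why a failed election is possible even after ODOWN has been reached: the quorum only triggers the failover attempt, while carrying it out still requires a majority.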
References: https://redis.io/,
https://www.cnblogs.com/bingshu/p/9776610.html
IT Architects Alliance
