Master‑Slave Replication Pitfalls and Deep Dive into Redis Sentinel

This article examines the limitations of Redis master‑slave replication, such as manual failover and single‑node bottlenecks, and provides an in‑depth exploration of Redis Sentinel’s architecture, configuration parameters, detection mechanisms, automatic failover process, and best‑practice recommendations for achieving high availability.

IT Architects Alliance

Jun 20, 2021

Redis Master‑Slave Replication Limitations

Redis master‑slave replication copies data from a single master node to one or more slave nodes. The slaves provide (1) a hot standby that can be promoted when the master fails and (2) read scaling by offloading read traffic from the master. On its own, however, replication has several limitations:

Failover requires manual promotion of a slave and reconfiguration of client applications.

The master’s write throughput is bounded by a single machine.

The master’s storage capacity is limited to the resources of a single node.

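When a partial resynchronization (PSYNC) cannot be served, the slave falls back to a full sync, and Redis versions before 2.8, which lack PSYNC, always perform a full sync after a disconnect. While the master prepares the snapshot for a full sync it may pause for milliseconds to seconds.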

Redis Sentinel Overview

Sentinel adds automatic monitoring, notification and failover to achieve high availability for Redis master‑slave deployments. It is a separate mechanism from Redis Cluster.

Sentinel Architecture

Monitoring – Sentinel continuously pings masters and slaves to verify they are alive.

Notification – When a problem is detected, Sentinel can invoke a script or API to alert operators.

Automatic failover – If a master is deemed down, Sentinel promotes a slave to master and reconfigures the remaining slaves.

Essential sentinel.conf Directives

sentinel monitor mymaster 192.168.10.202 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 2
sentinel can-failover mymaster yes
sentinel auth-pass mymaster 20180408
sentinel failover-timeout mymaster 180000
sentinel config-epoch mymaster 0
sentinel notification-script mymaster /var/redis/notify.sh
sentinel leader-epoch mymaster 0

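sentinel monitor – defines the master name, IP, port and the quorum, that is, the number of Sentinels that must agree the master is unreachable before it is marked objectively down (ODOWN).

sentinel down-after-milliseconds – time after which a master is considered subjectively down (SDOWN) if no valid reply is received.

sentinel parallel-syncs – maximum number of slaves that may synchronize with the new master simultaneously during failover. A lower value lengthens the failover but keeps more slaves available to serve reads while the others resynchronize.

sentinel can-failover – enables or disables automatic failover for the specified master. This directive dates from the first Sentinel implementation; current Redis releases no longer document it.

sentinel auth-pass – password used by Sentinel to authenticate with the master and its slaves.

sentinel failover-timeout – maximum duration for a failover attempt before it is considered failed.

sentinel config-epoch – version number (epoch) of the master's current configuration. Sentinel writes and updates this value itself after each failover, so it does not need to be set by hand.

sentinel notification-script – script executed when Sentinel raises a warning-level event, such as an instance entering SDOWN or ODOWN (e.g., to send alerts).

sentinel leader-epoch – epoch of the most recent leader election in which this Sentinel voted. Like config-epoch, it is state maintained by Sentinel rather than a tuning parameter.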
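To make the relationship between these directives more concrete, here is a minimal Python sketch that parses a few sentinel.conf lines into a dictionary and estimates how many synchronization batches the slaves need under parallel-syncs. The helper names (parse_sentinel_conf, resync_batches) and the slave count of 5 are assumptions made for this illustration; they are not part of Redis.

```python
import math
from collections import defaultdict

SAMPLE_CONF = """\
sentinel monitor mymaster 192.168.10.202 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 2
sentinel failover-timeout mymaster 180000
"""

def parse_sentinel_conf(text):
    """Group 'sentinel <directive> <master> <args...>' lines by master name."""
    masters = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        # Skip blanks, comments and anything that is not a per-master directive.
        if len(parts) < 4 or parts[0] != "sentinel":
            continue
        directive, name, args = parts[1], parts[2], parts[3:]
        if directive == "monitor":
            ip, port, quorum = args
            masters[name].update(ip=ip, port=int(port), quorum=int(quorum))
        else:
            # Numeric settings become ints; everything else stays a string.
            value = " ".join(args)
            masters[name][directive] = int(value) if value.isdigit() else value
    return dict(masters)

def resync_batches(slave_count, parallel_syncs):
    """Hypothetical helper: batches needed to repoint all remaining slaves."""
    return math.ceil(slave_count / parallel_syncs)

conf = parse_sentinel_conf(SAMPLE_CONF)["mymaster"]
print(conf["quorum"], conf["down-after-milliseconds"])  # 2 30000
print(resync_batches(5, conf["parallel-syncs"]))        # 3
```

With parallel-syncs set to 2, five remaining slaves are repointed in three batches; raising the value shortens the failover at the cost of more slaves being busy resynchronizing at the same time.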

Subjective vs. Objective Down

Subjective down (SDOWN) – a single Sentinel’s judgment that a server is unreachable.

Objective down (ODOWN) – consensus among a quorum of Sentinels that the same server is down.
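The difference between the two states can be sketched in a few lines of Python. This is only an illustration of the quorum rule described above; the function name and the Sentinel labels are hypothetical.

```python
def is_objectively_down(sdown_votes, quorum):
    """Return True once at least `quorum` Sentinels report the master as SDOWN.

    sdown_votes maps a Sentinel label to that Sentinel's own SDOWN judgment.
    """
    agreeing = sum(1 for is_down in sdown_votes.values() if is_down)
    return agreeing >= quorum

# Three Sentinels; two of them cannot reach the master.
votes = {"sentinel-a": True, "sentinel-b": False, "sentinel-c": True}
print(is_objectively_down(votes, quorum=2))  # True
print(is_objectively_down(votes, quorum=3))  # False
```

Each entry in the dictionary is one Sentinel's SDOWN opinion; ODOWN only exists as the aggregate of those opinions.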

Sentinel Work Flow

Each Sentinel sends a PING to known masters, slaves and other Sentinels once per second.

If a server does not reply within down-after-milliseconds, it is marked SDOWN.

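The Sentinel that marked the master SDOWN asks the other Sentinels monitoring that master whether they also consider it down (using SENTINEL is-master-down-by-addr).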

When the configured quorum agrees, the master is marked ODOWN.

Sentinels increase the INFO‑command frequency for the downed master’s slaves from every 10 seconds to every second.

If the master remains ODOWN, the Sentinels hold an election; the elected leader selects a slave and promotes it to master using SLAVEOF NO ONE, then publishes the new configuration via Pub/Sub.

Remaining slaves are instructed to replicate from the new master with SLAVEOF commands.

When all slaves have started replication, the leader Sentinel ends the failover process.
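The detection part of this workflow (the PING timeout and the faster INFO polling) can be sketched as follows. This is a simplified model with explicit timestamps instead of real network calls; the class and function names are assumptions for illustration, not Sentinel internals.

```python
from dataclasses import dataclass

@dataclass
class MonitoredInstance:
    name: str
    down_after_ms: int            # mirrors down-after-milliseconds
    last_valid_reply_ms: int = 0
    sdown: bool = False

    def on_valid_reply(self, now_ms):
        """Record a valid PING reply and clear the SDOWN flag."""
        self.last_valid_reply_ms = now_ms
        self.sdown = False

    def tick(self, now_ms):
        """Mark SDOWN when no valid reply arrived within down_after_ms."""
        if now_ms - self.last_valid_reply_ms > self.down_after_ms:
            self.sdown = True
        return self.sdown

def info_period_ms(master_is_odown):
    """INFO is sent to slaves every 10 s normally, every 1 s once the master is ODOWN."""
    return 1000 if master_is_odown else 10000

master = MonitoredInstance("mymaster", down_after_ms=30000)
master.on_valid_reply(now_ms=1000)
print(master.tick(now_ms=20000))   # False: last reply was 19 s ago
print(master.tick(now_ms=31001))   # True: more than 30 s without a valid reply
print(info_period_ms(master_is_odown=True))  # 1000
```

A real Sentinel would call something like tick once per second for every monitored instance and combine the resulting SDOWN flags across Sentinels to reach ODOWN.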

Automatic Discovery of Sentinels and Slaves

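Sentinels use Redis’s publish/subscribe mechanism to broadcast their IP, port and run‑id on the __sentinel__:hello channel of every monitored master and slave. Other Sentinels listening on the same channel automatically add newly discovered Sentinels to their monitoring list, eliminating the need for static configuration. Slaves are discovered in a similar hands-off way: each Sentinel reads the list of attached slaves from the master’s INFO output.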
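As a rough sketch of the discovery step, the Python below parses one hello announcement and records the sender. It assumes the payload is a comma-separated string of eight fields (Sentinel IP, port, run‑id, current epoch, master name, master IP, master port, master config epoch), which matches the format used by the Redis source as far as this example is concerned; the run‑ids and addresses are made-up sample values, and on_hello and known_sentinels are hypothetical names.

```python
from typing import NamedTuple

class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int

def parse_hello(payload):
    """Split one hello announcement into typed fields (assumed 8-field format)."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    ip, port, runid, epoch, m_name, m_ip, m_port, m_epoch = fields
    return Hello(ip, int(port), runid, int(epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))

known_sentinels = {}  # run-id -> (ip, port)

def on_hello(payload, my_runid):
    """Register the sender unless the announcement is our own."""
    hello = parse_hello(payload)
    if hello.sentinel_runid != my_runid:
        known_sentinels[hello.sentinel_runid] = (hello.sentinel_ip, hello.sentinel_port)
    return hello

sample = "192.168.10.203,26379,runid-b,0,mymaster,192.168.10.202,6379,0"
on_hello(sample, my_runid="runid-a")
print(known_sentinels)
```

Because every announcement also carries the master address and its config epoch, the same channel doubles as the way a new configuration spreads after a failover.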

Failover Procedure

Detect that the master has entered ODOWN.

Increment the current epoch and attempt to become the leader.

If the election fails, retry after twice the failover-timeout.

Once elected, select a slave and promote it with SLAVEOF NO ONE.

Publish the new configuration to all Sentinels via Pub/Sub.

Instruct the former master’s slaves to replicate from the new master using SLAVEOF.

When every slave has begun replication, the leading Sentinel terminates the failover.
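The election step can be illustrated with a small vote-counting sketch. It assumes, following the Redis Sentinel documentation, that a candidate must collect votes from a majority of all known Sentinels and from at least the configured quorum; the function name and the Sentinel labels are hypothetical.

```python
from collections import Counter

def elect_leader(votes, total_sentinels, quorum):
    """Return the winning candidate for this epoch, or None if nobody qualifies.

    votes maps each voting Sentinel to the candidate it voted for.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = total_sentinels // 2 + 1
    # The winner needs both a majority and at least `quorum` votes.
    return candidate if count >= max(majority, quorum) else None

# s1 votes for itself and also gets the vote of s2.
print(elect_leader({"s1": "s1", "s2": "s1", "s3": "s3"}, total_sentinels=3, quorum=2))  # s1
# A three-way split produces no leader, so the failover is retried later.
print(elect_leader({"s1": "s1", "s2": "s2", "s3": "s3"}, total_sentinels=3, quorum=2))  # None
```

When no candidate qualifies, no Sentinel is authorized to promote a slave in that epoch, which is why a failed election leads to a retry rather than to a partial failover.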

References:

https://redis.io/

https://www.cnblogs.com/bingshu/p/9776610.html
