Redis High Concurrency, High Availability, and Sentinel Deep Dive

This article explains how Redis can be scaled for high concurrency using master‑slave architecture, clustering, and read‑write separation, and it details the mechanisms of replication, persistence, Sentinel monitoring, failover, and configuration that together achieve high availability and data safety.

Big Data Technology & Architecture

Jul 21, 2019

If you use Redis for caching, you must consider how to scale Redis across multiple machines, ensure high concurrency, and prevent a single point of failure.

Redis High Concurrency: In a master‑slave setup (one master, many slaves), a single master handles tens of thousands of QPS, and adding read slaves raises the total to roughly 100k QPS. When the dataset itself grows to tens or hundreds of gigabytes, or into terabytes, a Redis cluster is required; it spreads the data across several masters and provides hundreds of thousands of QPS.

Redis High Availability: Adding Sentinel to a master‑slave deployment enables automatic failover: when the master crashes, a slave is promoted to take its place.

1. How Redis achieves >100k QPS through read‑write separation: One master takes the writes and multiple slaves serve the reads. The cache layer absorbs most of the read traffic that would otherwise hit the database, and spreading reads across slaves lets it sustain high request rates.

2. Bottlenecks of single‑node Redis: A single Redis instance is bounded by the hardware of one machine, which caps it at tens of thousands of QPS.

3. Scaling concurrency with read‑write separation: Configure one master for writes and several slaves for reads; slaves can be added to increase overall throughput.
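
The routing idea behind read‑write separation can be sketched in a few lines. This is a minimal illustration, not a Redis client: `FakeNode` and `ReadWriteRouter` are hypothetical in-memory stand-ins, and replication is applied immediately rather than asynchronously to keep the example short.

```python
import itertools


class FakeNode:
    """In-memory stand-in for one Redis instance (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reads = 0


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        # Round-robin over the slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(self.slaves or [master])

    def set(self, key, value):
        self.master.data[key] = value
        # Real replication is asynchronous; it is applied at once here
        # only to keep the sketch short.
        for slave in self.slaves:
            slave.data[key] = value

    def get(self, key):
        node = next(self._readers)
        node.reads += 1
        return node.data.get(key)


master = FakeNode("master")
slaves = [FakeNode("slave-1"), FakeNode("slave-2"), FakeNode("slave-3")]
router = ReadWriteRouter(master, slaves)

router.set("user:1", "alice")
values = [router.get("user:1") for _ in range(6)]
reads_per_slave = [s.reads for s in slaves]
```

After six reads, each of the three slaves has served two of them and the master has served none, which is why adding slaves raises total read throughput.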

2. Redis Replication and Master Persistence for Safety:

1. Replication principle: The master applies each write and then replicates it asynchronously to all slaves, so slaves are eventually consistent with the master and may briefly lag behind it.

2. Core mechanisms: (1) Replication is asynchronous; since Redis 2.8, slaves periodically acknowledge how much data they have processed. (2) One master can have multiple slaves. (3) A slave can also replicate from another slave. (4) Replication does not block the master. (5) During a sync, a slave keeps serving its old dataset; it stops serving briefly while it discards the old data and loads the new snapshot. (6) Adding slaves scales read throughput horizontally.

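3. Importance of master persistence: Enable persistence on the master, and do not treat the slaves alone as a hot backup. If a master without persistence restarts, it comes back with an empty dataset, and the slaves then replicate that empty dataset and lose their data as well.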

Copy backup files to cold storage in more than one location so they survive the failure of the machine.

3. Replication details, resumable sync, disk‑less replication, and key expiration handling:

1. Replication flow: The slave sends PSYNC to the master. If the slave is reconnecting, the master sends only the data it missed; otherwise a full resynchronization occurs, in which the master generates an RDB snapshot while buffering new write commands.

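2. Resumable sync (since Redis 2.8): The master keeps a replication backlog, a fixed-size buffer of recently propagated write commands, and both master and slave track a replication offset. After a network break, the slave reconnects and sends its offset; if that offset is still covered by the backlog, the master sends only the missing commands, otherwise a full resynchronization is needed.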

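3. Disk‑less replication: The master's child process streams the RDB snapshot directly to the slaves' sockets instead of first writing it to disk. It is enabled with repl-diskless-sync; repl-diskless-sync-delay sets how many seconds the master waits before starting the transfer so that more slaves can join the same stream.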

4. Expired key handling: Slaves do not expire keys themselves; they rely on the master to delete keys and propagate DEL commands.

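4. Full replication process: The master runs BGSAVE to produce an RDB snapshot, sends the file to the slave, and then streams the write commands it buffered in the meantime. The slave class of client-output-buffer-limit caps the memory used for that buffer; if the limit is exceeded, the master closes the connection and the sync has to start again.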

5. Incremental replication: If the connection drops during or after a full sync, the slave reconnects with its replication offset and the master sends only the missing data from the backlog, provided the offset is still within it.

6. Heartbeat: Master sends a heartbeat every 10 seconds; slaves send one every second.

7. Asynchronous replication: The master applies a write locally, replies to the client, and replicates to the slaves afterwards without waiting for them.
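
The choice between resumable (incremental) sync and full resynchronization described in the items above can be illustrated with a small simulation. This is a sketch under simplifying assumptions: `ReplicationBacklog` is a hypothetical class, the stream is treated as plain bytes, and the replication ID check that real Redis also performs is left out.

```python
class ReplicationBacklog:
    """Fixed-size buffer holding the most recent bytes of the replication stream."""

    def __init__(self, size):
        self.size = size
        self.buffer = b""
        self.master_offset = 0  # total bytes ever written to the stream

    def feed(self, payload):
        """Record a propagated write and advance the master offset."""
        self.master_offset += len(payload)
        # Keep only the newest `size` bytes, as a circular buffer would.
        self.buffer = (self.buffer + payload)[-self.size:]

    def psync(self, replica_offset):
        """Return ("CONTINUE", missing_bytes) or ("FULLRESYNC", None)."""
        oldest_offset = self.master_offset - len(self.buffer)
        if oldest_offset <= replica_offset <= self.master_offset:
            start = replica_offset - oldest_offset
            return "CONTINUE", self.buffer[start:]
        # The replica is too far behind: the data it needs was overwritten.
        return "FULLRESYNC", None


backlog = ReplicationBacklog(size=16)
backlog.feed(b"SET a 1;")
replica_offset = backlog.master_offset  # replica is in sync, then disconnects

backlog.feed(b"SET b 2;")
short_gap = backlog.psync(replica_offset)  # gap still covered by the backlog

backlog.feed(b"SET c 3;SET d 4;")
long_gap = backlog.psync(replica_offset)  # backlog has wrapped past the offset
```

A short disconnection is served from the backlog, while a longer one forces a full resync. This is the reason a larger backlog tolerates longer network breaks.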

5. Achieving 99.99% high availability with Redis:

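High availability (HA) means the system keeps serving requests with little or no interruption. It is commonly measured by mean time between failures (MTBF), mean downtime (MDT), and the availability percentage. A 99.99% availability target allows about 52.6 minutes of downtime per year (0.01% of 525,600 minutes).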
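
The downtime figure follows directly from the availability percentage. The helper below is a small illustrative calculation (the function name is made up for this example) and assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes per year")
```

Each extra "nine" cuts the budget by a factor of ten, from roughly 3.7 days at 99% to under an hour at 99.99%.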

Redis becomes unavailable when a standalone instance or the master crashes; requests then fall through to the database, which can be overloaded by traffic the cache normally absorbs.

To achieve HA, each master needs replicas that can take over and a fast automatic failover mechanism, which Sentinel typically provides.

6. Sentinel basics:

Sentinel monitors masters and slaves, sends notifications when something goes wrong, performs automatic failover, and acts as a configuration provider: clients ask Sentinel for the address of the current master.

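A Sentinel deployment needs at least three instances to be robust. A quorum of Sentinels must agree that the master is down, and a majority of all Sentinels must authorize the failover before it is executed; with only two Sentinels, losing one leaves no majority.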
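
The reason two Sentinels are not enough can be shown with a short check. This is a simplified model, not Sentinel's actual code: the function names are illustrative, and it assumes the configured quorum is no larger than the majority.

```python
def is_odown(sdown_votes, quorum):
    """The master counts as down once `quorum` Sentinels report it unreachable."""
    return sdown_votes >= quorum


def can_failover(reachable_sentinels, total_sentinels, sdown_votes, quorum):
    """Failover needs agreement on the failure plus a majority of ALL Sentinels."""
    majority = total_sentinels // 2 + 1
    return is_odown(sdown_votes, quorum) and reachable_sentinels >= majority


# Two Sentinels, quorum 1: the machine hosting the master and one Sentinel dies.
two_sentinels = can_failover(
    reachable_sentinels=1, total_sentinels=2, sdown_votes=1, quorum=1
)

# Three Sentinels, quorum 2: one Sentinel dies together with the master.
three_sentinels = can_failover(
    reachable_sentinels=2, total_sentinels=3, sdown_votes=2, quorum=2
)
```

With two Sentinels the surviving one detects the failure but cannot reach a majority of two, so no failover happens. With three, the two survivors form a majority and the failover proceeds.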

7. Data loss scenarios in Sentinel failover:

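Asynchronous replication can lose writes that the master acknowledged but had not yet replicated when it failed. A network partition (split‑brain) can leave two masters accepting writes; when the old master rejoins as a slave, the writes it accepted in the meantime are discarded.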

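Setting min-slaves-to-write and min-slaves-max-lag (renamed min-replicas-to-write and min-replicas-max-lag in Redis 5) makes the master refuse writes when fewer than the required number of slaves have acknowledged replication within the allowed lag. With a lag limit of 10 seconds, for example, at most about 10 seconds of writes can be lost, and a master isolated by a partition stops accepting writes instead of diverging further.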
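
The master-side check behind these two settings can be approximated as follows. This is a sketch of the rule, not Redis source code: the function name is illustrative, and each lag value stands for the seconds since a slave's last acknowledgment.

```python
def master_accepts_writes(slave_lags_seconds, min_slaves_to_write, min_slaves_max_lag):
    """Accept writes only if enough slaves acknowledged within the lag limit."""
    good_slaves = sum(1 for lag in slave_lags_seconds if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write


# Healthy link: both slaves acknowledged within the last couple of seconds.
healthy = master_accepts_writes([1, 2], min_slaves_to_write=1, min_slaves_max_lag=10)

# Partitioned master: no slave has acknowledged for more than 10 seconds.
partitioned = master_accepts_writes([25, 40], min_slaves_to_write=1, min_slaves_max_lag=10)
```

In the partitioned case the old master rejects writes, so clients fail fast instead of writing data that would be discarded after the failover.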

8. Sentinel internal mechanisms:

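1. sdown vs. odown: sdown (subjectively down) is the state a single Sentinel assigns when the master has not replied validly to its PINGs for down-after-milliseconds; odown (objectively down) is reached when a quorum of Sentinels report sdown for the same master.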

2. Discovery: Sentinels use the __sentinel__:hello Pub/Sub channel to announce themselves and exchange monitoring configurations.

3. Slave self‑correction: Sentinels ensure slaves replicate the correct master and adjust configurations after failover.

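4. Slave election algorithm: When a master is odown and a failover is authorized, the Sentinel leading it picks the slave to promote. Slaves that have been disconnected from the master for too long are skipped; the rest are ranked by priority (lower slave-priority first), then replication offset (more data replicated first), then run ID (lexicographically smaller first).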

5. Quorum and majority: Failover requires a quorum of Sentinels to deem the master odown and a majority to authorize the promotion.

6. Configuration epoch: The Sentinel that performs a failover obtains a unique configuration epoch, which acts as a version number for the new master configuration; other Sentinels adopt the configuration with the highest epoch.

7. Configuration propagation: After failover, the new master configuration is broadcast via Pub/Sub so all Sentinels update their view.
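
The slave selection criteria listed above (disconnection time, priority, replica offset, run ID) can be expressed as a filter followed by a sort. The sketch below is an illustration rather than Sentinel's implementation: `SlaveInfo` and `pick_slave_to_promote` are hypothetical names, and the disconnection cutoff of ten times down-after-milliseconds plus the time the master has been seen as down is taken from the Sentinel documentation.

```python
from dataclasses import dataclass


@dataclass
class SlaveInfo:
    run_id: str
    priority: int         # slave-priority; lower is preferred, 0 means never promote
    repl_offset: int      # how much of the master's stream was processed
    disconnected_ms: int  # time since the link to the master was lost


def pick_slave_to_promote(slaves, down_after_ms, master_sdown_ms):
    """Return the slave to promote, or None if no slave is eligible."""
    # Skip slaves that were cut off from the master for too long.
    max_disconnect = down_after_ms * 10 + master_sdown_ms
    candidates = [
        s for s in slaves
        if s.priority != 0 and s.disconnected_ms <= max_disconnect
    ]
    if not candidates:
        return None
    # Lower priority first, then larger offset, then smaller run ID.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


slaves = [
    SlaveInfo("c3", priority=100, repl_offset=5000, disconnected_ms=200),
    SlaveInfo("a1", priority=100, repl_offset=5200, disconnected_ms=300),
    SlaveInfo("b2", priority=100, repl_offset=5200, disconnected_ms=100),
    SlaveInfo("d4", priority=10, repl_offset=9000, disconnected_ms=400_000),
    SlaveInfo("e5", priority=0, repl_offset=9999, disconnected_ms=0),
]
chosen = pick_slave_to_promote(slaves, down_after_ms=30_000, master_sdown_ms=5_000)
```

Here `d4` is excluded for being disconnected too long and `e5` for having priority 0. Among the rest, `a1` and `b2` tie on priority and offset, so the smaller run ID `a1` is chosen.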


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
