How to Prevent Data Loss in Redis Clusters: Asynchronous Replication & Split‑Brain Scenarios
This article explains why Redis can lose data when a master fails before asynchronous master-slave replication completes, or when a network partition causes a split-brain. It then shows how the min-slaves-to-write and min-slaves-max-lag settings, together with client-side fallback strategies, reduce how much data can be lost.
1. Data loss scenarios
Asynchronous replication loss
Cluster split‑brain data loss
1. Asynchronous replication loss
Redis master‑to‑slave replication is asynchronous; the client receives OK after writing to the master, and the data is later propagated to slaves.
If the master crashes before syncing to slaves, the data residing only in the master’s memory is lost.
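Persistence does not prevent this. After the master fails, Sentinel promotes a slave to master, and that slave is missing every write that had not yet been replicated. When the old master restarts, Sentinel reconfigures it as a slave of the new master. The resynchronization replaces the old master's dataset with the new master's, so the unreplicated writes are discarded even if they had been saved to the old master's RDB or AOF files.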
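To make the sequence concrete, here is a toy Python model. It illustrates the behaviour and is not Redis internals; `AsyncReplicationGroup`, `write`, `replicate`, and `fail_over` are hypothetical names. The client receives OK for `order:2`, the master fails before that write is shipped, and the resync after failover removes it from the old master as well.

```python
"""Toy model of asynchronous master-slave replication (not Redis code).

It shows how a write the client saw as OK can vanish after failover.
"""


class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncReplicationGroup:
    def __init__(self):
        self.master = Node("master")
        self.slave = Node("slave")
        self.pending = []  # writes accepted by the master but not yet shipped

    def write(self, key, value):
        # The master applies the write and replies OK immediately;
        # propagation to the slave happens later.
        self.master.data[key] = value
        self.pending.append((key, value))
        return "OK"

    def replicate(self):
        # Ship everything in the pending queue to the slave.
        for key, value in self.pending:
            self.slave.data[key] = value
        self.pending.clear()

    def fail_over(self):
        # The master crashes; unshipped writes never reach the slave.
        self.pending.clear()
        old_master, self.master = self.master, self.slave
        # When the old master comes back, it rejoins as a slave and a full
        # resync replaces its dataset with the new master's.
        old_master.data = dict(self.master.data)
        self.slave = old_master


group = AsyncReplicationGroup()
group.write("order:1", "paid")
group.replicate()
print(group.write("order:2", "paid"))  # client sees OK
group.fail_over()                       # crash before order:2 is shipped
print(sorted(group.master.data))        # ['order:1']
print(sorted(group.slave.data))         # ['order:1']
```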
2. Cluster split‑brain
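A split-brain occurs when a network partition isolates the master from its slaves and from the Sentinels while the master process keeps running. The Sentinels on the other side of the partition consider the master unavailable and promote a slave to master.
Clients that can still reach the original master keep writing to it, but those writes never reach the new master, so the two datasets diverge. When the partition heals, the old master is demoted to a slave of the new master and resynchronizes from it. Its dataset is replaced, and every write it accepted during the partition is lost.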
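The same effect can be sketched for a split-brain with a small Python model. The node names and keys are made up for illustration, and the "resync" is modeled simply as copying the new master's data, which is the net effect of a full resynchronization.

```python
"""Toy model of a split-brain (not Redis or Sentinel code)."""


class Node:
    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})


# Before the partition, master and slave hold the same data.
old_master = Node("old-master", {"k0": "v0"})
slave = Node("slave", {"k0": "v0"})

# A partition isolates old_master from the slave and the Sentinels.
# The Sentinels promote the slave; the old master is still running.
new_master = slave

# Clients on the old master's side keep writing to it and get OK.
for i in range(1, 4):
    old_master.data[f"k{i}"] = f"v{i}"

# Clients that learned about the failover write to the new master.
new_master.data["k9"] = "v9"

# The partition heals: the old master is demoted to a slave of the new
# master, and the full resync replaces its dataset.
lost = set(old_master.data) - set(new_master.data)
old_master.data = dict(new_master.data)

print(sorted(lost))              # ['k1', 'k2', 'k3']
print(sorted(old_master.data))   # ['k0', 'k9']
```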
2. How to minimize data loss?
Two Redis configuration parameters can be tuned:
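min-slaves-to-write 1
min-slaves-max-lag 10
By default, min-slaves-to-write is 0 (the check is disabled) and min-slaves-max-lag is 10 seconds. With min-slaves-to-write set to N, the master rejects writes with a NOREPLICAS error whenever fewer than N slaves have acknowledged it within the last min-slaves-max-lag seconds (slaves send an acknowledgement about once per second). Replication is still asynchronous, so this does not guarantee that any particular write reached a slave. What it does is bound the loss window: a master cut off by a split-brain stops accepting writes roughly min-slaves-max-lag seconds after it loses contact with its slaves, and lowering that value shortens the window further. Since Redis 5.0 these options are also available as min-replicas-to-write and min-replicas-max-lag; the old names still work as aliases.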
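The rule these two settings enable can be modeled in a few lines. This is a hedged sketch of the documented behaviour rather than Redis source code: the `Master` class and its methods are hypothetical, time is passed in explicitly so the example is deterministic, and the reply string mirrors the NOREPLICAS error Redis returns.

```python
"""Sketch of the write check enabled by min-slaves-to-write and
min-slaves-max-lag (a model of the documented behaviour, not Redis code)."""

import time


class Master:
    def __init__(self, min_slaves_to_write=1, min_slaves_max_lag=10):
        self.min_slaves_to_write = min_slaves_to_write
        self.min_slaves_max_lag = min_slaves_max_lag
        self.last_ack = {}  # slave id -> time of the last ack received
        self.data = {}

    def on_slave_ack(self, slave_id, now=None):
        # Slaves acknowledge the master roughly once per second.
        self.last_ack[slave_id] = time.monotonic() if now is None else now

    def good_slaves(self, now):
        # A slave counts as "good" if its last ack is within the allowed lag.
        return sum(1 for t in self.last_ack.values()
                   if now - t <= self.min_slaves_max_lag)

    def write(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        # min_slaves_to_write == 0 disables the check entirely.
        if (self.min_slaves_to_write > 0
                and self.good_slaves(now) < self.min_slaves_to_write):
            return "NOREPLICAS Not enough good replicas to write."
        self.data[key] = value
        return "OK"


m = Master(min_slaves_to_write=1, min_slaves_max_lag=10)
m.on_slave_ack("slave-1", now=100.0)
print(m.write("a", 1, now=105.0))  # OK: last ack was 5 s ago
print(m.write("b", 2, now=111.0))  # NOREPLICAS: last ack was 11 s ago
```

Raising min_slaves_to_write or lowering min_slaves_max_lag makes the master refuse writes sooner once its slaves go silent, trading availability for a smaller loss window.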
When the master rejects writes, clients can degrade gracefully: they temporarily store the writes in a local cache, on local disk, or in a Kafka queue, and replay them to the master once it accepts writes again.
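A minimal client-side sketch of this fallback, assuming a generic `send(key, value)` callable that returns "OK" on success. `send`, `FallbackWriter`, and `fake_send` are hypothetical stand-ins for a real Redis client call, and an in-memory deque plays the role of the local cache, disk log, or Kafka topic.

```python
"""Client-side fallback sketch: buffer rejected writes locally and replay
them later. `send` is a hypothetical stand-in for a real Redis write."""

from collections import deque


class FallbackWriter:
    def __init__(self, send):
        self.send = send        # callable(key, value) -> reply string
        self.buffer = deque()   # stand-in for a local cache, disk log, or Kafka topic

    def write(self, key, value):
        # Preserve order: while anything is buffered, new writes queue behind it.
        if self.buffer:
            self.buffer.append((key, value))
            return "BUFFERED"
        reply = self.send(key, value)
        if reply != "OK":
            self.buffer.append((key, value))
            return "BUFFERED"
        return reply

    def replay(self):
        """Resend buffered writes in order; stop at the first rejection."""
        while self.buffer:
            key, value = self.buffer[0]
            if self.send(key, value) != "OK":
                return False
            self.buffer.popleft()
        return True


store = {}
state = {"accepting": False}


def fake_send(key, value):
    # Hypothetical master: rejects writes while it has no good slaves.
    if not state["accepting"]:
        return "NOREPLICAS Not enough good replicas to write."
    store[key] = value
    return "OK"


w = FallbackWriter(fake_send)
print(w.write("x", 1))  # BUFFERED
print(w.write("y", 2))  # BUFFERED
state["accepting"] = True
print(w.replay())       # True
print(store)            # {'x': 1, 'y': 2}
```

Replayed writes may land after newer writes made by other clients, so this approach fits best when writes are idempotent or carry their own version or timestamp.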
The exact parameter values should be tested in the specific environment to achieve the best trade‑off between availability and data safety.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Java Interview Crash Guide
