Why Redis Cluster Can Lose Data and How to Mitigate It
Redis Cluster does not guarantee strong consistency. With asynchronous replication or a network partition, data can be lost even after the client has received an acknowledgment. Using the WAIT command, configuring the node timeout, and understanding master-slave election reduce these risks but do not fully eliminate them.
Redis Cluster does not guarantee strong consistency; in certain special scenarios, even if the client receives a write acknowledgment, data may still be lost.
Scenario 1: Asynchronous Replication
client writes to master B
master B replies OK
master B synchronizes to slaves B1, B2, B3
Master B replies to the client without waiting for confirmations from B1, B2, B3. If the master crashes before the slaves finish syncing, one of the slaves may be elected master and the previously written data is lost.
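The sequence above can be reproduced with a toy in-memory model. This is only a sketch of the ordering problem, not real Redis code: the classes `Node` and `ToyMaster`, the `backlog` list, and the key names are all made up for illustration.

```python
class Node:
    """A toy in-memory node; not a real Redis server."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ToyMaster(Node):
    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves
        self.backlog = []  # writes acknowledged to the client but not yet sent to slaves

    def write(self, key, value):
        # Apply locally and acknowledge immediately, without waiting for any slave.
        self.data[key] = value
        self.backlog.append((key, value))
        return "OK"

    def replicate(self):
        # Asynchronous step: happens some time after the client already got OK.
        for key, value in self.backlog:
            for slave in self.slaves:
                slave.data[key] = value
        self.backlog.clear()


b1, b2, b3 = Node("B1"), Node("B2"), Node("B3")
master_b = ToyMaster("B", [b1, b2, b3])

master_b.write("order:1", "paid")
master_b.replicate()                     # this write reaches the slaves
ack = master_b.write("order:2", "paid")  # acknowledged, but B crashes before replicating

new_master = b1  # failover promotes one of the slaves
print(ack, "order:1" in new_master.data, "order:2" in new_master.data)
# prints: OK True False
```

The client saw OK for `order:2`, yet the promoted slave never received it, which is exactly the loss described above.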
The WAIT command can improve data safety in this scenario. WAIT blocks the current client until the preceding writes have been acknowledged by at least the specified number of slaves, or until its timeout expires, and it returns the number of slaves that acknowledged.
Using WAIT increases safety but does not guarantee strong consistency, because a slave that has not yet received the write might still be elected master.
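A minimal sketch of how a caller might pair a write with WAIT follows. It assumes `client` is an object exposing redis-py style `set(key, value)` and `wait(num_replicas, timeout)` methods, and that both commands travel over the same connection to the master that owns the key. The helper name `write_with_wait`, its defaults, and its return shape are illustrative, not part of any library.

```python
def write_with_wait(client, key, value, min_replicas=1, timeout_ms=100):
    """Write a key, then block until slaves acknowledge it or the timeout expires.

    Returns (enough_acks, acked). A False result does not mean the write was
    rolled back; it only means fewer slaves than requested confirmed it in time.
    """
    client.set(key, value)
    # WAIT reports how many slaves acknowledged the writes sent so far
    # on this connection.
    acked = client.wait(min_replicas, timeout_ms)
    return acked >= min_replicas, acked
```

A caller could treat a `False` first element as a signal to retry, alert, or mark the write as unconfirmed. Even a `True` result only narrows the loss window, for the reason given above.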
Scenario 2: Network Partition
Six nodes A, B, C, A1, B1, C1 (three masters and three slaves) and a client Z1.
After a network partition, the cluster splits into two sides: the majority side with A, C, A1, B1, and C1, and the minority side with B and Z1.
Client Z1 can still write to B. If the partition is short-lived, the cluster resumes normal operation. If the partition lasts long enough, B1 is promoted to master on the majority side, and the data Z1 wrote to B in the meantime is lost.
There is a maximum time window during which Z1 can keep writing to B, and this bounds the amount of data that can be lost.
After a certain period, the majority side of the partition will hold an election, a slave becomes master, and the minority side's master will refuse to accept write requests.
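This amount of time is very important and is called the node timeout (the cluster-node-timeout setting).
When a master has been unreachable for longer than the node timeout, it is considered failed and can be replaced by one of its slaves. Likewise, a master that cannot reach the majority of masters for the node timeout enters an error state and stops accepting write requests.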
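The effect of the node timeout on the minority side can be sketched with a toy model. Everything here is hypothetical: the class `MinorityMaster`, the `"REJECTED"` marker, the one-write-per-second client, and the 15000 ms sample value are assumptions chosen to make the arithmetic easy to follow, and real Redis behavior involves more detail than this.

```python
class MinorityMaster:
    """Toy model of a master that has been cut off from the majority of masters."""

    def __init__(self, node_timeout_ms):
        self.node_timeout_ms = node_timeout_ms
        self.isolated_since_ms = None
        self.accepted = []

    def lose_majority(self, now_ms):
        self.isolated_since_ms = now_ms

    def write(self, key, now_ms):
        if self.isolated_since_ms is not None:
            isolated_for = now_ms - self.isolated_since_ms
            if isolated_for >= self.node_timeout_ms:
                # After the node timeout the master refuses writes.
                return "REJECTED"
        self.accepted.append(key)
        return "OK"


b = MinorityMaster(node_timeout_ms=15000)
b.lose_majority(now_ms=0)

# Z1 sends one write per second for 20 seconds after the partition starts.
results = [b.write(f"key:{t}", now_ms=t) for t in range(0, 20000, 1000)]
print(results.count("OK"), results.count("REJECTED"))
# prints: 15 5
```

Only the writes accepted inside the window are at risk, so in this model a shorter node timeout shrinks the amount of data that can be lost, at the cost of reacting more aggressively to brief network hiccups.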
Summary
Redis Cluster does not guarantee strong consistency and has data‑loss scenarios:
Asynchronous replication: the master acknowledges a write, but crashes before the slaves have received it; a slave becomes master and the data is lost. The WAIT command makes the client wait for slave acknowledgments, which narrows this window, but it does not make replication truly synchronous, cannot fully rule out data loss, and costs performance.
Network partition: after a partition, a master on the minority side continues to accept writes; when the partition heals, that master may become a slave, and the data it accepted in the meantime is lost. Setting the node timeout limits how long such a master keeps accepting writes during a partition, which reduces the impact of the data loss.
