Data Consistency Strategies for Big Data Applications: Simple Replication, HDFS Pipeline, and Elasticsearch
The article explains three approaches to ensuring data consistency in big‑data systems—basic multi‑node replication, HDFS pipeline replication, and Elasticsearch primary‑replica replication—detailing their workflows, advantages, and drawbacks.
When developing big‑data applications, data consistency and high availability are both critical. Redundant replicas provide availability, but keeping those replicas consistent with one another is a key challenge. This article summarizes three consistency strategies and illustrates the architectures used by HDFS and Elasticsearch.
1. Simple multi‑node replication
The request is dispatched to multiple nodes, each node writes the data and replies; once a predefined number of nodes have successfully written, the write is considered successful.
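Advantages: Because the client contacts all nodes in parallel, write latency is set by the slowest of the nodes needed to reach the success threshold; slower nodes beyond that threshold do not delay the write.
Disadvantages: High network I/O, because the client must send a full copy of the data to every node.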
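To make this trade-off concrete, here is a minimal Python simulation of the scheme. The function name `quorum_write`, the per-node latencies, and the threshold `w` are illustrative assumptions, not any particular system's API. The model assumes the client sends to all nodes concurrently and ignores retries and timeouts.

```python
from typing import List, Optional, Tuple


def quorum_write(latencies_ms: List[float], healthy: List[bool],
                 w: int) -> Tuple[bool, Optional[float], int]:
    """Simulate a client-driven replicated write (hypothetical model).

    The client sends the same payload to every node in parallel and treats
    the write as successful once `w` nodes have acknowledged it.
    Returns (success, completion latency in ms, payload copies sent by client).
    """
    copies_sent = len(latencies_ms)  # the client sends one copy per node
    # Only healthy nodes acknowledge; sorting gives ack arrival order.
    ack_times = sorted(t for t, ok in zip(latencies_ms, healthy) if ok)
    if len(ack_times) < w:
        return False, None, copies_sent  # success threshold never reached
    # The write completes when the w-th acknowledgment arrives.
    return True, ack_times[w - 1], copies_sent


if __name__ == "__main__":
    # Three replicas with a success threshold of two.
    print(quorum_write([5, 20, 80], [True, True, True], w=2))   # (True, 20, 3)
    print(quorum_write([5, 20, 80], [True, False, True], w=2))  # (True, 80, 3)
```

With three nodes and `w=2`, the write finishes as soon as the second-fastest healthy node responds, while the client still pays for three copies of the payload on the network.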
2. HDFS replica write consistency
HDFS uses a chained pipeline: the client writes to the first DataNode, which forwards the data to the next DataNode, and so on, forming a pipeline that propagates the write down the chain and then acknowledges back up to the client.
Advantages: Guarantees strong consistency.
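Disadvantages: Every replica must write successfully before the operation is acknowledged, and the write travels through the DataNodes one hop at a time, so a single slow DataNode delays the whole write and throughput is lower.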
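The pipeline can be sketched in the same style. This is a simplified model, not the HDFS client code: `pipeline_write`, the per-hop network delay `hop_ms`, and the latency figures are hypothetical. The model processes each hop sequentially, whereas real HDFS streams data in packets so that hops overlap, and it can rebuild the pipeline around a failed DataNode; both behaviors are left out here.

```python
from typing import List, Optional, Tuple


def pipeline_write(latencies_ms: List[float], healthy: List[bool],
                   hop_ms: float) -> Tuple[bool, Optional[float], int]:
    """Simulate a chained (pipeline) write across replicas (hypothetical model).

    The client sends the data once, to the first node; each node writes it
    locally and forwards it to the next node. Acknowledgments travel back
    along the same chain, so the write succeeds only if every node succeeds.
    Returns (success, latency in ms, payload copies sent by the client).
    """
    elapsed = 0.0
    for write_ms, ok in zip(latencies_ms, healthy):
        elapsed += hop_ms + write_ms  # forward one hop, then write locally
        if not ok:
            return False, None, 1  # a broken link fails the whole write
    elapsed += hop_ms * len(latencies_ms)  # acks return hop by hop to the client
    return True, elapsed, 1


if __name__ == "__main__":
    print(pipeline_write([5, 20, 80], [True, True, True], hop_ms=1))   # (True, 111.0, 1)
    print(pipeline_write([5, 20, 80], [True, False, True], hop_ms=1))  # (False, None, 1)
```

Compared with the client sending data to every node, the client here sends only one copy, but latency grows with the length of the chain and any unhealthy node fails the write.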
3. Elasticsearch replica write consistency
The write is first sent to the primary node; after the primary succeeds, it forwards the request to all replica nodes. The primary waits for acknowledgments from all replicas before responding to the client.
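Advantages: Good performance. Because the primary forwards the request to all replicas in parallel, write latency is roughly the primary's write time plus the slowest replica's write time.
Disadvantages: Everything depends on the primary node, which must forward every write to all replicas; with large data volumes this puts heavy pressure on the primary's network I/O.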
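A matching sketch of the primary-replica flow follows. `primary_replica_write` and its parameters are hypothetical, and the model assumes the primary forwards to all replicas concurrently. Elasticsearch's real write path also involves settings such as `wait_for_active_shards` and its own handling of failed replica copies, which this sketch leaves out.

```python
from typing import List, Optional, Tuple


def primary_replica_write(primary_ms: float, replica_ms: List[float],
                          replica_ok: List[bool], hop_ms: float
                          ) -> Tuple[bool, Optional[float], int]:
    """Simulate a primary-replica write (hypothetical model).

    The client sends the data once to the primary. After the primary's local
    write succeeds, it forwards the request to all replicas in parallel and
    answers the client only when every replica has acknowledged.
    Returns (success, latency in ms, payload copies sent by the primary).
    """
    copies_from_primary = len(replica_ms)  # outbound load on the primary
    if not all(replica_ok):
        return False, None, copies_from_primary
    # Replicas write in parallel, so only the slowest one adds to latency.
    slowest_replica = max(replica_ms, default=0.0)
    # Four network hops: client->primary, primary->replicas,
    # replica acks back to the primary, and the reply to the client.
    latency = primary_ms + slowest_replica + 4 * hop_ms
    return True, latency, copies_from_primary


if __name__ == "__main__":
    print(primary_replica_write(10, [20, 35], [True, True], hop_ms=0))   # (True, 45, 2)
    print(primary_replica_write(10, [20, 35], [True, False], hop_ms=0))  # (False, None, 2)
```

With `hop_ms=0` the latency reduces to the primary's write time plus the slowest replica's write time, matching the formula above; the third return value shows that the forwarding load now falls on the primary rather than the client.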
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
