Understanding Replication, Consistency, Fault Tolerance, and the CAP Theorem in Distributed Systems

This article explains the core concepts of replication, consistency, and fault tolerance in distributed systems, discusses strong and asynchronous replication methods, and details the CAP theorem with its consistency, availability, and partition tolerance trade‑offs, illustrating AP and CP scenarios such as Eureka and Zookeeper clusters.

Big Data Technology & Architecture

Apr 13, 2020

Three concerns dominate the design of distributed systems: replication, consistency, and fault tolerance. The CAP theorem frames the trade-offs they force.

Replication provides high availability and reliability by storing multiple copies of data. For example, when a service is deployed as two instances and a call to one instance fails, the client can invoke the other.

Consistency issues arise because multiple replicas may diverge. Typically one replica is the primary, while others are backups. Replication can be performed via two approaches:

Strong synchronous replication: a write must be persisted to the primary and all backups before it returns success. This guarantees consistency but reduces performance and availability, because a single slow or unreachable backup blocks the write.

Asynchronous replication: a write returns success once the primary has stored the data, even if the backups have not yet received it. This gives better performance, but data can be lost if the primary fails before the backups catch up.
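The difference between the two approaches can be sketched with in-memory replicas. This is a minimal illustration, not a real replication protocol: the class names `Replica` and `Primary`, the `pending` queue, and the `flush` method are all hypothetical stand-ins for persistence and background shipping.

```python
class Replica:
    """An in-memory stand-in for one copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary(Replica):
    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups
        self.pending = []  # writes not yet shipped to the backups

    def write_sync(self, key, value):
        # Strong synchronous replication: store on the primary and on
        # every backup before reporting success to the caller.
        self.apply(key, value)
        for backup in self.backups:
            backup.apply(key, value)
        return "ok"

    def write_async(self, key, value):
        # Asynchronous replication: report success as soon as the primary
        # has the write; the backups catch up later.
        self.apply(key, value)
        self.pending.append((key, value))
        return "ok"

    def flush(self):
        # Ship queued writes to the backups (a background task in practice).
        for key, value in self.pending:
            for backup in self.backups:
                backup.apply(key, value)
        self.pending.clear()


backup = Replica("backup-1")
primary = Primary("primary", [backup])

primary.write_sync("a", 1)
sync_visible = backup.data.get("a")    # the backup already has the write

primary.write_async("b", 2)
async_visible = backup.data.get("b")   # not there yet: lost if the primary dies now

primary.flush()
after_flush = backup.data.get("b")     # the backup has caught up
```

The window between `write_async` returning and `flush` running is where an asynchronous setup can lose an acknowledged write.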

Fault tolerance is essential because the more nodes a cluster has, the more likely it is that at least one of them is failing at any given moment. A distributed system should detect and handle such failures automatically in order to stay highly available.

CAP theorem (proposed by Eric Brewer) states that a system can only simultaneously guarantee two of the three properties: Consistency, Availability, and Partition tolerance.

Consistency: reads always return the most recent write.

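Availability: every request sent to a node that has not failed receives a response, even if some replicas are down. Keeping this promise during a failure usually means accepting weaker consistency.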

Partition tolerance: the system continues to operate despite network partitions.

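In practice, network partitions cannot be ruled out, so partition tolerance is not optional. The real choice is therefore between consistency and availability while a partition lasts, and system designers must make it based on their requirements.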
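The choice can be made concrete with a toy three-node cluster split by a partition. This is a sketch under simplifying assumptions: the `Node` class, the `write` function, and the majority rule used for the CP mode are illustrative and do not model any particular system.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.reachable_peers = []  # peers on the same side of the partition


def write(node, value, cluster_size, mode):
    """Handle a write arriving at `node`; mode is "CP" or "AP"."""
    reachable = 1 + len(node.reachable_peers)  # the node itself plus its peers
    has_majority = reachable > cluster_size // 2
    if mode == "CP" and not has_majority:
        # Prefer consistency: refuse rather than let replicas diverge.
        return "rejected"
    # AP, or CP with a majority: accept and replicate to reachable peers.
    node.value = value
    for peer in node.reachable_peers:
        peer.value = value
    return "accepted"


def make_partitioned_cluster():
    # Three nodes; the partition isolates c from a and b.
    a, b, c = Node("a"), Node("b"), Node("c")
    a.reachable_peers = [b]
    b.reachable_peers = [a]
    c.reachable_peers = []
    return a, b, c


# CP: the minority side refuses writes, so no two nodes hold conflicting values.
a, b, c = make_partitioned_cluster()
cp_minority = write(c, "x", 3, "CP")
cp_majority = write(a, "y", 3, "CP")
cp_values = (a.value, b.value, c.value)

# AP: both sides accept writes, so the replicas diverge until the partition heals.
a, b, c = make_partitioned_cluster()
ap_minority = write(c, "x", 3, "AP")
ap_majority = write(a, "y", 3, "AP")
ap_values = (a.value, b.value, c.value)
```

In the CP run the isolated node gives up availability; in the AP run every node answers, but `c` ends up holding a different value from `a` and `b`.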

CAP usage scenarios

AP mode

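Eureka service registry clusters replicate service metadata between nodes. If a node fails before replication completes, the nodes can hold inconsistent data; the registry stays available, but different nodes may return different views of the registered services.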

MySQL and Redis clusters often use asynchronous replication, so a read served by a replica may return stale data.

CP mode

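ZooKeeper clusters employ a leader‑follower model in which writes go to the leader and are broadcast to the followers. The client receives success only after a quorum (a majority of the ensemble) has acknowledged the write, not after every follower has. A minority partition therefore cannot accept writes, which preserves consistency at the cost of availability.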
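In ZooKeeper the leader commits a write once a majority of the ensemble, the leader included, has acknowledged it. The sketch below shows only that majority-acknowledgement rule; it is not ZooKeeper's actual ZAB implementation, and the `Leader` and `Follower` classes and their methods are hypothetical.

```python
class Follower:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []

    def propose(self, entry):
        # A follower that is down never acknowledges the proposal.
        if not self.up:
            return False
        self.log.append(entry)
        return True


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.committed = []

    def write(self, entry):
        ensemble_size = len(self.followers) + 1   # followers plus the leader
        quorum = ensemble_size // 2 + 1           # strict majority
        self.log.append(entry)
        acks = 1                                  # the leader counts toward the quorum
        for follower in self.followers:
            if follower.propose(entry):
                acks += 1
        if acks >= quorum:
            self.committed.append(entry)
            return True    # the client is told the write succeeded
        return False       # not enough acknowledgements: the write is not committed


# Five-node ensemble with two followers down: 3 of 5 acknowledge, so the write commits.
healthy = Leader([Follower("f1"), Follower("f2"),
                  Follower("f3", up=False), Follower("f4", up=False)])
committed_with_quorum = healthy.write("set /config v1")

# Three followers down: only 2 of 5 acknowledge, so the write is refused.
degraded = Leader([Follower("f1"),
                   Follower("f2", up=False), Follower("f3", up=False),
                   Follower("f4", up=False)])
committed_without_quorum = degraded.write("set /config v2")
```

Requiring a majority rather than every node lets the ensemble keep accepting writes while a minority of its nodes is down.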

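Two‑phase commit ensures atomicity across multiple databases. In the first (prepare, or pre‑commit) phase, a coordinator asks every participant whether it can commit; in the second phase, it tells all of them to commit only if every participant voted yes, and to roll back otherwise.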
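The two phases can be sketched as follows. This is a simplified illustration that ignores timeouts, coordinator crashes, and durable logging, all of which a real implementation needs; `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local transaction can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if all voted yes, else roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"


orders, payments = Participant("orders"), Participant("payments")
ok_run = two_phase_commit([orders, payments])

inventory = Participant("inventory")
billing = Participant("billing", can_commit=False)
failed_run = two_phase_commit([inventory, billing])
```

In the second run a single "no" vote from `billing` causes `inventory` to roll back as well, which is the all-or-nothing behaviour the protocol exists to provide.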

Kafka with acks=all waits for acknowledgments from all in‑sync replicas (the ISR), not from every replica, before confirming a write. Consumers therefore only see messages that have been replicated to the whole in‑sync set.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
