Fundamentals of Distributed Systems: Consensus, 2PC/3PC, CAP Theorem, and Logical Clocks
This article introduces core distributed‑system concepts—including the definition of consensus, the two‑phase and three‑phase commit protocols, the CAP theorem and its engineering implications, and logical‑clock mechanisms such as Lamport timestamps, vector clocks, and version vectors—explaining their models, challenges, and practical trade‑offs.
Fundamentals of Distributed Systems – Consensus, 2PC and 3PC
Distributed systems consist of network‑connected nodes that must reach agreement on operations such as transaction commit, leader election, or sequence number generation. Consensus requires three properties: agreement, validity, and termination, and must cope with asynchronous messaging, node failures, network partitions, and Byzantine faults.
Two‑phase commit (2PC) separates the decision into a proposal phase where a coordinator gathers votes from participants, and a commit/abort phase based on the collected votes. In asynchronous environments without node failures, 2PC satisfies the consensus properties, but it can block if the coordinator crashes, requiring a watchdog and logging to recover.
Three‑phase commit (3PC) adds a "prepare to commit" phase to avoid blocking: after participants acknowledge the prepare step, the coordinator can safely commit or abort even if some nodes fail. This adds an extra round‑trip latency but improves availability under fail‑recover and partition scenarios.
CAP Theorem
The CAP theorem states that a distributed service cannot simultaneously guarantee Consistency, Availability, and Partition‑tolerance; at most two can be achieved. Consistency means all reads see the latest successful write, availability requires every request to terminate, and partition‑tolerance demands operation despite network splits.
In practice, engineers must choose trade‑offs: CP systems sacrifice availability during partitions, AP systems sacrifice strong consistency, and CA systems assume no partitions. Extensions such as PACELC incorporate latency (L) as a fourth dimension, guiding design choices between consistency and latency when partitions are absent.
Logical Clocks and Event Ordering
Physical clocks cannot reliably order events across nodes due to clock drift and network delay, so distributed systems use logical clocks. Lamport timestamps assign a monotonically increasing counter to each event, establishing a partial order (happened‑before) and, with tie‑breaking rules, a total order.
Vector clocks extend Lamport timestamps by maintaining a vector of counters for each node, enabling detection of concurrent events. Version vectors build on vector clocks to detect data conflicts in replicated storage, though they can grow large and require pruning or server‑centric identifiers.
These logical‑clock mechanisms underpin many consistency protocols and conflict‑resolution strategies in modern distributed databases and key‑value stores.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.