Understanding Consistency in Distributed Systems
This article explains the concept of consistency in distributed systems, distinguishes strong and weak (eventual) consistency, outlines typical use cases and challenges, and reviews key protocols such as 2‑Phase Commit, 3‑Phase Commit, Paxos, and Raft, while referencing the FLP and CAP theorems.
Consistency in a distributed system refers to the agreement among multiple nodes on a particular value, ensuring that all nodes see the same data at any given time.
There are two main types of consistency:
Strong consistency : All nodes hold identical data at all times; a read from any node returns the same value.
Weak (eventual) consistency : Nodes may temporarily diverge, but given enough time they converge to the same value.
Typical scenarios that require consistent distributed systems include multi‑node read/write services for high availability and scalability, such as ZooKeeper, DNS, and Redis clusters.
Distributed systems face several challenges:
Asynchronous messaging: network delays, loss, and lack of synchronous channels.
Node fail‑stop: nodes crash permanently.
Fail‑recovery: nodes crash and later recover (the most common case).
Network partitions: the network splits nodes into isolated groups.
Byzantine faults: nodes behave arbitrarily due to bugs or malicious attacks.
Designing a consistent system generally assumes a non‑Byzantine environment (trusted internal network).
The FLP theorem states that in an asynchronous system with only crash failures, it is impossible to guarantee both availability and strong consistency simultaneously. The CAP theorem similarly asserts that a system can provide at most two of the three properties: Consistency, Availability, and Partition tolerance.
Several protocols enforce consistency:
Two‑Phase Commit (2PC)
2PC is a distributed transaction protocol that ensures atomicity across multiple data shards. It involves a coordinator and participants, proceeding in two phases: (1) the coordinator asks participants to prepare, and (2) based on their votes, it either commits or aborts the transaction. Advantages: simple and easy to implement. Disadvantages: synchronous blocking, single‑point coordinator failure, possible data inconsistency, and reliance on timeouts.
Three‑Phase Commit (3PC)
3PC extends 2PC with an additional pre‑commit phase to reduce blocking. It consists of query, pre‑commit, and commit stages, using a watchdog timer and state logging to improve fault tolerance, though it still cannot guarantee consistency under all failures.
Paxos Algorithm
Paxos is a foundational consensus algorithm that solves the single‑point failure problem. It defines three roles: Proposer (suggests values), Acceptor (votes on proposals), and Learner (collects accepted proposals to determine the final value). The algorithm proceeds in two phases—prepare and accept—ensuring that any two majority quorums intersect, which guarantees safety.
Other related protocols include Raft and PacificA, which build on Paxos concepts to provide more understandable leader election and log replication mechanisms.
Architects Research Society
A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.