
Consistency, Consensus, and Reliability in Distributed Systems

This article explains the core challenges of achieving consistency in distributed systems, describes consensus algorithms such as Paxos and Raft, discusses theoretical limits like the FLP impossibility and CAP theorem, and shows how trade‑offs among consistency, availability, and partition tolerance shape practical system design.

Architecture Digest

Consistency Issues

In distributed systems, consistency means that multiple service nodes reach agreement on a common state after a series of operations, typically ensured by a consensus algorithm.

An example of two cinemas selling the same tickets illustrates the need for coordinated decisions to avoid overselling.

Note: Consistency does not guarantee correctness; all nodes may agree on a failure state.

Challenges

Unreliable network communication (delays, faults).

Node failures or crashes.

Synchronous calls hinder scalability.

Simple coordination ideas—phone checks, time‑slot selling, third‑party ticket pool—illustrate the principle of serialising conflicting operations.
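The third-party ticket pool idea can be sketched in code. The sketch below is a deliberately simplified, hypothetical `TicketPool` class (the names and structure are illustrative, not from the article): a single lock serialises all sales, so two cinemas can never sell the same seat.

```python
import threading

class TicketPool:
    """Hypothetical shared pool: a single lock serialises conflicting sales."""
    def __init__(self, seats):
        self._seats = set(seats)
        self._lock = threading.Lock()

    def sell(self, seat):
        # Only one seller can hold the lock at a time, so a seat
        # can never be sold twice, at the cost of full serialisation.
        with self._lock:
            if seat in self._seats:
                self._seats.remove(seat)
                return True
            return False

pool = TicketPool({"A1", "A2"})
print(pool.sell("A1"))  # True: first sale succeeds
print(pool.sell("A1"))  # False: double-sell is rejected
```

This is exactly the trade-off the article describes: correctness is easy with a single serialisation point, but that point becomes a bottleneck and a single point of failure, which is why distributed consensus is needed instead.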

Requirements

Termination – every non-faulty node decides on a value in finite time.

Agreement – all non-faulty nodes decide on the same value.

Validity – the decided value must be one of the proposals actually submitted.

Constrained Consistency

Strong consistency is expensive; weaker models such as eventual consistency are often sufficient for web services.

Consensus Algorithms

Consensus algorithms achieve agreement on a proposal among nodes. They must handle non‑Byzantine faults (crash, omission) and Byzantine faults (malicious behavior).

Problem Challenges

Real‑world systems face network partitions, node crashes, and malicious nodes, making consensus non‑trivial.

Common Algorithms

For non‑Byzantine faults: Paxos, Raft, and their variants. For Byzantine faults: PBFT, Proof‑of‑Work, and similar protocols.
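For the non-Byzantine case, the key arithmetic is the majority quorum. A minimal sketch (function name is illustrative) of why crash-fault algorithms such as Paxos and Raft require a majority:

```python
def majority_quorum(n):
    # Paxos and Raft commit a value once a majority of n nodes agree.
    # Any two majorities overlap in at least one node, so two
    # conflicting values can never both reach a quorum.
    return n // 2 + 1

for n in (3, 5, 7):
    print(n, "nodes need", majority_quorum(n), "for a quorum")
```

The overlap argument is the heart of these protocols: with 5 nodes, any two groups of 3 share at least one member, and that shared member prevents contradictory decisions.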

Theoretical Limits

The FLP impossibility theorem states that in an asynchronous system where even a single node may crash, no deterministic algorithm can guarantee that consensus terminates.

FLP Impossibility Principle

Proved by Fischer, Lynch, and Paterson (1985), the result shows that no deterministic algorithm can solve consensus in every execution of a fully asynchronous system; practical algorithms work around it with timeouts, randomization, or partial-synchrony assumptions.

CAP Theorem

In a distributed system you cannot simultaneously guarantee Consistency, Availability, and Partition tolerance; at most two of the three can hold. Since network partitions cannot be avoided in practice, the real choice during a partition is between consistency and availability.

ACID Principles

Atomicity, Consistency, Isolation, Durability describe strict database guarantees, often relaxed to BASE (Basically Available, Soft state, Eventual consistency) for scalability.

Paxos and Raft

Paxos (Lamport; circulated in 1990, published in 1998) reaches consensus through a two‑phase protocol (Prepare/Promise, then Accept/Accepted) among proposer, acceptor, and learner roles. Raft (Ongaro and Ousterhout, 2014) provides the same guarantees in a more understandable form built around leader election and log replication.
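Raft's leader election rests on one rule: a node grants at most one vote per term. A minimal sketch of that rule (a hypothetical `RaftNode` class with no logs or networking, just term and vote bookkeeping):

```python
class RaftNode:
    """Sketch of Raft's voting rule only; real nodes also compare logs."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None

    def request_vote(self, candidate_id, term):
        # Seeing a newer term resets the node's vote for that term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # At most one vote per term: this is what guarantees that
        # at most one leader can win a majority in any given term.
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

nodes = [RaftNode(i) for i in range(5)]
votes = sum(n.request_vote(candidate_id=0, term=1) for n in nodes)
print(votes)  # 5: every node grants the first request it sees in term 1
```

Because each node votes once per term, a second candidate asking for votes in the same term is refused everywhere, so two leaders cannot both assemble a majority.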

Byzantine Fault Tolerance

PBFT (Castro & Liskov 1999) tolerates up to f faulty nodes in a system of 3f+1 nodes, using pre‑prepare, prepare, and commit phases.
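The 3f+1 bound can be turned around: given n replicas, how many Byzantine faults can PBFT survive? A one-line sketch (function name is illustrative):

```python
def max_byzantine_faults(n):
    # PBFT tolerates f faulty replicas only when n >= 3f + 1,
    # so the largest tolerable f is (n - 1) // 3.
    return (n - 1) // 3

print(max_byzantine_faults(4))   # 1
print(max_byzantine_faults(10))  # 3
```

Note the cost compared with crash faults: a majority quorum tolerates 2 crashed nodes out of 5, but 5 replicas tolerate only 1 Byzantine node.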

Reliability Metrics

Service reliability is often expressed as “nines” (e.g., 99.99% uptime); higher nines require more redundancy and cost.
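The "nines" notation maps directly to an allowed downtime budget. A quick sketch of the conversion (assuming a 365-day year):

```python
def downtime_minutes_per_year(nines):
    # Availability with k nines is 1 - 10**-k (e.g., 4 nines = 99.99%).
    unavailability = 10 ** -nines
    return unavailability * 365 * 24 * 60

print(round(downtime_minutes_per_year(3)))  # ~526 minutes/year at 99.9%
print(round(downtime_minutes_per_year(4)))  # ~53 minutes/year at 99.99%
```

Each extra nine shrinks the yearly downtime budget tenfold, which is why each one costs disproportionately more in redundancy and operations.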

Conclusion

Distributed systems are a core area of computer science; achieving consistency involves trade‑offs among performance, availability, and fault tolerance.

Tags: Distributed Systems, CAP theorem, consistency, Raft, Blockchain, Paxos, consensus algorithms, FLP impossibility
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
