Fundamentals 8 min read

How Distributed Consensus Overcomes the FLP Impossibility Theorem

This article explores how to build fault‑tolerant distributed systems by formalizing consensus, outlines its core properties, explains the FLP impossibility theorem, and shows how algorithms like Raft sidestep its limits through timing constraints and recovery mechanisms.

Xiaokun's Architecture Exploration Notes

May 18, 2025

How Distributed Consensus Overcomes the FLP Impossibility Theorem

In the previous article we discussed that building fault‑tolerant distributed systems requires both safety and liveness properties. This piece examines the best industry practices for implementing such systems.

Understanding Distributed Consensus

Distributed systems can encounter many problems:

Service nodes may crash.

Asynchronous networks can lose, reorder, duplicate messages, or exhibit unbounded delays.

There is no globally synchronized clock, affecting the accuracy of distributed nodes.

Processes may pause due to garbage collection, safepoint, or fsync operations.

The optimal way to build a fault‑tolerant system is to abstract common guarantees, formalize them, implement them once, and let applications depend on these guarantees—essentially applying the DRY principle.

Consensus is the abstract expression of these guarantees: it means multiple nodes reach agreement on a single decision while tolerating faults.

Formally, one or more nodes submit proposal requests to the system; the consensus algorithm decides which proposal to adopt. The diagram below illustrates this process.

Consensus possesses the following properties:

Uniform Agreement : All non‑faulty nodes output the same proposal result.

Integrity : Non‑faulty nodes cannot decide twice.

Validity : If a node proposes value v2 , that value must originate from a node that actually made a proposal.

Termination : All non‑faulty nodes eventually reach a decision, emphasizing timeliness.

Uniform Agreement and Integrity define the core of a consensus algorithm. Validity excludes meaningless decisions (e.g., empty or external proposals). Termination corresponds to the liveness property, allowing the system to tolerate faults for a bounded time T and recover afterward.

How Consensus Algorithms Avoid the FLP Impossibility

The FLP theorem, proved by Fischer, Lynch, and Paterson in 1985, states that in an asynchronous system with one or more possible crash failures, no deterministic consensus algorithm can guarantee both safety and liveness.

The FLP theorem, named after Michael Fischer, Nancy Lynch, and Mike Paterson, states that in an asynchronous system with one or more potential crash failures, no deterministic consensus algorithm can guarantee both safety and liveness.

This theorem highlights three constraints: asynchronous systems, crash‑faulty nodes, and deterministic algorithms. Consequently, a purely asynchronous fault‑tolerant system cannot simultaneously achieve safety and liveness.

Real‑world systems are typically partially synchronous with node‑recovery mechanisms. They employ timeout strategies and automatic node restarts or replacements, effectively weakening the strict FLP constraints and converting unbounded waiting into a bounded, timely problem. This is how algorithms such as Raft can achieve consensus despite the FLP impossibility.

Summary

Consensus abstracts the universal safeguards needed for fault‑tolerant distributed systems. By imposing timeliness constraints on a partially synchronous foundation, we can design high‑availability architectures that tolerate non‑Byzantine faults (primarily network and node failures). In such designs, consensus algorithms provide the consistency required for reliable service delivery.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed systems High Availability fault tolerance Raft Consensus FLP theorem

Written by

Xiaokun's Architecture Exploration Notes

10 years of backend architecture design | AI engineering infrastructure, storage architecture design, and performance optimization | Former senior developer at NetEase, Douyu, Inke, etc.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.