Fundamentals 14 min read

Understanding the Chandy‑Lamport Distributed Snapshot Algorithm

This article explains the Chandy‑Lamport algorithm for capturing consistent global snapshots in distributed systems, describes its assumptions and message‑marker rules, walks through a detailed example with three processes and channels, and relates it to Apache Flink's asynchronous checkpoint mechanism.

政采云技术
政采云技术
政采云技术
Understanding the Chandy‑Lamport Distributed Snapshot Algorithm

Preface

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Apache Flink is a distributed processing engine and framework for stateful computation on unbounded or bounded data streams. In a stateful distributed system, failures caused by hardware faults, network issues, or killed processes require failure recovery, which depends on recording the global state before the failure.

Recording a global state in a distributed system is difficult because there is no reliable global clock or shared memory, and communication delays are unpredictable.

One intuitive synchronous algorithm to record the global state is:

Pause external data input.

Wait for all nodes to finish processing pending data.

Record the state of each node, forming the global state.

Resume external input and let nodes continue computation.

This "Stop‑The‑World" approach severely impacts throughput, so Flink uses a different algorithm called Asynchronous Barrier Snapshot (ABS), a variant of the Chandy‑Lamport snapshot algorithm.

Chandy‑Lamport Algorithm

The Chandy‑Lamport algorithm, invented by K. Mani Chandy and Leslie Lamport, records a distributed system's global state.

Distributed System Model

We define a simplified model: a finite set of processing nodes (Processes) connected by directed communication channels (Channels). The system can be represented as a directed graph where nodes are processes (P1, P2, P3) and edges are channels (C12, C21, etc.).

The algorithm assumes:

Messages in a channel are FIFO ordered.

Messages may be delayed but will eventually be delivered.

Processes do not crash.

Algorithm Rules

A global snapshot consists of each process's local state and each channel's state. The algorithm introduces a special Marker message to trigger state recording.

Marker sender rules:

Record the process's local state.

Send a Marker on all outgoing channels.

Record the state of all incoming channels except the one on which the Marker was received.

Marker receiver rules:

First time receiving a Marker on a channel: Set that channel's state to empty. Continue as a sender (record local state, forward Marker on outgoing channels, start recording other incoming channels).

Subsequent Marker on the same channel: Stop recording messages from that channel. Set the channel's state to the sequence of messages recorded between the first and second Marker .

Algorithm Flow

The snapshot can be initiated by any process. The steps are:

Initiate snapshot: Process Pi records its local state and sends Marker on all outgoing channels, then records messages on its other incoming channels.

Propagate snapshot: When a process Pj receives a Marker for the first time on channel Cij, it records its local state, sets Cij state to empty, forwards Marker on its outgoing channels, and starts recording other incoming channels. If it receives a later Marker on a channel, it stops recording that channel and stores the recorded messages as the channel state.

Finish snapshot: Once every process has received a Marker on all incoming channels, the snapshot is complete. An external coordinator can then collect all local and channel states to form the global state.

Algorithm Example

Consider three processes P1, P2, P3 and four channels C12, C21, C23, C31. Initial process states are P1={a,b,c}, P2={d,e,f}, P3={g,h,i} and all channels are empty.

After some messages are exchanged (c from P1→P2, d from P2→P1, e from P2→P3, g from P3→P1), P1 initiates the snapshot. The example walks through each step, showing how processes record local states, set channel states to empty or to recorded message sequences, and forward Marker messages. The final recorded states are:

Process states: P1={a,b}, P2={c,f}, P3={i,e}

Channel states: C12={}, C21={d}, C23={}, C31={g,h}

Conclusion

The Chandy‑Lamport algorithm provides a simple and effective way to capture a consistent global snapshot without stopping the system, yielding a logically consistent state rather than a physically simultaneous one.

References

Distributed Snapshots: Determining Global States of Distributed Systems | the morning paper (https://blog.acolyer.org/2015/04/22/distributed-snapshots-determining-global-states-of-distributed-systems/)

Paper Reading: Distributed Snapshots: Determining Global States of Distributed Systems | Matt's Blog (https://matt33.com/2019/10/27/paper-chandy-lamport/)

Recruitment

The Zero technology team in Hangzhou, with over 300 engineers from Alibaba, Huawei, NetEase, Zhejiang University, etc., works on cloud‑native, blockchain, AI, low‑code platforms, middleware, big data, and more. We contribute to many open‑source projects such as Flutter, scikit‑learn, Apache Dubbo, RocketMQ, Pulsar, Dapr, DolphinScheduler, Seata, and are looking for passionate engineers. If interested, email [email protected].

Distributed SystemsApache FlinkFailure RecoveryGlobal StateChandy-Lamportsnapshot algorithm
政采云技术
Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.