Fundamentals · 8 min read

Understanding ZAB: The Zookeeper Atomic Broadcast Protocol

This article explains the ZAB protocol—Zookeeper's atomic broadcast and crash‑recovery mechanism—detailing its design, message‑broadcast process, leader election, transaction ordering with ZXID, and how it ensures data consistency and availability in distributed systems.


Many consistency protocols exist, such as Paxos, Raft, 2PC, and 3PC. This article introduces ZAB (Zookeeper Atomic Broadcast), the protocol Zookeeper actually runs in production, since it was designed specifically for Zookeeper's needs.

ZAB, short for Zookeeper Atomic Broadcast, is used in Zookeeper in place of Paxos and provides two key capabilities: crash recovery and atomic broadcast. Using a primary‑backup architecture, Zookeeper maintains data consistency across replicas.

The protocol works by having the leader receive all client write requests, package each request into a transaction proposal, and broadcast it to all followers. Once a quorum (a majority of the ensemble, counting the leader itself) acknowledges the proposal, the leader commits the transaction locally and then sends a commit to the followers, completing the broadcast.

The broadcast process consists of three steps: (1) replicate data to followers, (2) wait for acknowledgments from a majority of followers, and (3) commit the transaction once the majority is reached. A message queue between leader and followers decouples them, avoiding synchronous blocking.
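The three steps above can be sketched in Java. This is a hypothetical single-process simulation, not Zookeeper's actual code; the class name, method names, and message strings are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of ZAB's broadcast phase in a 5-node ensemble.
public class ZabBroadcastSketch {
    static final int ENSEMBLE = 5;               // 1 leader + 4 followers
    static final int QUORUM = ENSEMBLE / 2 + 1;  // majority = 3

    static String broadcast(String txn) throws InterruptedException {
        // A message queue per follower decouples the leader from slow
        // followers, avoiding synchronous blocking.
        List<BlockingQueue<String>> queues = new ArrayList<>();
        for (int i = 0; i < ENSEMBLE - 1; i++) queues.add(new LinkedBlockingQueue<>());

        // Step 1: package the write into a proposal and replicate it.
        for (BlockingQueue<String> q : queues) q.put("PROPOSAL:" + txn);

        // Step 2: collect ACKs; the leader's own vote counts toward the quorum.
        int acks = 1;
        for (BlockingQueue<String> q : queues) {
            q.take();   // follower dequeues and persists the proposal...
            acks++;     // ...then acknowledges the leader
        }

        // Step 3: once a majority has ACKed, commit locally and broadcast COMMIT.
        return acks >= QUORUM ? "COMMIT:" + txn : "PENDING:" + txn;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(broadcast("create /app"));
    }
}
```

In the real protocol the followers run concurrently and ACKs arrive asynchronously; the sketch collapses that into a single loop to keep the quorum arithmetic visible.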

Each transaction is assigned a globally unique, monotonically increasing identifier called ZXID. The low 32 bits act as a simple counter, while the high 32 bits encode the epoch of the current leader, ensuring both leader uniqueness and transaction ordering.

When a leader crashes, ZAB switches to crash-recovery mode. The protocol guarantees that any transaction already committed by the leader will eventually be committed on all servers, while uncommitted transactions are discarded. The election algorithm selects the server with the highest ZXID as the new leader, eliminating the need for additional commit‑or‑discard checks.
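A minimal sketch of that election rule, assuming each server reports its id and last logged ZXID (ties on ZXID broken by the larger server id; the class and method names are hypothetical):

```java
import java.util.List;

public class LeaderElectionSketch {
    // Each vote is {serverId, lastZxid}. The server with the highest ZXID wins,
    // so the new leader necessarily holds every committed transaction.
    static long elect(List<long[]> votes) {
        long[] best = votes.get(0);
        for (long[] v : votes) {
            if (v[1] > best[1] || (v[1] == best[1] && v[0] > best[0])) best = v;
        }
        return best[0];
    }

    public static void main(String[] args) {
        long winner = elect(List.of(
                new long[]{1, 0x300000007L},   // server 1: epoch 3, counter 7
                new long[]{2, 0x300000009L},   // server 2: epoch 3, counter 9 (highest)
                new long[]{3, 0x200000042L})); // server 3: stale epoch 2
        System.out.println(winner); // prints 2
    }
}
```

Note how the epoch-in-high-bits encoding does the work: server 3's many transactions from epoch 2 still compare below anything from epoch 3.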

After recovery, the new leader verifies that all transactions have been synchronized with a majority of followers before accepting new client requests. Synchronization relies on the ZXID scheme: the high 32 bits identify the leader epoch, and the low 32 bits order transactions within that epoch.

In summary, ZAB shares similarities with Raft—both use a leader, majority acknowledgment, and a primary‑backup model—but it simplifies the two‑phase commit process, resolves its single‑point‑of‑failure issue, and employs a queue to achieve asynchronous decoupling, thereby ensuring reliable data consistency in Zookeeper clusters.

distributed systems, Zookeeper, Consensus, ZAB, crash recovery, Atomic Broadcast, ZXID
Written by

Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.
