Understanding ZooKeeper: Architecture, Data Model, Sessions, and Leader Election

ZooKeeper is an open-source distributed coordination service that provides primitives for synchronization, configuration management, and naming. This article covers its hierarchical znode data model, sessions, one-time watches, consistency guarantees, the leader, follower, and observer roles, leader election, and the Zab protocol.

Liangxu Linux

ZooKeeper Overview

ZooKeeper is an open‑source distributed coordination service that offers a simple set of primitives enabling distributed applications to implement synchronization, configuration maintenance, and naming services.

Design Goals

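Consistency: A client sees the same view of the service regardless of which server it connects to. Writes are linearizable; reads are served locally by the connected server and may briefly lag behind the latest committed write.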

Reliability: Once a message is accepted by one server, it is eventually accepted by all servers.

Timeliness: Clients receive updates or failure notifications within a bounded time interval.

Wait‑free: Slow or failed clients do not impede fast clients.

Atomicity: Updates either succeed completely or fail without intermediate states.

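Ordering: Updates are delivered in a total (global) order that is the same on every server, and in a causal (partial) order, so an update issued after an earlier one by the same sender is always ordered after it.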

Data Model

ZooKeeper maintains a hierarchical namespace similar to a traditional file system, where each node is called a znode and is uniquely identified by its path (e.g., /NameService/Server1).

Each znode can store data and may have child znodes.

Ephemeral znodes cannot have children and are removed when the session that created them ends.

znodes are versioned; each update increments a version number.
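The version counter is what makes optimistic (compare-and-set) updates possible: a writer passes the version it last read, and the update is rejected if the znode has changed in the meantime. The following is a minimal in-memory sketch, not ZooKeeper code. The `ZnodeStore` and `BadVersionError` names are hypothetical, and the model tracks only data versions, ignoring parents, children, and ACLs.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version does not match the znode's."""


class ZnodeStore:
    """Toy in-memory model of versioned znodes (hypothetical, not the real API)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = (data, 0)  # a new znode starts at data version 0

    def get(self, path):
        return self._nodes[path]  # (data, version)

    def set_data(self, path, data, expected_version=-1):
        _, version = self._nodes[path]
        # -1 means "skip the version check", as in the real setData call.
        if expected_version not in (-1, version):
            raise BadVersionError(f"expected {expected_version}, found {version}")
        self._nodes[path] = (data, version + 1)
        return version + 1


store = ZnodeStore()
store.create("/NameService/Server1", b"10.0.0.1")
_, seen = store.get("/NameService/Server1")
# Succeeds because nobody changed the znode since it was read.
store.set_data("/NameService/Server1", b"10.0.0.2", expected_version=seen)
```

A second writer that still holds the old version would now get `BadVersionError` and would have to re-read the znode before retrying.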

Node types include:

Persistent: Remains until explicitly deleted, regardless of client sessions or server restarts.

Ephemeral: Deleted automatically when the creating session ends.

Non‑sequential: Created with the exact name supplied.

Sequential: Server appends a monotonically increasing 10‑digit decimal suffix.
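The sequential suffix can be illustrated with a few lines of Python. This is only a sketch of the naming rule; in ZooKeeper the counter is maintained by the parent znode on the server, and the `/locks/lock-` prefix here is a hypothetical example.

```python
def sequential_name(prefix, counter):
    """Append a zero-padded 10-digit decimal counter to the requested prefix."""
    return f"{prefix}{counter:010d}"


names = [sequential_name("/locks/lock-", n) for n in (0, 1, 12)]
# Zero padding keeps lexicographic order identical to creation order,
# which is why clients can simply sort the child list.
in_creation_order = sorted(names) == names
```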

znodes can be watched for data changes or child‑list modifications.

Each state change generates a globally ordered zxid (ZooKeeper Transaction ID), a 64‑bit number composed of a high‑order epoch and a low‑order counter.
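As a sketch of how the two halves fit together, the helpers below split and build a zxid, assuming the 32-bit epoch and 32-bit counter layout that ZooKeeper uses. The function names are illustrative, not ZooKeeper APIs.

```python
def split_zxid(zxid):
    """Return (epoch, counter) from a 64-bit zxid."""
    epoch = zxid >> 32            # high-order 32 bits
    counter = zxid & 0xFFFFFFFF   # low-order 32 bits
    return epoch, counter


def make_zxid(epoch, counter):
    """Combine an epoch and a counter into a single 64-bit zxid."""
    return (epoch << 32) | counter


epoch, counter = split_zxid(0x500000003)
```

Because the epoch occupies the high bits, any zxid from a newer epoch compares greater than every zxid from an older one, whatever the counter values are.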

Session Management

Clients establish a session with a ZooKeeper ensemble, and the session moves through states such as CONNECTING, CONNECTED, and CLOSED. If a client loses its connection, it returns to CONNECTING and tries to reconnect to a server in the ensemble. The server side, not the client, decides when a session has expired: if the client does not reconnect within the session timeout, the session is expired and its ephemeral znodes are removed.
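The session lifecycle can be sketched as a small state machine. This is a simplified model that covers only the three states named above; the event names are hypothetical, and the real client library has additional states (for example for authentication failures and read-only connections).

```python
from enum import Enum


class State(Enum):
    CONNECTING = "CONNECTING"
    CONNECTED = "CONNECTED"
    CLOSED = "CLOSED"


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (State.CONNECTING, "connected"): State.CONNECTED,
    (State.CONNECTED, "disconnected"): State.CONNECTING,
    # The client learns about expiration only when it reaches a server again.
    (State.CONNECTING, "session_expired"): State.CLOSED,
    (State.CONNECTING, "close"): State.CLOSED,
    (State.CONNECTED, "close"): State.CLOSED,
}


def step(state, event):
    """Apply one event, rejecting transitions the model does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {state.value} on {event!r}") from None


state = State.CONNECTING
for event in ("connected", "disconnected", "connected", "close"):
    state = step(state, event)
```

CLOSED is terminal in this model: once a session is closed or expired, the application has to create a new session.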

Watch Mechanism

Watches are one‑time triggers set on read operations (getData(), getChildren(), exists()). When the watched data changes, the server sends a single notification to the client. Subsequent changes require the client to set a new watch.

One‑time trigger: Fires only on the first change after being set.

Sent to client: Delivered asynchronously over the client‑server socket.

Watched data: Can be a data watch or a child‑list watch.

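A client receives no watch events while it is disconnected. If it reconnects before the session expires, the client library re-registers its outstanding watches, and any that should have fired in the meantime are triggered. One case can still be missed: a watch on the existence of a znode that is created and then deleted while the client is disconnected. Watches do not survive session expiration.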
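The one-time behavior is easy to see in a toy model. The `WatchedNode` class below is a hypothetical in-process stand-in for a znode, not a client API; it only demonstrates that a watch fires once and must be set again on the next read.

```python
class WatchedNode:
    """Toy znode whose data watches fire at most once (hypothetical model)."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        # Reading is the only way to register a watch, as with getData().
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # Clear the watch list before notifying: each watch fires once.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify("NodeDataChanged")


events = []
node = WatchedNode(b"v1")
node.get_data(watcher=events.append)
node.set_data(b"v2")  # fires the watch
node.set_data(b"v3")  # no watch is registered any more, so nothing fires
```

Note that the event carries no data. A client that wants the new value reads the znode again, and typically sets a fresh watch in the same call.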

Consistency Guarantees

ZooKeeper provides several guarantees:

Sequential Consistency: Updates from a single client are applied in order.

Atomicity: Updates are all‑or‑nothing.

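Single System Image: A client sees the same view of the service regardless of which server it connects to, and never sees an older view after reconnecting to a different server.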

Reliability: Once committed, updates persist until overwritten.

Timeliness: A client's view of the system is guaranteed to be up to date within a bounded time.

Server Roles and States

Each server in the ensemble assumes one of three roles: leader, follower, or observer. Their possible states are leading, following, observing, and looking (searching for a leader).

Leader Election

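When the current leader fails, the ensemble enters recovery mode and elects a new leader before Zab resumes broadcasting. ZooKeeper has shipped more than one election algorithm; FastLeaderElection is the default, and the older basic algorithm is deprecated. (The two are sometimes described as "basic Paxos" and "fast Paxos", but Zab is a distinct atomic broadcast protocol, not a Paxos variant.) A simplified election proceeds as follows: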

The election thread on each server initiates a vote.

It queries all servers (including itself) for their IDs and last zxid.

After collecting responses, it selects the server with the highest zxid as the candidate, using the larger server ID to break ties.

If the candidate obtains a majority (⌊n/2⌋+1) of votes, it becomes the leader; otherwise the process repeats.
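The steps above can be sketched as follows. This is a deliberately simplified, single-round model in which every reachable server sees the same information; the function names are hypothetical, and the tie-break on server ID reflects how ZooKeeper orders otherwise equal votes.

```python
from collections import Counter


def pick_candidate(servers):
    """servers maps server ID -> last zxid. Highest zxid wins; larger ID breaks ties."""
    return max(servers, key=lambda sid: (servers[sid], sid))


def elect(reachable, ensemble_size):
    """Return the elected server ID, or None if no candidate reaches a majority."""
    if not reachable:
        return None
    # Every reachable server casts one vote for the best candidate it knows of.
    votes = Counter(pick_candidate(reachable) for _ in reachable)
    candidate, count = votes.most_common(1)[0]
    quorum = ensemble_size // 2 + 1
    return candidate if count >= quorum else None


# Three of five servers are reachable; servers 2 and 3 share the highest zxid.
reachable = {1: 0x100000005, 2: 0x100000007, 3: 0x100000007}
leader = elect(reachable, ensemble_size=5)
```

With only two of five servers reachable, `elect` returns `None`, which corresponds to the process repeating until a majority is available.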

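An ensemble of 2n+1 servers stays available as long as at least n+1 of them are alive, so it tolerates n failures. An odd ensemble size is recommended rather than required: adding one server to reach an even count raises the quorum size without increasing the number of failures tolerated.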
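The majority arithmetic is worth working through for a few ensemble sizes. The helper names below are illustrative.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority: floor(n/2) + 1."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


# ensemble size -> (quorum size, tolerated failures)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in range(3, 8)}
```

Sizes 3 and 4 both tolerate one failure, and sizes 5 and 6 both tolerate two, so an even-sized ensemble costs an extra server without tolerating any additional failure.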

Leader Workflow

Recover data after a restart.

Maintain heartbeats with followers, receive and classify follower requests.

Process follower messages: PING (heartbeat), REQUEST (write or sync), ACK (vote acknowledgment), REVALIDATE (session renewal).

Follower Workflow

Send requests to the leader (PING, REQUEST, ACK, REVALIDATE).

Receive and handle leader messages.

Forward client write requests to the leader for voting.

Return results to the client.

Messages a follower receives from the leader include PING, PROPOSAL, COMMIT, UPTODATE, REVALIDATE, and SYNC.

Zab Protocol (Two‑Phase Commit)

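When a server receives a client write request, it forwards the request to the leader, which broadcasts a PROPOSAL to all followers. Each follower writes the proposal to disk and replies with an ACK. Once the leader collects ACKs from a quorum, it sends a COMMIT to finalize the transaction. This resembles a two-phase commit, except that followers never vote to abort and the leader needs only a quorum rather than every follower.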

Ordering guarantee: All servers execute transactions in the same order.

If the leader crashes after sending a proposal, the new leader must ensure that any transaction already committed on any server is eventually committed on all servers.

Proposals that existed only on the failed leader and never reached a follower are discarded.
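The propose, acknowledge, and commit exchange can be sketched with a synchronous toy model. This is an illustration under strong assumptions, not the real protocol: there is no network, no disk, and no recovery, and the `Leader` and `Follower` classes are hypothetical.

```python
class Follower:
    def __init__(self, alive=True):
        self.alive = alive
        self.log = []        # stand-in for proposals persisted to disk
        self.committed = []  # zxids of committed transactions

    def on_proposal(self, zxid, txn):
        if not self.alive:
            return False     # no ACK from a failed follower
        self.log.append((zxid, txn))
        return True          # ACK

    def on_commit(self, zxid):
        if self.alive:
            self.committed.append(zxid)


class Leader:
    def __init__(self, followers, epoch=1):
        self.followers = followers
        self.epoch = epoch
        self.counter = 0
        self.committed = []

    def broadcast(self, txn):
        """Propose txn; commit and return its zxid only if a quorum ACKs."""
        self.counter += 1
        zxid = (self.epoch << 32) | self.counter
        # The leader counts itself, then one ACK per follower that logged it.
        acks = 1 + sum(f.on_proposal(zxid, txn) for f in self.followers)
        ensemble_size = len(self.followers) + 1
        if acks < ensemble_size // 2 + 1:
            return None      # no quorum, so the proposal stays uncommitted
        self.committed.append(zxid)
        for f in self.followers:
            f.on_commit(zxid)
        return zxid


followers = [Follower(), Follower(alive=False)]
leader = Leader(followers)
zxid = leader.broadcast("set /NameService/Server1")
```

In this three-server ensemble the transaction commits with one follower down, because the leader plus one follower already form a quorum.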

Summary

The article provides a concise yet comprehensive overview of ZooKeeper’s core concepts, including its hierarchical data model, session and watch mechanisms, consistency guarantees, server roles, leader election process, leader/follower responsibilities, and the Zab protocol that underpins reliable state replication.

