Comprehensive Overview of Zookeeper: Core Features, Architecture, Protocols, and Use Cases
This article provides a detailed introduction to Zookeeper, covering its role as a distributed coordination service; core features such as its file-system-like data model, watch notifications, and cluster management; the ZAB consensus protocol; znode types; leader election; distributed lock implementation; and typical application scenarios.
Zookeeper is an open‑source distributed coordination service that serves as a core component for building reliable distributed systems.
The service offers three main capabilities: a file‑system‑like hierarchical data store (znodes), a watch‑based notification mechanism that alerts clients of node changes, and a cluster management model with a leader‑follower architecture that automatically elects a new leader on failure.
Typical application scenarios include name service (generating globally unique IDs), distributed coordination via watches, cluster state management, and implementing distributed locks using temporary sequential znodes.
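One of these scenarios, the name service, leans on the per-parent sequence counter that backs sequential znodes. As a minimal sketch (assumption: no real Zookeeper ensemble is involved; the class below only simulates the server-side counter that Persistent Sequential znodes use), globally unique, ordered names can be generated like this:

```python
class SequentialNamer:
    """Mimics Zookeeper's per-parent sequence counter: each create under a
    parent path appends a zero-padded, monotonically increasing suffix."""

    def __init__(self):
        self._counters = {}  # parent path -> next sequence number

    def create_sequential(self, parent, prefix):
        seq = self._counters.get(parent, 0)
        self._counters[parent] = seq + 1
        # Zookeeper pads the sequence to 10 digits, e.g. "id-0000000000"
        return f"{parent}/{prefix}{seq:010d}"

namer = SequentialNamer()
a = namer.create_sequential("/ids", "id-")
b = namer.create_sequential("/ids", "id-")
print(a)  # /ids/id-0000000000
print(b)  # /ids/id-0000000001
```

Because the counter lives on the server (per parent znode), every client creating under the same path receives a distinct, strictly increasing ID.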
Zookeeper relies on ZAB (Zookeeper Atomic Broadcast), an atomic broadcast protocol in the same family as Paxos and comparable in spirit to Raft, to achieve strong consistency across the ensemble: every committed update is delivered to all servers in the same order.
The ZAB protocol proceeds through four phases: Leader Election, Discovery, Synchronization, and Broadcast, ensuring that a majority of servers agree on the order of updates before they are applied.
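The Broadcast phase's majority rule can be illustrated with a short sketch (assumption: a single process with in-memory "servers"; real ZAB runs over TCP with persistent transaction logs and leader-assigned zxids):

```python
class Server:
    def __init__(self, sid):
        self.sid = sid
        self.log = []  # committed updates, in zxid order

ENSEMBLE = [Server(i) for i in range(1, 6)]  # 5 servers -> quorum of 3
QUORUM = len(ENSEMBLE) // 2 + 1
zxid = 0

def broadcast(update, acked_servers):
    """Leader proposes an update; it commits only if a majority acks."""
    global zxid
    zxid += 1                                 # every proposal gets a zxid
    acks = [s for s in ENSEMBLE if s.sid in acked_servers]
    if len(acks) >= QUORUM:
        for s in ENSEMBLE:                    # commit in zxid order everywhere
            s.log.append((zxid, update))
        return True
    return False

print(broadcast("set /a 1", {1, 2, 3}))  # True: 3 of 5 servers acked
print(broadcast("set /a 2", {1, 2}))     # False: no majority, not committed
```

Note that the second proposal consumes a zxid but is never applied: ordering is fixed at proposal time, while visibility requires the quorum.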
In operation, Zookeeper follows the observer pattern: clients register watches on znodes, and any change (data update, node creation/deletion, child addition/removal) triggers a notification to the interested clients.
The ensemble consists of 2N+1 servers (e.g., three servers for N=1), which tolerates up to N server failures while still maintaining a majority quorum.
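The arithmetic behind the 2N+1 sizing rule is simple enough to compute directly; the sketch below also shows why even ensemble sizes are discouraged:

```python
def tolerated_failures(ensemble_size):
    """A majority must survive; whatever is left over can fail."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority

for size in (3, 4, 5):
    print(size, "servers tolerate", tolerated_failures(size), "failure(s)")
# 3 servers tolerate 1 failure(s)
# 4 servers tolerate 1 failure(s)  (the even size adds no fault tolerance)
# 5 servers tolerate 2 failure(s)
```

A four-server ensemble needs three servers for a majority, so it tolerates only one failure, the same as three servers, while adding quorum overhead.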
Roles within the ensemble are Leader (processes all write requests and can serve reads), Follower (serves reads, acknowledges the leader's proposals, and votes in leader election), and Observer (serves reads only and does not vote, so read capacity can scale without enlarging the quorum).
Four znode types are defined: Persistent, Persistent Sequential, Ephemeral, and Ephemeral Sequential, each with specific lifecycle semantics.
The data model mirrors a Unix‑like file system, with a hierarchical tree of znodes stored as key/value pairs, rooted at "/".
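This hierarchical namespace can be modeled as a tree of named children, each node carrying a small data payload. A minimal in-memory sketch (assumption: this is an illustration, not the server's actual data-tree implementation, and it omits stat metadata, ACLs, and error handling):

```python
class ZNode:
    """One node in the hierarchy: a data payload plus named children."""
    def __init__(self, data=b""):
        self.data = data
        self.children = {}  # child name -> ZNode

root = ZNode()  # the tree is rooted at "/"

def create(path, data=b""):
    """Create a znode; as in Zookeeper, all parents must already exist."""
    node = root
    parts = path.strip("/").split("/")
    for name in parts[:-1]:
        node = node.children[name]  # KeyError if a parent is missing
    node.children[parts[-1]] = ZNode(data)

create("/app")
create("/app/config", b"timeout=30")
print(sorted(root.children["app"].children))  # ['config']
```

As in the real system, a znode is addressed by its full path from "/", and creating "/app/config" fails unless "/app" exists first.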
Leader election follows a multi-step process: each server initially votes for itself, votes are exchanged and compared (preferring the most up-to-date transaction history, with the higher server ID breaking ties), servers adopt the strongest vote they have seen, and the candidate supported by a majority becomes leader, after which all servers update their state accordingly.
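The core comparison rule can be sketched as follows (assumption: a single election round among fully connected servers, with a vote reduced to a (zxid, server_id) pair; the real fast leader election also tracks election epochs and runs over the network):

```python
from collections import Counter

def elect(servers):
    """servers: dict of server_id -> last zxid seen by that server.
    Returns the elected leader's id, or None if no majority forms."""
    # The strongest vote: highest zxid wins, higher server id breaks ties.
    best_id = max(servers.items(), key=lambda kv: (kv[1], kv[0]))[0]
    # Every server compares votes and adopts the strongest one it sees.
    votes = Counter(best_id for _ in servers)
    leader, count = votes.most_common(1)[0]
    quorum = len(servers) // 2 + 1
    return leader if count >= quorum else None

# Servers 2 and 3 are equally up to date (zxid 120); the higher id wins.
print(elect({1: 100, 2: 120, 3: 120}))  # 3
```

Preferring the highest zxid guarantees the new leader already holds every update that could have been committed, so no acknowledged write is lost across the failover.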
Distributed locks are implemented by creating an Ephemeral Sequential znode under a lock path; the client that holds the smallest sequence number acquires the lock, while others watch the predecessor node.
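The acquisition rule reduces to sorting the lock path's children by sequence number. A minimal sketch (assumption: a plain list stands in for the children of a "/lock" path; real clients would use a Zookeeper client library such as Apache Curator):

```python
def lock_holder(children):
    """The client whose Ephemeral Sequential znode carries the smallest
    sequence number holds the lock; every other client watches only the
    node immediately before its own (avoiding a thundering herd)."""
    ordered = sorted(children, key=lambda n: int(n.rsplit("-", 1)[1]))
    watches = {ordered[i]: ordered[i - 1] for i in range(1, len(ordered))}
    return ordered[0], watches

holder, watches = lock_holder(
    ["lock-0000000002", "lock-0000000000", "lock-0000000001"]
)
print(holder)                      # lock-0000000000
print(watches["lock-0000000002"])  # lock-0000000001
```

Because the znodes are ephemeral, a crashed client's node disappears with its session, its successor's watch fires, and the lock passes on without manual cleanup.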
The watch mechanism works by clients registering interest in a znode with the server, which records the watch and fires a one-time notification back to the client when the watched node changes; a client that wants continued updates must re-register the watch after each notification.
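The one-shot callback model can be sketched in-process (assumption: an in-memory observer registry; real watches are recorded on the server and delivered over the client's session connection):

```python
class WatchableNode:
    """A znode stand-in whose data changes fire one-shot watch callbacks."""

    def __init__(self, data=b""):
        self.data = data
        self._watchers = []  # one-shot callbacks, like real watches

    def watch(self, callback):
        self._watchers.append(callback)

    def set_data(self, data):
        self.data = data
        # Watches fire exactly once, then must be re-registered.
        fired, self._watchers = self._watchers, []
        for cb in fired:
            cb("NodeDataChanged", data)

events = []
node = WatchableNode(b"v1")
node.watch(lambda ev, data: events.append((ev, data)))
node.set_data(b"v2")  # fires the watch
node.set_data(b"v3")  # no watch registered any more -> no event
print(events)  # [('NodeDataChanged', b'v2')]
```

The second update goes unnoticed, which is exactly why production clients re-register their watch inside the callback if they need a continuous stream of changes.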