Mastering ZooKeeper: Installation, Znode Model, Watchers, and Leader Election Explained
This comprehensive guide walks you through ZooKeeper's role in distributed systems, detailed installation steps, Znode data structures, watcher mechanisms, ZAB protocol operations, and the FastLeaderElection algorithm, providing practical commands and configuration examples for building reliable coordination services.
Background
As distributed applications scale, centralized monolithic architectures run into single points of failure, limited scalability, and complex maintenance. Moving to a distributed architecture addresses these issues but introduces new challenges such as data consistency, fault tolerance, and coordinating tasks across multiple nodes.
Distributed Coordination Components
To address these challenges, open‑source coordination services like ZooKeeper, etcd, and Consul provide consistent state management and fault‑tolerant coordination. ZooKeeper, an Apache top‑level project originally developed at Yahoo! and inspired by Google's Chubby lock service, is widely used by Hadoop, HBase, Kafka, and many other systems.
ZooKeeper Overview
ZooKeeper exposes a hierarchical namespace of Znodes, each addressed by a path and holding a small byte array of data plus metadata. All operations are atomic and ordered, and each Znode's data is limited to roughly 1 MiB by default. Znodes can be persistent or ephemeral (tied to a client session), and either kind may be sequentially numbered.
Installation & Basic Configuration
After installing a JDK, download a stable release (e.g., apache-zookeeper-3.7.0-bin.tar.gz) and extract it into three directories to simulate a three‑node ensemble (zk1, zk2, zk3). In each instance, create data and logs sub‑directories, and put a myid file containing the node's numeric ID (1, 2, or 3) in its data directory. The ID must match the N of that node's server.N entry in zoo.cfg.
/Users/newboy/ZooKeeper/zk1
/Users/newboy/ZooKeeper/zk2
/Users/newboy/ZooKeeper/zk3
Configure each instance's conf/zoo.cfg (example for zk1):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/newboy/ZooKeeper/zk1/data
dataLogDir=/Users/newboy/ZooKeeper/zk1/logs
clientPort=2181
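# server.<myid>=<host>:<quorum port>:<leader-election port>
server.1=127.0.0.1:8881:7771
server.2=127.0.0.1:8882:7772
server.3=127.0.0.1:8883:7773
Copy the configuration to zk2 and zk3, adjusting dataDir, dataLogDir, and clientPort accordingly (for example, clientPort 2182 and 2183, since all three instances share 127.0.0.1). The server.N lines stay identical on every node.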
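Because only the paths and client port differ between instances, the per‑node files can be generated. The following is a minimal Python sketch, assuming the sample paths and ports above and a distribution extracted under zkN/apache-zookeeper-3.7.0-bin; the helper names render_zoo_cfg and write_instance are hypothetical.

```python
from pathlib import Path

# Quorum and leader-election ports per myid, copied from the zk1 example.
SERVERS = {1: (8881, 7771), 2: (8882, 7772), 3: (8883, 7773)}


def render_zoo_cfg(base: Path, myid: int) -> str:
    """Return zoo.cfg text for instance zk<myid>; only paths and clientPort vary."""
    node = base / f"zk{myid}"
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={node / 'data'}",
        f"dataLogDir={node / 'logs'}",
        f"clientPort={2180 + myid}",  # 2181, 2182, 2183 on a single host
    ]
    # Every instance lists the full ensemble with identical server.N lines.
    for sid, (quorum_port, election_port) in sorted(SERVERS.items()):
        lines.append(f"server.{sid}=127.0.0.1:{quorum_port}:{election_port}")
    return "\n".join(lines) + "\n"


def write_instance(base: Path, myid: int, conf_dir: Path) -> None:
    """Create data/ and logs/, write data/myid, and write conf_dir/zoo.cfg."""
    node = base / f"zk{myid}"
    (node / "data").mkdir(parents=True, exist_ok=True)
    (node / "logs").mkdir(parents=True, exist_ok=True)
    (node / "data" / "myid").write_text(f"{myid}\n")  # must match server.<myid>
    conf_dir.mkdir(parents=True, exist_ok=True)
    (conf_dir / "zoo.cfg").write_text(render_zoo_cfg(base, myid))
```

For the layout in this article, each instance would be written with write_instance(Path("/Users/newboy/ZooKeeper"), n, Path(f"/Users/newboy/ZooKeeper/zk{n}/apache-zookeeper-3.7.0-bin/conf")) for n in 1, 2, 3.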
Start each server:
/Users/newboy/ZooKeeper/zk1/apache-zookeeper-3.7.0-bin/bin/zkServer.sh start
/Users/newboy/ZooKeeper/zk2/apache-zookeeper-3.7.0-bin/bin/zkServer.sh start
/Users/newboy/ZooKeeper/zk3/apache-zookeeper-3.7.0-bin/bin/zkServer.sh start
Data Model – Znode
Znodes form a tree‑like namespace similar to a Unix file system, rooted at /. The following commands, run in the CLI started with bin/zkCli.sh (for example, zkCli.sh -server 127.0.0.1:2181), create the nodes /Dog, /Cat, and /Cat/TomCat and list their children:
create /Dog
create /Cat
create /Cat/TomCat
ls /
ls /Cat
Key properties:
A Znode behaves like both a file (it stores data and timestamps) and a directory (it can have children).
All operations are atomic and sequentially consistent.
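The maximum data size per Znode is about 1 MiB by default (governed by the jute.maxbuffer setting); Znodes are meant for small coordination data, not bulk storage.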
Paths are absolute and immutable.
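The two basic node types are PERSISTENT (lifetime independent of the client session) and EPHEMERAL (deleted automatically when the creating session ends; ephemeral nodes cannot have children). Either can also be SEQUENTIAL, in which case the server appends a monotonically increasing, 10‑digit zero‑padded counter maintained by the parent, e.g. /Lock/lock-0000000003.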
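To make these properties concrete, here is a toy in‑memory model in Python. It is a sketch under simplifying assumptions, not ZooKeeper's implementation: the ZnodeTree and Znode names are hypothetical, a KeyError stands in for the server's NoNode error, and the sequence counter only advances on sequential creates (real ZooKeeper derives the suffix from the parent's child‑change version).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_DATA = 1024 * 1024  # illustrative stand-in for the default ~1 MiB limit


@dataclass
class Znode:
    data: bytes = b""
    ephemeral_owner: Optional[int] = None  # owning session id; None means PERSISTENT
    children: Dict[str, "Znode"] = field(default_factory=dict)
    seq_counter: int = 0  # simplified per-parent counter for SEQUENTIAL children


class ZnodeTree:
    """Toy in-memory model of the Znode namespace; not the real server."""

    def __init__(self) -> None:
        self.root = Znode()

    def _lookup(self, path: str) -> Znode:
        node = self.root
        for part in (p for p in path.split("/") if p):
            node = node.children[part]  # a KeyError stands in for NoNode
        return node

    def create(self, path: str, data: bytes = b"", *,
               session: Optional[int] = None, sequential: bool = False) -> str:
        """Create a Znode and return its actual path (suffixed if sequential)."""
        if not path.startswith("/"):
            raise ValueError("paths must be absolute")
        if len(data) > MAX_DATA:
            raise ValueError("data exceeds the ~1 MiB limit")
        parent_path, _, name = path.rpartition("/")
        parent = self._lookup(parent_path or "/")
        if parent.ephemeral_owner is not None:
            raise ValueError("ephemeral Znodes cannot have children")
        if sequential:
            # 10-digit zero-padded suffix, e.g. req-0000000007
            name = f"{name}{parent.seq_counter:010d}"
            parent.seq_counter += 1
        if name in parent.children:
            raise FileExistsError(path)
        parent.children[name] = Znode(data=data, ephemeral_owner=session)
        return f"{parent_path}/{name}"

    def ls(self, path: str) -> List[str]:
        return sorted(self._lookup(path).children)

    def close_session(self, session: int) -> None:
        """Remove every ephemeral Znode owned by `session`."""
        def sweep(node: Znode) -> None:
            for name in list(node.children):
                child = node.children[name]
                if child.ephemeral_owner == session:
                    del node.children[name]  # ephemerals never have children
                else:
                    sweep(child)
        sweep(self.root)
```

Replaying the CLI session above, create("/Dog"), create("/Cat"), and create("/Cat/TomCat") followed by ls("/") yields ["Cat", "Dog"]; creating "/Lock/req-" with sequential=True returns "/Lock/req-0000000000", and closing the owning session removes it.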
Watcher Mechanism & Distributed Lock
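Clients can register a Watcher on a Znode. When the watched Znode changes (for example, its data is updated or it is deleted), the server sends the client a notification, enabling publish/subscribe patterns. Standard watches are one‑time triggers: the notification says only what happened, so the client must read the Znode again (re‑registering the watch) to get the new state and keep watching.
A classic use case is a fair distributed lock built from EPHEMERAL_SEQUENTIAL nodes under a lock directory (/Lock). Each client creates a sequential node there; the client owning the smallest sequence number holds the lock, and every other client watches only the node immediately preceding its own, so a release wakes a single waiter rather than all of them (the "herd effect"). Because the nodes are ephemeral, a crashed client's node disappears when its session expires, so the lock cannot be held forever.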
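The ordering logic of this lock recipe can be sketched without a server. The following Python is a minimal sketch, assuming children carry the standard 10‑digit sequential suffix; the lock- prefix and the helpers seq and next_action are hypothetical, and the ZooKeeper calls themselves are left out.

```python
from typing import List, Optional, Tuple


def seq(name: str) -> int:
    """Sequence number from the 10-digit suffix ZooKeeper appends."""
    return int(name[-10:])


def next_action(children: List[str], me: str) -> Tuple[str, Optional[str]]:
    """Decide what the client that created `me` should do after listing /Lock.

    Returns ("acquired", None) if `me` has the smallest sequence number;
    otherwise ("watch", predecessor). When that watch fires, list /Lock
    again and call next_action once more: the predecessor may have vanished
    because its session expired, not because it held and released the lock.
    """
    ordered = sorted(children, key=seq)  # creation order, independent of prefix
    i = ordered.index(me)
    if i == 0:
        return ("acquired", None)
    # Watch only the immediate predecessor so each deletion wakes one waiter
    # instead of every client (the "herd effect").
    return ("watch", ordered[i - 1])
```

With a real client, the list would come from getChildren on /Lock, and the watch would be set with exists(predecessorPath, watcher); a null result from exists means the predecessor is already gone, so the client re‑checks immediately.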
ZAB Protocol – Write and Read Flow
ZooKeeper guarantees consistency via the ZooKeeper Atomic Broadcast (ZAB) protocol. Writes are processed by the Leader:
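A client sends a write request to the server it is connected to; if that server is a Follower, it forwards the request to the Leader.
The Leader assigns the write a zxid, proposes it to all Followers, and waits for ACKs from a majority of the voting servers (its own ACK counts).
Once a majority has acknowledged, the Leader commits the change and broadcasts a COMMIT; the server the client is connected to applies it and replies to the client.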
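Read requests are served by whichever server (Leader, Follower, or Observer) the client is connected to, directly from its local in‑memory copy, which explains ZooKeeper's high read throughput. The trade‑off is that a read may return slightly stale data; a client that needs the latest committed value can call sync before reading.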
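The majority rule in the write path comes down to simple arithmetic. This is a minimal Python sketch, assuming a single Leader, no failures, and hypothetical names (LeaderSketch, propose, make_zxid); the zxid layout (epoch in the high 32 bits, counter in the low 32 bits) follows ZooKeeper's zxid format. Reads never pass through this path.

```python
from dataclasses import dataclass, field
from typing import List, Set


def make_zxid(epoch: int, counter: int) -> int:
    """64-bit zxid: high 32 bits = leader epoch, low 32 bits = per-epoch counter."""
    return (epoch << 32) | counter


@dataclass
class LeaderSketch:
    """Toy model of the Leader's write path; networking and logging omitted."""
    epoch: int
    leader_id: int
    voters: Set[int]  # myids of Leader + Followers; Observers never ACK
    counter: int = 0
    committed: List[int] = field(default_factory=list)

    def propose(self, acking_followers: Set[int]) -> bool:
        """Propose one write and report whether it can be committed.

        `acking_followers` simulates which Followers' ACKs arrive. False means
        the write is still pending, so the client has not been answered yet.
        """
        self.counter += 1
        zxid = make_zxid(self.epoch, self.counter)
        # The Leader logs the proposal itself, so its own ACK counts.
        acks = {self.leader_id} | (acking_followers & self.voters)
        if len(acks) > len(self.voters) // 2:  # strict majority of voters
            self.committed.append(zxid)  # next: broadcast COMMIT, reply to client
            return True
        return False
```

With three voters the Leader needs one Follower ACK; with five it needs two, which is why a five‑node ensemble tolerates two failed servers instead of one.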
Leader Election Process
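ZooKeeper uses the FastLeaderElection algorithm, which exchanges votes over TCP. Each server tracks its logicClock (the current election round), its myid, its current epoch (the leadership term it last accepted), and the zxid of the last transaction it has logged. Election steps: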
All servers start in LOOKING state and vote for themselves.
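Servers exchange votes. Votes from an older round (lower logicClock) are ignored; within the current round, a vote beats another if it names a candidate with a higher epoch, then a higher zxid, then a higher myid, and a server switches its own vote whenever it sees a better one.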
When a server sees that a majority of the ensemble backs the same candidate, the election ends: the winner becomes LEADING and the others become FOLLOWING. Observers do not vote; they learn the outcome and enter OBSERVING.
If the Leader crashes, remaining Followers re‑enter LOOKING and repeat the process.
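The vote ordering and the quorum check can be simulated in a few lines. This is a deliberately simplified, synchronous Python sketch, not the real FastLeaderElection code: it assumes every live server hears every other one, treats votes from stale logicClock rounds as already discarded, and uses hypothetical names (Vote, elect).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Vote:
    epoch: int  # candidate's current epoch
    zxid: int   # candidate's last logged zxid
    myid: int   # candidate's server id

    def key(self) -> tuple:
        # Comparison order within one election round: epoch, then zxid, then myid.
        return (self.epoch, self.zxid, self.myid)


def elect(live: Dict[int, Vote], ensemble_size: int) -> Optional[int]:
    """Return the elected leader's myid, or None if no quorum can form."""
    proposals = dict(live)  # LOOKING: every live server first votes for itself
    changed = True
    while changed:  # exchange votes until nobody switches
        changed = False
        for sid in proposals:
            best_seen = max(proposals.values(), key=Vote.key)
            if best_seen.key() > proposals[sid].key():
                proposals[sid] = best_seen  # adopt the better vote, rebroadcast
                changed = True
    winner = max(proposals.values(), key=Vote.key)
    backers = sum(1 for v in proposals.values() if v == winner)
    # A quorum is a majority of the whole ensemble, not of the live servers.
    return winner.myid if backers > ensemble_size // 2 else None
```

In a three‑node ensemble where only servers 1 and 2 have started with empty data, elect({1: Vote(0, 0, 1), 2: Vote(0, 0, 2)}, 3) returns 2, because two servers already form a quorum; server 3 starting later would find an existing leader and follow it. After a crash, a surviving server with a newer zxid wins even if its myid is lower.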
Commit Guarantees
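A write that has been acknowledged by a majority survives leader changes: the new Leader is elected by a majority and has the highest zxid among its voters, and any two majorities overlap, so it is guaranteed to hold that write and will commit it. A write that reached only a minority (for example, only the Leader that then crashed) may be discarded, and servers holding it truncate it from their logs when they sync with the new Leader. Since a client receives a success reply only after a majority ACK, acknowledged writes are never lost.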
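The guarantee rests on quorum intersection, which is easy to check by brute force. This small Python sketch uses hypothetical helper names: it enumerates every minimal majority of an n‑server ensemble, confirms that any two overlap, and shows why a write ACKed by a majority is visible to whichever majority elects the next Leader, while a minority‑only write may not be.

```python
from itertools import combinations
from typing import FrozenSet, List


def majorities(n: int) -> List[FrozenSet[int]]:
    """All minimal majorities (size n//2 + 1) of an ensemble with myids 1..n."""
    size = n // 2 + 1
    return [frozenset(c) for c in combinations(range(1, n + 1), size)]


def always_overlap(n: int) -> bool:
    """True if every pair of majorities shares at least one server.

    Checking minimal majorities suffices: larger ones are supersets.
    """
    quorums = majorities(n)
    return all(a & b for a in quorums for b in quorums)


def survives(acked_by: FrozenSet[int], election_quorum: FrozenSet[int]) -> bool:
    """A write is known to the electing quorum iff some voter logged it."""
    return bool(acked_by & election_quorum)
```

For example, always_overlap(5) holds across all ten 3‑server majorities, while survives(frozenset({1}), frozenset({2, 3})) is False: a write that only server 1 logged is invisible to a quorum formed by servers 2 and 3.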
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
