Mastering ZooKeeper: Core Concepts, Cluster Roles, and Practical Commands
This comprehensive guide explains ZooKeeper's purpose as a distributed coordination service, its design goals, cluster architecture, ZAB protocol, deployment steps, configuration details, data model, and hands‑on command‑line examples for managing nodes, watches, and four‑letter monitoring commands.
What Is ZooKeeper?
ZooKeeper (often abbreviated ZK) is an open‑source coordination service for distributed applications that addresses common data‑consistency problems. It is used to build features such as publish/subscribe, load balancing, naming services, distributed coordination and notification, cluster management, leader election, distributed locks, and queues. ZooKeeper runs on the Java platform and also offers C bindings.
Design Goals
ZooKeeper allows distributed processes to coordinate via a shared hierarchical namespace, organized similarly to a standard file system. Data is kept in memory, which yields high throughput and low latency. The system emphasizes high performance, high availability, and strictly ordered access, and because it runs as a replicated ensemble it has no single point of failure, which makes it suitable for large‑scale distributed systems.
ZooKeeper Cluster Concepts
Cluster Roles
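Leader : The elected primary node. All write requests are ordered and committed through the leader; it can also serve reads for clients connected to it.
Follower : Voting nodes that take part in leader election and acknowledge the leader's write proposals. They serve read requests locally and forward write requests to the leader.
Observer : Non‑voting nodes. They take no part in elections or write quorums; they receive committed updates from the leader and serve read requests, which adds read capacity without slowing down writes.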
When a leader fails, followers elect a new leader; the old leader, once recovered, steps down to a follower role. In most deployments, observers are optional.
Clients maintain a persistent TCP connection to one server in the ensemble, which may be a follower, an observer, or the leader. Read requests are answered by the connected server from its local copy of the data, while write requests are forwarded to the leader. Committed changes are propagated to all servers in the ensemble.
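The election and write path described above depend on a majority of the voting servers (leader plus followers) being available. The article does not spell out the arithmetic, so the following is a minimal sketch under that assumption; the function names are illustrative and observers are deliberately left out of the count because they do not vote.

```python
def quorum_size(voting_servers: int) -> int:
    """Smallest strict majority of the voting members (leader + followers)."""
    if voting_servers < 1:
        raise ValueError("an ensemble needs at least one voting server")
    return voting_servers // 2 + 1


def tolerated_failures(voting_servers: int) -> int:
    """How many voting members can be down while a majority is still up."""
    return voting_servers - quorum_size(voting_servers)


if __name__ == "__main__":
    # Print ensemble size, quorum size, and tolerated failures side by side.
    for n in (1, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this model a four-server ensemble tolerates no more failures than a three-server one, which is why odd ensemble sizes are the usual choice.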
Data Model
ZooKeeper stores data in a tree‑like structure of ZNodes. Each ZNode is identified by a path such as /app1 and can have child nodes like /app1/p_1-3. Each ZNode maintains a stat structure that records version numbers for its data, ACL, and children, as well as timestamps. The system supports two node types:
Persistent nodes (remain until explicitly deleted).
Ephemeral nodes (automatically removed when the client session ends).
Access control is enforced via ACLs with permissions CREATE, READ, WRITE, DELETE, and ADMIN.
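To make the data model concrete, here is a toy in-memory model of a znode tree. It is a sketch for illustration only, not the ZooKeeper client API: the class name ZNodeTree, its methods, and the integer session ids are all hypothetical. It mirrors the behaviour described above: persistent versus ephemeral nodes, a data version that increases on every update, and removal of ephemeral nodes when their session ends.

```python
class ZNodeTree:
    """Toy in-memory model of a znode tree (not the real ZooKeeper API)."""

    def __init__(self):
        # path -> node record; "owner" holds a session id for ephemeral nodes
        self.nodes = {"/": {"data": b"", "version": 0, "owner": None}}

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b"", session=None, ephemeral=False):
        parent = self._parent(path)
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        if parent not in self.nodes:
            raise KeyError(f"parent does not exist: {parent}")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        owner = session if ephemeral else None
        self.nodes[path] = {"data": data, "version": 0, "owner": owner}

    def set(self, path, data, version=-1):
        """Update data; version=-1 skips the check, otherwise it must match."""
        node = self.nodes[path]
        if version not in (-1, node["version"]):
            raise ValueError("version mismatch")
        node["data"] = data
        node["version"] += 1
        return node["version"]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes
                 if p != path and p.startswith(prefix))
        return sorted(n for n in names if "/" not in n)

    def delete(self, path):
        if self.children(path):
            raise ValueError("node has children")
        del self.nodes[path]

    def close_session(self, session):
        """Drop every ephemeral node owned by the closing session."""
        for path in [p for p, n in self.nodes.items() if n["owner"] == session]:
            del self.nodes[path]
```

A short usage example: after `tree.create("/app1")` and `tree.create("/app1/p_1", session=7, ephemeral=True)`, calling `tree.close_session(7)` leaves `/app1` in place but removes `/app1/p_1`.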
ZAB Protocol
The ZooKeeper Atomic Broadcast (ZAB) protocol provides crash‑fault‑tolerant leader election and keeps data consistent across the ensemble. Each server is in one of three states:
Looking – cluster startup or leader election.
Following – normal operation under an elected leader.
Leading – the node acting as leader.
The protocol proceeds through four phases: election, discovery, sync, and broadcast.
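One detail the article does not cover, but which helps explain how ordering survives a leader change, is the transaction id (zxid). In ZooKeeper's documentation a zxid is a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a counter. The sketch below assumes that layout; the helper names are illustrative and are not part of any ZooKeeper API.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into a zxid."""
    if epoch < 0 or not 0 <= counter <= COUNTER_MASK:
        raise ValueError("epoch or counter out of range")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid: int) -> tuple:
    """Return (epoch, counter) for a packed zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


if __name__ == "__main__":
    last_of_old_epoch = make_zxid(1, COUNTER_MASK)
    first_of_new_epoch = make_zxid(2, 0)
    # Any transaction from a newer epoch sorts after every older one.
    print(first_of_new_epoch > last_of_old_epoch)
    print(split_zxid(first_of_new_epoch))
```

Because the epoch occupies the high bits, plain integer comparison is enough to order transactions across leader changes.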
Deploying ZooKeeper
Download
wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
Extract and Install
tar xf zookeeper-3.4.14.tar.gz -C /application
cp /application/zookeeper-3.4.14/conf/zoo_sample.cfg /application/zookeeper-3.4.14/conf/zoo.cfg
Set Environment Variables
export ZOOKEEPER_HOME=/application/zookeeper-3.4.14
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Configure zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/application/zookeeper-3.4.14/data
dataLogDir=/application/zookeeper-3.4.14/logs
clientPort=2181
maxClientCnxns=60
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=localhost:2888:3888
Create Directories
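The directories to create are the ones named by dataDir and dataLogDir in the zoo.cfg above. As a rough sanity check before starting the server, the following sketch parses a copy of those settings and reports the directories along with what the tick-based limits amount to in milliseconds. It assumes the standard meaning of the settings (initLimit and syncLimit are counted in ticks, and session timeouts default to 2 and 20 ticks unless minSessionTimeout or maxSessionTimeout are set); the function names are illustrative.

```python
# A copy of the relevant zoo.cfg entries from this article.
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/application/zookeeper-3.4.14/data
dataLogDir=/application/zookeeper-3.4.14/logs
clientPort=2181
"""


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def summarize(props):
    """Convert tick-based limits to milliseconds and list the data directories."""
    tick = int(props["tickTime"])
    return {
        "init_timeout_ms": tick * int(props["initLimit"]),
        "sync_timeout_ms": tick * int(props["syncLimit"]),
        "min_session_timeout_ms": int(props.get("minSessionTimeout", 2 * tick)),
        "max_session_timeout_ms": int(props.get("maxSessionTimeout", 20 * tick)),
        "directories": [props[k] for k in ("dataDir", "dataLogDir") if k in props],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_properties(ZOO_CFG)).items():
        print(key, value)
```

With the values shown, followers get 20 seconds to connect and sync with the leader at startup, and 10 seconds before they are considered out of sync during normal operation.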
mkdir /application/zookeeper-3.4.14/{data,logs}
ZooKeeper Server Scripts
The zkServer.sh script manages the server lifecycle:
/application/zookeeper-3.4.14/bin/zkServer.sh start
/application/zookeeper-3.4.14/bin/zkServer.sh stop
/application/zookeeper-3.4.14/bin/zkServer.sh restart
/application/zookeeper-3.4.14/bin/zkServer.sh status
ZooKeeper Client (zkCli.sh)
The zkCli.sh script connects to a ZooKeeper server and provides an interactive shell. Common commands include:
ls /path – list child nodes.
ls2 /path – list child nodes with metadata.
get /path – retrieve node data and metadata.
stat /path – similar to get but without data.
create [-s] [-e] /path data [acl] – create persistent, sequential (-s), or ephemeral (-e) nodes.
set /path newData [version] – update node data.
delete /path [version] – delete a node (must be empty).
rmr /path – recursively delete a node and all its children.
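Sequential nodes created with -s get a suffix appended by the server, which is where names such as /order0000000004 in this article come from. The sketch below assumes the documented format of a 10-digit, zero-padded counter and shows how that suffix can be used to pick the lowest-numbered child, the ordering idea behind the lock and queue use cases mentioned earlier. The helper names and the lock- prefix are illustrative only.

```python
def sequential_name(path_prefix: str, counter: int) -> str:
    """Append a 10-digit zero-padded sequence number to the requested path."""
    return f"{path_prefix}{counter:010d}"


def sequence_of(name: str) -> int:
    """Read the sequence number back from the last 10 characters of a name."""
    return int(name[-10:])


def lock_holder(children):
    """In a lock-style recipe, the child with the lowest sequence goes first."""
    return min(children, key=sequence_of)


if __name__ == "__main__":
    print(sequential_name("/order", 4))
    print(lock_holder(["lock-0000000012", "lock-0000000003", "lock-0000000007"]))
```

The counter itself is assigned by the server, so the exact suffix seen on a given ensemble depends on how many children the parent node has already had.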
Examples:
# Create a persistent node
create /permanent "permanent"
# Create child nodes
create /permanent/zk_node1 "zk_node1"
create /permanent/zk_node2 "zk_node2"
# List nodes
ls /
ls /permanent
# Create a sequential node
create -s /order "order"
# Create an ephemeral node (cannot have children)
create -e /temp "temp"
# Update data
set /permanent "permanent_set"
# Delete nodes
delete /permanent/zk_node1
rmr /order0000000004
Four‑Letter Commands
ZooKeeper supports several four‑letter commands for monitoring and diagnostics. They can be sent via telnet or nc to the client port (default 2181):
ruok – health check (a running server responds with imok).
stat – detailed server status including client connections.
srvr – concise server information.
conf – displays the server configuration.
cons – lists all client connections and session details.
wchs – shows a brief summary of the watches on the server.
envi – prints details of the serving environment, such as Java and operating system properties.
dump – lists outstanding sessions and ephemeral nodes (works only on the leader).
reqs – shows pending requests.
mntr – provides key monitoring metrics (latency, packet counts, node count, etc.).
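The same commands can also be sent from a script instead of telnet or nc. The following is a minimal sketch using only the Python standard library: it opens a TCP connection to the client port, sends the command, reads until the server closes the connection, and turns tab-separated mntr output into a dictionary. The function names are illustrative, the host and port defaults are assumptions taken from this article's configuration, and on newer ZooKeeper releases a command may need to be enabled in the server configuration before it responds.

```python
import socket


def four_letter_word(command, host="localhost", port=2181, timeout=5.0):
    """Send a four-letter command and return the server's full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Parse tab-separated 'key<TAB>value' lines; integers become int."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip lines that are not key/value pairs
        value = value.strip()
        metrics[key] = int(value) if value.lstrip("-").isdigit() else value
    return metrics


if __name__ == "__main__":
    # Requires a ZooKeeper server listening on localhost:2181.
    print(four_letter_word("ruok"))
    print(parse_mntr(four_letter_word("mntr")).get("zk_server_state"))
```

Returning a dictionary makes it straightforward to feed selected values, such as zk_outstanding_requests, into an existing monitoring system.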
Example of a health check:
telnet localhost 2181
ruok
# Response
imok
Example of retrieving metrics with mntr:
nc localhost 2181
mntr
zk_version	3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f
zk_avg_latency	0
zk_max_latency	10
zk_packets_received	1968459
zk_packets_sent	1968460
zk_num_alive_connections	5
zk_outstanding_requests	0
zk_server_state	follower
zk_znode_count	365
zk_watch_count	3
zk_ephemerals_count	4
zk_approximate_data_size	23709
zk_open_file_descriptor_count	37
zk_max_file_descriptor_count	65536
Conclusion
ZooKeeper provides a reliable, high‑performance foundation for distributed coordination, offering a simple hierarchical namespace, robust leader election via the ZAB protocol, and a rich set of command‑line tools for administration and monitoring.
MaGe Linux Operations