Mastering ZooKeeper: Core Concepts and Real-World Big Data Applications
This article explains ZooKeeper’s architecture and key concepts (server roles, sessions, ZNodes, versioning, ACLs, and watchers). It then shows how these primitives support configuration management, naming, master election, and distributed locking, and how big‑data components rely on them, including HDFS NameNode and YARN ResourceManager high availability in Hadoop and master election and RegionServer tracking in HBase.
Overview
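ZooKeeper is an open‑source distributed coordination service originally developed at Yahoo! and often described as an open‑source counterpart to Google’s Chubby lock service. It keeps a hierarchical data model (ZNodes) in memory and applies all writes in a single total order. Writes are linearizable, while reads are served by the connected server and may briefly lag. It also offers a lightweight watch mechanism. Together these enable data publish/subscribe, naming, leader election, distributed locks, and other coordination patterns.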
Basic Concepts
Cluster Roles and Configuration
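Leader – orders and commits all write requests; writes received by any other server are forwarded to it.
Follower – serves client reads, forwards writes to the leader, votes on write proposals, and participates in leader election.
Observer – serves reads and forwards writes like a follower, but votes neither in leader election nor on write proposals, so it adds read capacity without enlarging the quorum. To enable one, add peerType=observer to the observer’s own zoo.cfg and append :observer to its server line in every server’s zoo.cfg (e.g., server.1=localhost:2888:3888:observer).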
Apart from the observer settings, all servers share the same zoo.cfg. The only per‑node difference is the myid file in the data directory, which must contain the numeric identifier used in that server’s server.{id}=... entry.
Run zkServer.sh status on a node (some packaged distributions wrap it as zookeeper-server status) to display its role. The output reports Mode: leader, follower, or observer.
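Because observers do not vote, they add read capacity without changing how many servers must acknowledge a write. The following is a minimal Python sketch of the majority arithmetic, assuming ZooKeeper’s default majority quorum (hierarchical or weighted quorums are not modeled). The function names are hypothetical.

```python
# Hypothetical helpers illustrating majority-quorum arithmetic.
# Only voting members (leader + followers) count; observers are ignored.

def quorum_size(voters: int) -> int:
    """Smallest strict majority of the voting members."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voting members may fail while a majority is still reachable."""
    return (voters - 1) // 2


def describe(voters: int, observers: int) -> str:
    # The observer count is reported but does not affect either number.
    return (f"{voters} voters + {observers} observers: "
            f"quorum={quorum_size(voters)}, "
            f"tolerates {tolerated_failures(voters)} voter failure(s)")


print(describe(3, 0))
print(describe(3, 4))   # more observers: same quorum, same fault tolerance
print(describe(5, 0))
```

This is also why ensembles usually have an odd number of voters: going from 3 to 4 voters raises the quorum from 2 to 3 without tolerating any additional failure.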
Session
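A client opens a long‑lived TCP session with one server of the ensemble (client port 2181 by default) and keeps it alive with heartbeats. If the connection drops, the client can reconnect to any server in the ensemble and keep the same session. The session expires only if no server hears from the client within the negotiated sessionTimeout.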
ZNode
Data is stored in a tree of ZNodes identified by paths such as /hbase/master. ZNodes can be:
Persistent – remain until explicitly deleted.
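Ephemeral – automatically removed when the session that created it ends (expires or is closed); ephemeral nodes cannot have children.
The SEQUENTIAL flag can be combined with either type: ZooKeeper appends a monotonically increasing counter, maintained by the parent node and zero‑padded to 10 digits, to the node name (e.g., /jobs/task-0000000003).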
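These creation rules can be illustrated with a minimal single‑process sketch, assuming a toy in‑memory tree. The class ToyZNodeTree and its methods are hypothetical and are not the ZooKeeper client API. Following ZooKeeper, the sequential suffix is taken from the parent’s child version (cversion), which changes on every child create or delete, so numbers increase but may have gaps.

```python
# Hypothetical in-memory model of ZNode creation semantics (not the real API).

class ToyZNodeTree:
    def __init__(self):
        self.owner = {"/": None}   # path -> owning session for ephemerals, None = persistent
        self.cversion = {"/": 0}   # path -> count of child creations/deletions

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, session=None, ephemeral=False, sequential=False):
        parent = self._parent(path)
        if parent not in self.owner:
            raise KeyError(f"parent {parent} does not exist")
        if self.owner[parent] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            # Suffix = parent's child version, zero-padded to 10 digits (%010d).
            path = f"{path}{self.cversion[parent]:010d}"
        if path in self.owner:
            raise FileExistsError(path)
        self.owner[path] = session if ephemeral else None
        self.cversion[path] = 0
        self.cversion[parent] += 1
        return path

    def delete(self, path):
        if any(self._parent(p) == path for p in self.owner if p != "/"):
            raise ValueError("node has children")
        del self.owner[path]
        del self.cversion[path]
        self.cversion[self._parent(path)] += 1

    def expire_session(self, session):
        # Ephemeral nodes disappear together with the session that created them.
        for path in [p for p, s in self.owner.items() if s == session]:
            self.delete(path)


tree = ToyZNodeTree()
tree.create("/jobs")                                        # persistent
a = tree.create("/jobs/task-", sequential=True)             # /jobs/task-0000000000
tree.create("/jobs/worker", session=7, ephemeral=True)      # bumps /jobs cversion
b = tree.create("/jobs/task-", sequential=True)             # /jobs/task-0000000002
tree.expire_session(7)                                      # /jobs/worker is removed
```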
Versioning
Each ZNode has a Stat structure with three version counters:
version – data version, used for optimistic locking (setData and delete accept an expected version, and -1 matches any version).
cversion – children version.
aversion – ACL version.
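Optimistic locking with the data version amounts to a compare‑and‑set. A client reads the value together with its version, then writes conditionally on that version, and retries if another writer got there first. The following minimal sketch assumes a hypothetical in‑memory node (ToyVersionedNode, BadVersionError, and increment are illustrative names, not the ZooKeeper API). It mirrors the rule that -1 means “any version”.

```python
# Hypothetical model of version-conditional writes (optimistic concurrency).

class BadVersionError(Exception):
    """Stands in for ZooKeeper's BADVERSION error."""


class ToyVersionedNode:
    def __init__(self, data: bytes):
        self.data = data
        self.version = 0            # data version, bumped on every successful write

    def get(self):
        return self.data, self.version

    def set(self, data: bytes, expected_version: int) -> int:
        # -1 makes the write unconditional, as in ZooKeeper's setData/delete.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(f"expected {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


def increment(node: ToyVersionedNode, retries: int = 10) -> int:
    """Read-modify-write loop that retries when the version has moved on."""
    for _ in range(retries):
        data, version = node.get()
        try:
            node.set(str(int(data) + 1).encode(), version)
            return int(node.get()[0])
        except BadVersionError:
            continue
    raise RuntimeError("too much contention")


counter = ToyVersionedNode(b"41")
_, stale = counter.get()            # two clients both read version 0
counter.set(b"100", stale)          # first writer wins, version becomes 1
try:
    counter.set(b"42", stale)       # second writer still holds version 0
    lost_update = True
except BadVersionError:
    lost_update = False             # rejected instead of silently overwriting
result = increment(counter)         # re-reads version 1 and succeeds
```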
Transaction IDs (ZXID)
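Every state‑changing operation is assigned a globally unique 64‑bit transaction ID (ZXID) that defines a total order of all updates. The high 32 bits hold the leader epoch and the low 32 bits hold a counter within that epoch.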
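Splitting a ZXID into its two halves shows why the ordering survives leader changes. A new leader increments the epoch, so every ZXID it issues compares greater than any ZXID from an earlier epoch, even after the counter is reset. This is a small sketch; the helper names are hypothetical.

```python
# Hypothetical helpers for decomposing a 64-bit ZXID.

def split_zxid(zxid: int) -> tuple[int, int]:
    """Return (epoch, counter): high 32 bits = epoch, low 32 bits = counter."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def make_zxid(epoch: int, counter: int) -> int:
    return (epoch << 32) | counter


last_old = make_zxid(5, 0x2A)    # a late update from epoch 5
first_new = make_zxid(6, 1)      # an early update from epoch 6, counter reset
print(hex(last_old), hex(first_new), split_zxid(first_new))
```

ZooKeeper tools and logs usually display ZXIDs in hexadecimal, which makes the epoch easy to read off the leading digits.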
Watcher
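Clients register a watcher as a side effect of a read (getData, exists, or getChildren). When the watched data or child list changes, ZooKeeper sends a one‑time notification. To see the new state and hear about later changes, the client must read again and re‑register the watch. This enables asynchronous coordination.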
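The one‑shot behavior is easy to overlook, so here is a minimal sketch of it, assuming a hypothetical in‑memory store (ToyWatchedStore is not a real client class). A watch armed by one read fires on the next change and then is gone. As in ZooKeeper, the notification carries only the path, not the new data.

```python
from collections import defaultdict


class ToyWatchedStore:
    """Hypothetical model of one-shot data watches (not the client API)."""

    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)    # path -> callbacks armed for the next change

    def get_data(self, path, watch=None):
        # The watch is registered as a side effect of the read.
        if watch is not None:
            self.watches[path].append(watch)
        return self.data.get(path)

    def set_data(self, path, value):
        self.data[path] = value
        # Fire each armed watch exactly once, then forget it.
        fired, self.watches[path] = self.watches[path], []
        for callback in fired:
            callback(path)                  # only the path is delivered


events = []
store = ToyWatchedStore()
store.data["/cfg"] = "v1"
store.get_data("/cfg", watch=events.append)
store.set_data("/cfg", "v2")    # notification delivered
store.set_data("/cfg", "v3")    # no watch armed any more, nothing delivered
print(events)
```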
Access Control List (ACL)
ZooKeeper defines five permissions: CREATE, READ, WRITE, DELETE, and ADMIN. CREATE and DELETE govern creating and deleting a node’s children, not the node itself. ADMIN allows setting the node’s ACL.
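Permissions are combined as bit flags. The sketch below mirrors the bit values of ZooKeeper’s ZooDefs.Perms constants (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16, ALL=31). The Perms enum and the allowed helper are a hypothetical Python stand‑in, not a client library API.

```python
from enum import IntFlag


class Perms(IntFlag):
    # Bit values mirror ZooKeeper's ZooDefs.Perms constants.
    READ = 1
    WRITE = 2
    CREATE = 4      # checked on the parent when a child is created
    DELETE = 8      # checked on the parent when a child is deleted
    ADMIN = 16      # may change the node's ACL
    ALL = READ | WRITE | CREATE | DELETE | ADMIN


def allowed(granted: Perms, needed: Perms) -> bool:
    """True if every bit in `needed` is present in `granted`."""
    return (needed & granted) == needed


granted = Perms.READ | Perms.CREATE
print(allowed(granted, Perms.CREATE))   # may add children under this node
print(allowed(granted, Perms.WRITE))    # may not overwrite this node's data
print(int(Perms.ALL))
```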
Typical Use Cases
Configuration Center (Publish/Subscribe)
Small, frequently changing configuration data is stored in ZNodes. Clients read the configuration node and register a watcher on it. When the data changes, ZooKeeper pushes a notification; the client then pulls the latest value and re‑registers the watcher.
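The key detail in this pattern is that the client re‑arms its watch in the same read that fetches the new value. It therefore always converges on the latest configuration, although intermediate versions written in quick succession may be skipped. The following sketch uses hypothetical classes (ToyConfigStore, ConfigClient) and made‑up configuration values. It is not a real client library.

```python
class ToyConfigStore:
    """Hypothetical single-node store with one-shot data watches."""

    def __init__(self, initial):
        self.value, self.version, self.watchers = initial, 0, []

    def get_data(self, watch):
        self.watchers.append(watch)        # reading arms a one-shot watch
        return self.value, self.version

    def set_data(self, value):
        self.value, self.version = value, self.version + 1
        fired, self.watchers = self.watchers, []
        for notify in fired:
            notify()                       # the notification itself carries no data


class ConfigClient:
    def __init__(self, store):
        self.store = store
        self.current = None
        self.reloads = 0
        self.reload()

    def reload(self):
        # Pull the latest value and re-arm the watch in the same call.
        self.current, _ = self.store.get_data(watch=self.reload)
        self.reloads += 1


store = ToyConfigStore({"timeout_ms": 3000})
client = ConfigClient(store)
store.set_data({"timeout_ms": 5000})
store.set_data({"timeout_ms": 8000})
print(client.current)
```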
Naming Service
Creating a sequential ZNode yields a globally unique name that can be used as a service identifier or RPC endpoint.
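The numeric suffix of a sequential node doubles as the identifier. getChildren returns child names in no guaranteed order, so clients that need ordering usually sort by this suffix. The helper below is a hypothetical sketch that assumes the standard 10‑digit suffix; the node names are illustrative.

```python
import re

_SUFFIX = re.compile(r"(\d{10})$")     # ZooKeeper's zero-padded %010d suffix


def sequence_of(name: str) -> int:
    """Recover the numeric ID from a sequential node name."""
    match = _SUFFIX.search(name)
    if match is None:
        raise ValueError(f"not a sequential node name: {name}")
    return int(match.group(1))


def by_sequence(children):
    """Order child names by sequence number, ignoring any prefix."""
    return sorted(children, key=sequence_of)


children = ["worker-0000000012", "worker-0000000003", "worker-0000000007"]
print(by_sequence(children))
print(sequence_of("worker-0000000012"))
```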
Distributed Coordination / Notification
Multiple processes register watchers on the same ZNode. Any change triggers a notification to all watchers, allowing real‑time coordination.
Master Election
Clients compete to create the same ephemeral ZNode (e.g., /master_election). The client whose create succeeds becomes the master. The others set a watch on the node and compete again when it is deleted, for example because the master’s session expired.
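The election loop can be sketched in a single process as follows. The model assumes hypothetical names (ToyElection, try_become_master, session_expired). Holding the ephemeral node is modeled as an owner field, and a failed create registers the candidate as a watcher.

```python
class ToyElection:
    """Hypothetical model of ephemeral-node master election."""

    def __init__(self, path="/master_election"):
        self.path = path
        self.owner = None          # session holding the ephemeral node, if any
        self.watchers = []         # candidates waiting for the node to vanish
        self.log = []

    def try_become_master(self, session):
        if self.owner is None:     # create() succeeds only while the node is absent
            self.owner = session
            self.log.append(f"{session} is master")
            return True
        self.watchers.append(session)   # failed create: watch the node instead
        return False

    def session_expired(self, session):
        if self.owner != session:
            if session in self.watchers:
                self.watchers.remove(session)
            return
        self.owner = None          # the ephemeral node dies with the session
        waiting, self.watchers = self.watchers, []
        for candidate in waiting:  # every watcher is notified and retries
            self.try_become_master(candidate)


election = ToyElection()
for candidate in ["node-a", "node-b", "node-c"]:
    election.try_become_master(candidate)
election.session_expired("node-a")
print(election.owner, election.log)
```

Waking every standby at once is the “herd effect”. ZooKeeper’s documented recipes avoid it with ephemeral sequential nodes, where each candidate watches only its immediate predecessor.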
Distributed Lock
Exclusive (Write) Lock – define a lock path, e.g., /exclusive_lock. Each client attempts to create the ephemeral child node /exclusive_lock/lock; because a path can exist only once, only one client succeeds and acquires the lock. All other clients set a watch on /exclusive_lock/lock and retry when the node is deleted, which happens when the holder releases the lock or its session expires.
Shared (Read) Lock – clients create distinct ephemeral nodes under /shared_lock/. The lock is held in shared mode as long as at least one such node exists, and a client may acquire the exclusive lock only after all shared‑lock nodes have been removed.
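Both schemes are combined in the single‑process sketch below, under stated assumptions. The class ToyLocks and its methods are hypothetical, and each field stands in for ephemeral nodes owned by a session. On a real ensemble, checking /shared_lock and then creating /exclusive_lock/lock is not atomic. For that reason, ZooKeeper’s documented shared‑lock recipe instead puts read and write requests as sequential nodes under one parent and decides by sequence order.

```python
class ToyLocks:
    """Hypothetical model of the exclusive/shared schemes described above."""

    def __init__(self):
        self.exclusive = None      # owner of /exclusive_lock/lock, if it exists
        self.shared = set()        # owners of nodes under /shared_lock/
        self.waiters = []          # sessions watching for a release

    def lock_exclusive(self, session):
        # create("/exclusive_lock/lock") fails if the node exists; per the
        # rule above it is also refused while any shared node remains.
        if self.exclusive is None and not self.shared:
            self.exclusive = session
            return True
        self.waiters.append(session)
        return False

    def lock_shared(self, session):
        # Readers may share the lock, but not while a writer holds it.
        if self.exclusive is None:
            self.shared.add(session)
            return True
        self.waiters.append(session)
        return False

    def release(self, session):
        # Explicit delete or session expiry removes the ephemeral node.
        if self.exclusive == session:
            self.exclusive = None
        self.shared.discard(session)
        notified, self.waiters = self.waiters, []
        return notified            # these clients would now retry


locks = ToyLocks()
locks.lock_shared("reader-1")
locks.lock_shared("reader-2")
blocked = locks.lock_exclusive("writer-1")   # shared nodes still exist
woken = locks.release("reader-1")
locks.release("reader-2")
granted = locks.lock_exclusive("writer-1")   # all shared nodes are gone
```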
ZooKeeper in Large‑Scale Systems
Hadoop
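ZooKeeper provides high availability for the HDFS NameNode and the YARN ResourceManager. Both rely on Hadoop’s ActiveStandbyElector; for HDFS it runs inside the separate ZKFailoverController process. The candidate that creates the ephemeral lock node, e.g., /yarn-leader-election/appcluster-yarn/ActiveStandbyElectorLock, becomes Active. The others remain Standby and watch that node to detect failover. A persistent sibling node, ActiveBreadCrumb, records the current Active so that a newly elected Active can fence the previous one.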
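The lock‑plus‑breadcrumb idea can be shown with a simplified toy model. The class ToyElector and its methods are hypothetical; Hadoop’s real implementation is org.apache.hadoop.ha.ActiveStandbyElector, and actual fencing is done by the configured fencing method. The ephemeral lock vanishes with a lost session, but the persistent breadcrumb survives. A new Active that finds someone else’s breadcrumb fences that node before taking over. A graceful exit removes the breadcrumb, leaving nothing to fence.

```python
class ToyElector:
    """Hypothetical model of the ephemeral lock + persistent breadcrumb pattern."""

    def __init__(self):
        self.lock_owner = None     # ephemeral ActiveStandbyElectorLock
        self.breadcrumb = None     # persistent ActiveBreadCrumb
        self.fenced = []

    def try_activate(self, node):
        if self.lock_owner is not None:
            return "standby"
        self.lock_owner = node
        # A breadcrumb naming another node means the previous Active may
        # still be running (it lost its session, not necessarily its process).
        if self.breadcrumb not in (None, node):
            self.fenced.append(self.breadcrumb)
        self.breadcrumb = node
        return "active"

    def lose_session(self, node):
        # The ephemeral lock disappears; the persistent breadcrumb stays behind.
        if self.lock_owner == node:
            self.lock_owner = None

    def graceful_exit(self, node):
        self.lose_session(node)
        if self.breadcrumb == node:
            self.breadcrumb = None


elector = ToyElector()
first = elector.try_activate("rm1")
second = elector.try_activate("rm2")
elector.lose_session("rm1")              # e.g., a long GC pause expires rm1's session
takeover = elector.try_activate("rm2")   # rm2 fences rm1 before going active
```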
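The ResourceManager’s state store (RMStateStore) can also be persisted in ZooKeeper through ZKRMStateStore, by default under /rmstore. It uses sub‑nodes such as /rmstore/ZKRMStateRoot/RMAppRoot and /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot.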
HBase
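Master Election & HA – the same ephemeral‑node election pattern as in Hadoop. The active HMaster holds /hbase/master, while standby masters register under /hbase/backup-masters and watch /hbase/master.
RegionServer Fault Detection – each RegionServer creates an ephemeral node under /hbase/rs, named after its server name (host, port, and start code). HMaster watches the children of this path; the disappearance of a child indicates that the RegionServer’s session was lost.
Meta Region Location – the location of the hbase:meta region is stored in /hbase/meta-region-server; older releases that still had a separate -ROOT- region used /hbase/root-region-server. Changes are detected via watchers, so clients always know where the catalog region currently lives.
Region State Management – region transitions (offline/online) are coordinated through ZNodes, so their state is visible to the whole cluster. This applies to HBase 1.x; HBase 2.x tracks assignment in the master’s procedure store instead.
Distributed SplitWAL – HMaster creates one task node per WAL file under the persistent node /hbase/splitWAL. A RegionServer claims a task by updating its node with a version‑conditional write, so each task is won by exactly one RegionServer and logs are recovered in parallel.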
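Two of these mechanisms reduce to simple primitives: diffing child lists to detect failed RegionServers, and claiming a split task with a compare‑and‑set on the node version. The following is a minimal single‑process sketch, assuming hypothetical classes (ToyRegionServerTracker, ToySplitTask) and made‑up server names. The state names are loosely modeled on HBase’s split‑task states, and none of this is HBase code.

```python
class BadVersion(Exception):
    """Stands in for ZooKeeper's BADVERSION error."""


class ToyRegionServerTracker:
    """Hypothetical helper that diffs successive child lists of /hbase/rs."""

    def __init__(self, children):
        self.known = set(children)

    def on_children_changed(self, children):
        # The child watch is one-shot, so each notification is followed by a
        # fresh getChildren(); comparing snapshots reveals who left or joined.
        current = set(children)
        failed, joined = sorted(self.known - current), sorted(current - self.known)
        self.known = current
        return failed, joined


class ToySplitTask:
    """Hypothetical WAL-split task node: state, owner, and data version."""

    def __init__(self, wal):
        self.wal, self.state, self.owner, self.version = wal, "UNASSIGNED", None, 0

    def set_data(self, state, owner, expected_version):
        if expected_version != self.version:
            raise BadVersion(self.wal)
        self.state, self.owner, self.version = state, owner, self.version + 1


def claim(task, worker, observed_version):
    """Conditional write: succeeds only if nobody updated the task since it was read."""
    try:
        task.set_data("OWNED", worker, observed_version)
        return True
    except BadVersion:
        return False


tracker = ToyRegionServerTracker(["rs-a,16020,1", "rs-b,16020,1"])
failed, joined = tracker.on_children_changed(["rs-b,16020,1"])   # rs-a vanished

task = ToySplitTask("wal-of-rs-a")
seen_by_b = seen_by_c = task.version      # two workers both read version 0
won_b = claim(task, "rs-b", seen_by_b)    # first conditional write wins
won_c = claim(task, "rs-c", seen_by_c)    # version is now 1, so this fails
```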
Summary
ZooKeeper’s strong consistency, hierarchical ZNode model, and one‑time watch notifications make it a versatile backbone for coordination, configuration management, naming, leader election, and distributed locking in large‑scale data platforms such as Hadoop and HBase.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
