Mastering ZooKeeper: Core Concepts and Real-World Big Data Applications
This article introduces ZooKeeper’s fundamental architecture, explains its key concepts such as cluster roles, sessions, ZNodes, watches, and ACLs, and then details how it powers essential distributed coordination tasks—including configuration management, naming services, master election, and distributed locks—in large‑scale Hadoop and HBase ecosystems.
1. Introduction
ZooKeeper is an open‑source distributed coordination service originally developed at Yahoo, modeled on Google's Chubby lock service. Applications can use ZooKeeper for data publish/subscribe, load balancing, naming, coordination/notification, cluster management, master election, distributed locks and queues.
2. Basic Concepts
This section introduces the core concepts of ZooKeeper that are referenced throughout the article.
2.1 Cluster Roles
ZooKeeper defines three server roles: Leader, Follower, and Observer.
At any moment a ZooKeeper ensemble has a single Leader; the others are Followers or Observers.
All nodes share the same `zoo.cfg` file; only the `myid` file differs. The `myid` value must match the `server.{id}` entry in `zoo.cfg`.
Example zoo.cfg:
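The original sample configuration did not survive formatting; a minimal three‑node example looks like the following (host names and the data directory are illustrative — `node‑20‑103` and `node‑20‑104` appear later in the article, `node‑20‑102` is assumed):

```properties
# Basic time unit in milliseconds
tickTime=2000
# Ticks allowed for followers to connect and sync with the leader
initLimit=10
# Ticks allowed between a request and an acknowledgement
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# server.{id}=host:peerPort:electionPort — {id} must match the myid file on that host
server.1=node-20-102:2888:3888
server.2=node-20-103:2888:3888
server.3=node-20-104:2888:3888
```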
Running `zookeeper-server status` on a machine shows its role (Leader or Follower). In the example, node‑20‑104 is the Leader and node‑20‑103 is a Follower.
ZooKeeper by default supports only the Leader and Follower roles. To enable Observer mode, add `peerType=observer` to the configuration of each node that should run as an observer, and append `:observer` to that node's server line, e.g. `server.1=localhost:2888:3888:observer`. The Leader serves both read and write requests; Followers and Observers serve reads directly and forward write requests to the Leader. Observers do not participate in leader election or the majority‑write quorum, so they improve read throughput without affecting write performance.
2.2 Session
A session represents a client’s long‑lived TCP connection to ZooKeeper. The default client port is 2181. When a client connects, a session starts; heartbeats keep it alive, and the client can send requests and receive watch events.
The SessionTimeout defines how long a session may be inactive before ZooKeeper considers it expired. If the client reconnects to any server within this timeout, the session remains valid.
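The timeout a client requests is negotiated against server‑side bounds derived from `tickTime`; by default ZooKeeper clamps it to the range [2 × tickTime, 20 × tickTime]. The relevant `zoo.cfg` knobs (values below are illustrative, not defaults you must set):

```properties
tickTime=2000
# Defaults if unset: minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime
minSessionTimeout=4000
maxSessionTimeout=40000
```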
2.3 ZNode
In ZooKeeper, a ZNode is a data node in a hierarchical tree (paths separated by “/”, e.g., /hbase/master). Each ZNode stores data and metadata. ZNodes can be persistent or ephemeral.
Note: A ZNode is analogous to both a file and a directory in a Unix file system.
Persistent ZNodes remain until explicitly deleted. Ephemeral ZNodes are tied to the client session and disappear when the session ends.
ZooKeeper also supports the SEQUENTIAL flag, which appends an auto‑incremented integer to the node name at creation time.
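Concretely, the appended counter is a 10‑digit, zero‑padded number maintained by the parent node, so repeatedly creating `/lock/node-` yields `/lock/node-0000000000`, `/lock/node-0000000001`, and so on. A minimal sketch of that naming scheme (the helper function is illustrative, not a ZooKeeper API):

```python
def sequential_name(prefix: str, counter: int) -> str:
    """Mimic ZooKeeper's sequential-node naming: a 10-digit zero-padded suffix."""
    return f"{prefix}{counter:010d}"

# The parent node keeps a monotonically increasing counter per child creation.
names = [sequential_name("/lock/node-", i) for i in range(3)]
```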
2.4 Version
Each ZNode has a Stat structure that records three version numbers:
- `version`: the data version
- `cversion`: the children version
- `aversion`: the ACL version
2.5 Stat Information
Using the `get` command returns both the data and the Stat of a ZNode. The `version` field is used for optimistic locking to ensure atomic updates.
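The version‑based update works like compare‑and‑set: the client passes the version it last read, and the server rejects the write if the node has changed in the meantime (passing -1 skips the check, matching ZooKeeper's convention). A toy in‑memory sketch of that check — not the real server code; `BadVersionError` stands in for ZooKeeper's `BadVersionException`:

```python
class BadVersionError(Exception):
    pass

class ZNodeStore:
    """Toy store illustrating ZooKeeper's optimistic concurrency on set_data."""
    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        self._nodes[path] = (data, 0)

    def get(self, path):
        return self._nodes[path]  # (data, version)

    def set_data(self, path, data, expected_version):
        _, version = self._nodes[path]
        # -1 means "unconditional update"; otherwise the versions must match.
        if expected_version != -1 and expected_version != version:
            raise BadVersionError(f"expected {expected_version}, node is at {version}")
        self._nodes[path] = (data, version + 1)
        return version + 1
```

Two clients that both read version 0 cannot both succeed: the second `set_data` with the stale version raises, and that client must re-read before retrying.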
2.6 Transaction Operations
Operations that modify ZooKeeper state (node create/delete, data update, session create/expire) are transactions. Each transaction receives a globally unique 64‑bit ZXID, which defines a total order of updates.
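The ZXID is commonly described as two 32‑bit halves: the high half is the leader epoch (bumped at each election) and the low half a counter within that epoch. Decoding it:

```python
def decode_zxid(zxid: int):
    """Split a 64-bit ZXID into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits: leader epoch
    counter = zxid & 0xFFFFFFFF   # low 32 bits: per-epoch transaction counter
    return epoch, counter
```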
2.7 Watcher
Watchers allow clients to register interest in specific ZNode events. When the event occurs, ZooKeeper notifies the interested clients, enabling distributed coordination.
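An important detail: watches are one‑shot. After a watch fires, the client must re‑register it to receive further events. A toy dispatcher illustrating that semantic (not the real client library):

```python
from collections import defaultdict

class WatchManager:
    """Toy one-shot watch registry mimicking ZooKeeper's semantics."""
    def __init__(self):
        self._watches = defaultdict(list)  # path -> list of callbacks

    def watch(self, path, callback):
        self._watches[path].append(callback)

    def trigger(self, path, event):
        # Watches fire once and are removed; clients re-register if needed.
        for cb in self._watches.pop(path, []):
            cb(path, event)
```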
2.8 ACL
ZooKeeper uses Access Control Lists (ACLs) with five permissions: CREATE, READ, WRITE, DELETE, and ADMIN. Note that CREATE and DELETE govern a node's children, not the node itself.
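In the Java client these permissions are bit flags (`org.apache.zookeeper.ZooDefs.Perms`); the sketch below reproduces the flag values to show how composite masks such as ALL are formed:

```python
# Permission bit flags, matching org.apache.zookeeper.ZooDefs.Perms
READ   = 1 << 0  # read data / list children
WRITE  = 1 << 1  # set data
CREATE = 1 << 2  # create child nodes
DELETE = 1 << 3  # delete child nodes
ADMIN  = 1 << 4  # set ACLs
ALL = READ | WRITE | CREATE | DELETE | ADMIN
```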
3. Typical ZooKeeper Use Cases
ZooKeeper’s high availability and strong consistency make it a powerful tool for solving distributed coordination problems.
3.1 Data Publish/Subscribe (Configuration Center)
Publishers write configuration data to ZooKeeper nodes; subscribers watch those nodes to receive updates. Configuration data is typically small, changes at runtime, and is shared across the cluster.
Two design patterns exist: push (server pushes updates) and pull (client polls). ZooKeeper combines both: clients register watches, and when data changes the server notifies the client, which then pulls the latest data.
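The combined flow can be sketched as: the server pushes only a "changed" notification, and the client then pulls the current data and re‑registers its watch. A toy in‑process model of that notify‑then‑pull pattern (all names are illustrative):

```python
class ConfigCenter:
    """Toy model of ZooKeeper's notify-then-pull configuration pattern."""
    def __init__(self):
        self._data = {}
        self._watchers = {}  # path -> one-shot callback

    def publish(self, path, value):
        self._data[path] = value
        watcher = self._watchers.pop(path, None)
        if watcher:
            watcher(path)  # push: only the event, never the data itself

    def get_and_watch(self, path, callback):
        self._watchers[path] = callback
        return self._data.get(path, "")  # pull: client fetches current data

def make_subscriber(center, seen):
    """Subscriber that re-reads and re-registers on every notification."""
    def on_change(path):
        seen.append(center.get_and_watch(path, on_change))
    return on_change
```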
3.2 Naming Service
Clients can resolve a logical name to a service address by reading a ZNode. Sequential nodes enable generation of globally unique identifiers, useful for RPC frameworks.
3.3 Distributed Coordination / Notification
Clients register watches on a ZNode; when the node changes, all interested clients receive notifications, allowing real‑time handling of data changes.
3.3.1 Heartbeat Detection
Processes create ephemeral child nodes under a designated ZNode. The existence of a child node indicates the process is alive; its disappearance signals failure, enabling decoupled heartbeat monitoring.
3.3.2 Progress Reporting
Tasks create ephemeral nodes to report progress. Other components can read these nodes to monitor execution status.
3.4 Master Election
ZooKeeper is widely used for master election in systems such as HDFS NameNode, YARN ResourceManager, and HBase HMaster. Clients attempt to create the same ephemeral node; only one succeeds and becomes the master. The others watch the node and re-run the election when it disappears.
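The race to create the same node can be modeled with create‑if‑absent semantics: the one client whose create succeeds is master, and when its session ends the ephemeral node vanishes and the watchers retry. A toy in‑process model (assumed names, not the ZooKeeper API):

```python
class ElectionNode:
    """Toy model of master election via a single ephemeral znode."""
    def __init__(self):
        self._owner = None  # session currently holding the master node

    def try_become_master(self, session_id):
        # Like create("/master", ephemeral=True): succeeds only if absent.
        if self._owner is None:
            self._owner = session_id
            return True
        return False  # NodeExists -> register a watch and wait

    def session_expired(self, session_id):
        # The ephemeral node is deleted; watchers would now retry the create.
        if self._owner == session_id:
            self._owner = None
```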
3.5 Distributed Locks
Locks are represented by ZNodes. To acquire an exclusive lock, a client creates an ephemeral node under a lock path; only one client can succeed. Others watch the lock node and retry when it is released. Shared locks allow multiple readers but block exclusive locks.
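The usual fair-lock recipe refines this: each client creates an ephemeral sequential node under the lock path, holds the lock iff it owns the lowest sequence number, and otherwise watches only its immediate predecessor (avoiding a thundering herd on release). The queue logic in miniature (an in-process sketch, not a real client):

```python
class LockQueue:
    """Toy model of the ephemeral-sequential distributed lock recipe."""
    def __init__(self):
        self._counter = 0
        self._waiters = {}  # sequence number -> client id

    def enqueue(self, client):
        # Like create("/locks/lock-", ephemeral=True, sequence=True).
        seq = self._counter
        self._counter += 1
        self._waiters[seq] = client
        return seq

    def holder(self):
        # The client with the lowest sequence number holds the lock.
        return self._waiters[min(self._waiters)] if self._waiters else None

    def release(self, seq):
        # Deleting the node wakes the watcher on the next-lowest sequence.
        self._waiters.pop(seq, None)
```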
4. ZooKeeper in Large‑Scale Distributed Systems
4.1 ZooKeeper in Hadoop
ZooKeeper provides HA for the HDFS NameNode and YARN ResourceManager, and stores YARN application state. Leader election uses a lock node such as `/yarn-leader-election/appcluster-yarn/ActiveBreadCrumb`. Standby ResourceManagers register watchers on this node; when the active ResourceManager fails, the lock node disappears and a new leader is elected.
ResourceManager state (the RMStateStore) can be persisted in memory, on a filesystem, or in ZooKeeper; because the state is small, ZooKeeper is the recommended store.
4.2 ZooKeeper in HBase
HBase relies on ZooKeeper for HMaster election, system fault tolerance, RootRegion management, Region state management, and distributed SplitWAL task coordination. RegionServers create ephemeral nodes under `/hbase/rs`; the HMaster watches these nodes to detect RegionServer failures and trigger recovery.
The RootRegion location is stored in `/hbase/meta-region-server`. Changes to the RootRegion or to Region assignments are propagated via ZooKeeper watches.
SplitWAL tasks are coordinated through a ZNode (e.g., `/hbase/SplitWAL`) that records which RegionServer processes which WAL segment.
References
“From Paxos to ZooKeeper”.