Mastering ZooKeeper: Core Concepts, Architecture, and Installation Guide
This article introduces ZooKeeper’s fundamental concepts, design goals, architecture, key features, common use cases such as service registry and configuration management, and provides step‑by‑step instructions for installing and configuring ZooKeeper in standalone mode on CentOS 7.
ZooKeeper Basic Concepts
ZooKeeper is an open‑source coordination service for large distributed systems, originally a sub‑project of Hadoop. It provides reliable services such as configuration maintenance, naming, distributed synchronization, and group management, exposing simple APIs while ensuring high performance and stability.
Typical use case: a service registry, where service providers register themselves in ZooKeeper and consumers discover available services by querying the registry.
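The registry idea can be sketched without a running ZooKeeper server. The following is a minimal in-memory model, not ZooKeeper's API: the class name ServiceRegistry, its methods, and the addresses are hypothetical, and dropping a provider when its session closes is an assumption meant to mirror how session-bound registrations are typically cleaned up.

```python
class ServiceRegistry:
    """Toy registry: providers register an address, consumers look it up."""

    def __init__(self):
        # service name -> {session id: address}
        self._instances = {}

    def register(self, service, session_id, address):
        """A provider announces one instance of a service."""
        self._instances.setdefault(service, {})[session_id] = address

    def session_closed(self, session_id):
        """Drop every registration owned by a session that went away."""
        for instances in self._instances.values():
            instances.pop(session_id, None)

    def discover(self, service):
        """A consumer asks for all known addresses of a service."""
        return sorted(self._instances.get(service, {}).values())


registry = ServiceRegistry()
registry.register("orders", session_id=1, address="10.0.0.1:8080")
registry.register("orders", session_id=2, address="10.0.0.2:8080")
print(registry.discover("orders"))

registry.session_closed(1)  # the first provider disappears
print(registry.discover("orders"))
```

In a real deployment the registry state would live in znodes rather than in a Python dictionary; the sketch only shows the register and discover roles.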
Design Goals
ZooKeeper offers a hierarchical namespace similar to a file system, where each node (znode) can store data. Unlike traditional file systems, data is kept in memory to achieve high throughput and low latency.
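To make the file-system analogy concrete, here is a small sketch of a hierarchical namespace held entirely in memory. It is an illustration under simplifying assumptions, not ZooKeeper's implementation: ZNode, ZNodeTree, and the method names are hypothetical, and real znodes also carry metadata such as versions that is left out here.

```python
class ZNode:
    """One node in the namespace: it stores data and can have children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}


class ZNodeTree:
    """Toy in-memory tree addressed by slash-separated paths."""

    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        node = self.root
        for part in [p for p in path.split("/") if p]:
            if part not in node.children:
                raise KeyError(f"no node: {path}")
            node = node.children[part]
        return node

    def create(self, path, data=b""):
        parent_path, _, name = path.rstrip("/").rpartition("/")
        if not name:
            raise ValueError("cannot create the root node")
        parent = self._walk(parent_path)  # the parent must already exist
        if name in parent.children:
            raise ValueError(f"node exists: {path}")
        parent.children[name] = ZNode(data)

    def get(self, path):
        return self._walk(path).data

    def get_children(self, path):
        return sorted(self._walk(path).children)


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")
print(tree.get("/app/config"))
print(tree.get_children("/app"))
```

Unlike a directory in a typical file system, the node /app could itself hold data while also having the child /app/config.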
Main Features
1) Strong consistency: every client sees the same view of the service, whichever server it is connected to.
2) Reliability: once one server accepts an update, all servers accept it.
3) Timeliness: a client's view is kept up to date, but reading the very latest data may require a sync() call first.
4) Wait‑free: slow or failed clients do not block fast ones.
5) Atomicity: an update either succeeds completely or fails; there are no partial results.
6) Order: updates are delivered in the same order on all servers.
ZooKeeper Architecture
ZooKeeper consists of a set of servers (an ensemble) and clients. Each client maintains a TCP connection to one server in the ensemble, through which it sends requests, receives responses, and receives watch events. If the connection fails, the client reconnects to another server.
Each server stores an in‑memory data tree and a persistent transaction log and snapshot on disk. A leader is elected to handle write operations; writes are considered successful when a majority of servers acknowledge them.
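The majority rule translates into simple arithmetic. The sketch below, with hypothetical function names, computes how many acknowledgements a write needs and how many server failures an ensemble of a given size can survive.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

An ensemble of four tolerates no more failures than an ensemble of three, which is why odd ensemble sizes are the usual choice.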
Roles
A server acts as a leader, a follower, or an observer. The leader processes all write requests and broadcasts updates to the followers, which vote on them. Observers accept client connections and forward writes to the leader but do not participate in voting, so adding observers increases the number of clients the cluster can serve without slowing down writes.
Write Data Flow
A write proceeds in four steps. The client sends the request to the server it is connected to. If that server is not the leader, it forwards the request to the leader. The leader broadcasts the update to the other servers and waits until a majority has acknowledged it. Finally, success is reported back to the client.
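The write path can be simulated in a few lines. This is a simplified sketch, not the Zab protocol: the Server class, the write function, and the reachable flag are hypothetical, and details such as transaction IDs, logging to disk, and leader election are omitted.

```python
class Server:
    """A toy ensemble member holding its own copy of the data."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.data = {}


def write(ensemble, leader, contact, path, value):
    """Simulate one write; return (committed, route taken to the leader)."""
    # The contacted server handles the request itself only if it is the leader.
    route = [contact.name] if contact is leader else [contact.name, leader.name]

    # The leader proposes the update; reachable servers (leader included) ack.
    acks = [s for s in ensemble if s.reachable]

    # Commit only when a strict majority of the ensemble has acknowledged.
    if len(acks) <= len(ensemble) // 2:
        return False, route
    for s in acks:
        s.data[path] = value
    return True, route


a, b, c = Server("a"), Server("b"), Server("c", reachable=False)
committed, route = write([a, b, c], leader=a, contact=b, path="/cfg", value=b"v1")
print(committed, route)
```

With one of three servers unreachable the write still commits, because two acknowledgements form a majority; with two unreachable it would be rejected.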
Core Components
Key components include the replicated in‑memory database (the data tree), request processors, and the Zab (ZooKeeper Atomic Broadcast) protocol that ensures consistency across the cluster.
Typical Application Scenarios
Unified naming service for distributed applications.
Configuration management with automatic propagation of changes.
Cluster management and real‑time node status monitoring.
Distributed notification and coordination, acting as a publish/subscribe system.
Distributed locks ensuring exclusive access to shared resources.
Distributed queues supporting both synchronous and FIFO processing models.
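As an illustration of the distributed lock scenario, the sketch below models a commonly used approach in which each contender creates a sequentially numbered node and the lowest number holds the lock. The article does not spell out this recipe, so treat it as an assumption; LockDirectory and its methods are hypothetical names, and the model runs in a single process rather than against a ZooKeeper ensemble.

```python
class LockDirectory:
    """Toy lock: contenders queue up under sequentially numbered names."""

    def __init__(self):
        self._next_seq = 0
        self._waiters = {}  # node name -> client id

    def enqueue(self, client_id):
        """Create a numbered node for this client and return its name."""
        name = f"lock-{self._next_seq:010d}"
        self._next_seq += 1
        self._waiters[name] = client_id
        return name

    def holder(self):
        """The client owning the lowest-numbered node holds the lock."""
        if not self._waiters:
            return None
        return self._waiters[min(self._waiters)]

    def release(self, name):
        """Delete a node, either on unlock or when its client goes away."""
        self._waiters.pop(name, None)


locks = LockDirectory()
node_a = locks.enqueue("client-a")
node_b = locks.enqueue("client-b")
print(locks.holder())   # client-a queued first

locks.release(node_a)
print(locks.holder())   # the lock passes to client-b
```

The same ordered queue of numbered nodes is also the basis for FIFO processing: consumers take the lowest-numbered entry first.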
Installation and Deployment
ZooKeeper can be deployed in three modes: standalone, cluster, and pseudo‑cluster. The following steps illustrate a standalone installation on CentOS 7.
Create a directory, e.g., /home/xuliugen/Desktop/zookeeper-install.
Download the tarball, e.g., https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz, and extract it.
In the conf directory of the extracted package, copy the sample configuration file: cp zoo_sample.cfg zoo.cfg.
Optionally adjust zoo.cfg; for better performance, set a separate dataLogDir for transaction logs.
From the bin directory, start the server with ./zkServer.sh start and connect to it using ./zkCli.sh.
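For reference, a minimal standalone zoo.cfg needs only a handful of keys. The helper below renders one as text; the function name render_zoo_cfg and both directory paths are hypothetical placeholders, while tickTime, dataDir, clientPort, and dataLogDir are standard ZooKeeper configuration keys.

```python
def render_zoo_cfg(data_dir, data_log_dir=None, client_port=2181):
    """Build the text of a minimal standalone zoo.cfg."""
    lines = [
        "tickTime=2000",               # basic time unit in milliseconds
        f"dataDir={data_dir}",         # where snapshots are stored
        f"clientPort={client_port}",   # port that clients connect to
    ]
    if data_log_dir:
        # Optional: keep transaction logs apart from snapshots.
        lines.append(f"dataLogDir={data_log_dir}")
    return "\n".join(lines) + "\n"


# Placeholder paths; adjust them to the machine being configured.
print(render_zoo_cfg("/var/lib/zookeeper", "/var/lib/zookeeper-txnlog"))
```

Writing the returned text to conf/zoo.cfg would replace the copied sample file; the values shown match common defaults but should be reviewed before use.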
Cluster and pseudo‑cluster configurations are beyond the scope of this guide.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
