Zookeeper Introduction: Architecture, Installation, Features, and Core Concepts
This article provides a comprehensive overview of Zookeeper, covering its role in high‑concurrency distributed systems, installation steps on Linux, core features such as ordered updates, replication, and fast coordination, as well as detailed explanations of sessions, znodes, node types, and watch mechanisms.
Zookeeper Introduction
The author, a learner rather than a preacher, continues a series on high‑concurrency distributed development by focusing on Zookeeper, a fundamental coordination service used in many large‑scale internet companies, often alongside RPC frameworks such as Dubbo.
1. Challenges in a Concurrent Environment
When multiple processes run across several servers to handle high traffic, several problems arise: keeping configuration consistent across machines, detecting failed nodes and taking over tasks, adding new machines without restarting the cluster, and coordinating writes to shared network files.
2. Introduction to Zookeeper
2.1 Name Origin
Many Apache projects use animal icons (e.g., Tomcat, Hive). Zookeeper’s role is to coordinate the actions of these “animals”.
2.2 What Is Zookeeper?
Zookeeper is a high‑performance coordination service for distributed applications. Data is kept in memory and persisted to a log. Its tree‑like structure enables a unified configuration center, service registration, distributed locks, etc. Servers in a Zookeeper ensemble know about each other, each maintaining an in‑memory image of the state along with a transaction log and snapshots. The service is available as long as a majority of servers are up, and each client connects to a single server, maintaining a TCP connection for requests, responses, watch events, and heartbeats.
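The majority rule above can be sketched as a simple quorum calculation. This is an illustration, not ZooKeeper code; `quorum_size` and `is_available` are assumed helper names:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1

def is_available(ensemble_size: int, servers_up: int) -> bool:
    """The service can answer requests only while a majority is up."""
    return servers_up >= quorum_size(ensemble_size)

# A 5-node ensemble tolerates 2 failures but not 3.
print(is_available(5, 3))  # True
print(is_available(5, 2))  # False
```

This is also why ensembles are usually sized with an odd number of servers: a 6-node cluster tolerates no more failures than a 5-node one.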
2.3 Installation (Linux)
1. JDK version must be 1.6 or higher
2. Download: https://archive.apache.org/dist/zookeeper/zookeeper-3.5.2/zookeeper-3.5.2.tar.gz
3. In the extracted conf directory, add a zoo.cfg file
4. Start server: bin/zkServer.sh start
5. Test client connection: bin/zkCli.sh -server 127.0.0.1:2181
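For step 3, a minimal single‑server zoo.cfg might look like the following; the dataDir path is a placeholder to adjust for your machine:

```
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
```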
Key zoo.cfg settings:
- tickTime=2000 // basic heartbeat interval in milliseconds
- dataDir // directory for snapshots and transaction logs
- clientPort // port for client connections (2181 by convention)
2.4 Key Characteristics
2.4.1 Simple Data Structure
Similar to a Unix file‑system tree, each node (znode) can store data like a file or act as a directory. Node names must be unique among siblings, follow naming rules, and use absolute paths starting with “/”. Data size per node is limited (at most 1 MB).
2.4.2 Data Model
Zookeeper uses a hierarchical namespace where each znode has an absolute path. Node types include persistent, sequential, ephemeral, and ephemeral‑sequential.
2.4.3 Naming Rules
1. Null character (\u0000) is prohibited.
2. Control characters \u0001‑\u0019 and \u007F‑\u009F are disallowed.
3. Unicode ranges \ud800‑\uf8ff and \uFFF0‑\uFFFF are not allowed.
4. “.” can appear inside a name, but “.” and “..” cannot be used alone.
5. "zookeeper" is a reserved node name.
2.4.4 Common Commands
Typical commands include ls / (list the root) and create /zk 123 (create a node), with constraints such as: the parent node must exist before create, and a node must have no children before delete.
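The two create/delete constraints can be sketched with a toy in‑memory tree. This is an illustration of the rules, not the ZooKeeper implementation, and `ToyZnodeTree` is an invented name:

```python
class ToyZnodeTree:
    """Minimal model of ZooKeeper's create/delete constraints."""

    def __init__(self):
        self.nodes = {"/": None}  # path -> data; root always exists

    def create(self, path, data):
        if path in self.nodes:
            raise ValueError("node already exists")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise ValueError("parent node must exist")
        self.nodes[path] = data

    def delete(self, path):
        # A node with children cannot be deleted.
        if any(p.startswith(path + "/") for p in self.nodes):
            raise ValueError("node still has children")
        del self.nodes[path]

tree = ToyZnodeTree()
tree.create("/zk", "123")      # ok: parent "/" exists
tree.create("/zk/child", "x")  # ok: parent "/zk" exists
# tree.delete("/zk")           # would fail: /zk still has a child
tree.delete("/zk/child")
tree.delete("/zk")
```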
2.4.5 Ordered Updates
Each write receives a globally ordered transaction ID (zxid). Versions (dataVersion, cversion, aclVersion) track changes to data, children, and ACLs. The tickTime defines heartbeat intervals and session timeouts.
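Version counters enable conditional updates, as in ZooKeeper's version‑checked setData. The sketch below is illustrative only; `VersionedNode` is not the real client API, though the `-1` wildcard mirrors ZooKeeper's convention:

```python
class VersionedNode:
    """Sketch of a znode's dataVersion with compare-and-set semantics."""

    def __init__(self, data):
        self.data = data
        self.data_version = 0

    def set_data(self, data, expected_version=-1):
        # -1 mimics ZooKeeper's "match any version" wildcard.
        if expected_version != -1 and expected_version != self.data_version:
            raise RuntimeError("BadVersion: stale update rejected")
        self.data = data
        self.data_version += 1
        return self.data_version

node = VersionedNode("a")
node.set_data("b", expected_version=0)  # succeeds; version becomes 1
# node.set_data("c", expected_version=0)  # would raise: version is now 1
```

Two clients that both read version 0 cannot both write: the second writer's stale version is rejected, which is the basis of optimistic concurrency control on znodes.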
2.4.6 Replication
Data is replicated across the ensemble, providing fault tolerance and eliminating single points of failure.
2.4.7 Speed
Zookeeper’s design enables low latency and high throughput, suitable for large distributed systems.
3. Zookeeper Theory
3.1 Session Mechanism
1. Each client gets a unique session ID.
2. Clients send periodic heartbeats to keep the session alive.
3. If the server receives no heartbeat within the session timeout (bounded below by 2×tickTime), the session expires.
4. Requests within a session are processed FIFO.
3.2 Znode Data Composition
Node data: actual stored information (state, config, location, etc.)
Node metadata: information returned by the stat command
Data size limit: 1 MB per node
3.3 Znode Types
1. Persistent node: created with create path value
2. Ephemeral node: created with create -e path value (deleted when the session ends)
3. Sequential node: created with create -s path value (appends a 10‑digit sequence number)
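The 10‑digit suffix can be sketched with zero‑padded formatting; the counter handling below is illustrative, since in ZooKeeper the parent node maintains the counter:

```python
def sequential_name(prefix: str, counter: int) -> str:
    """Append a zero-padded 10-digit counter, as sequential znodes do."""
    return f"{prefix}{counter:010d}"

print(sequential_name("/locks/lock-", 1))   # /locks/lock-0000000001
print(sequential_name("/locks/lock-", 42))  # /locks/lock-0000000042
```

Zero‑padding keeps lexicographic order equal to numeric order, which is what makes sequential nodes usable for fair distributed locks and leader election.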
Notes:
- Ephemeral nodes disappear when the session ends.
- Sequential nodes have a counter per parent and overflow at 2,147,483,647.
- Sequential nodes persist after the session ends.
3.4 Watch Mechanism
Clients can set watches on znodes to be notified of create, delete, change, or child events. Watches are one‑time triggers; after firing they are removed, so continuous monitoring requires re‑setting the watch.
1. One‑time: watch is removed after it fires.
2. Ordered: a client sees the watch notification before it sees the data change that triggered it.
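The one‑time trigger behavior can be sketched with a registry that discards a watch once it fires. Names here are illustrative, not the ZooKeeper API:

```python
class WatchRegistry:
    """Watches fire once, then must be re-registered to keep watching."""

    def __init__(self):
        self.watches = {}  # path -> list of callbacks

    def watch(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def notify(self, path, event):
        # pop() removes every watch on the path: one-time semantics.
        for cb in self.watches.pop(path, []):
            cb(event)

events = []
registry = WatchRegistry()
registry.watch("/config", events.append)
registry.notify("/config", "changed")  # fires once
registry.notify("/config", "changed")  # no watch left: nothing fires
print(events)  # ['changed']
```

To keep watching, a client re‑registers inside the callback, typically while re‑reading the node, so no change is permanently missed even though individual intermediate states can be.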
Watch caveats:
- Network latency may cause missed or delayed notifications.
- A single watch object is notified only once even if multiple operations trigger it.
Conclusion
Through the above discussion, readers should now have a basic understanding of Zookeeper’s architecture, installation, core features, and mechanisms such as sessions, znodes, and watches, laying the groundwork for future topics like distributed locks and cluster scenarios.