Deep Dive into ZooKeeper 3.5.5: Server Startup, Leader Election, and Request Processing
This article provides a comprehensive analysis of ZooKeeper 3.5.5 source code, covering the distributed‑mode server startup sequence, leader election algorithm, cluster data synchronization, server role responsibilities, and the request‑processor pipeline, offering practical insights for developers and operators.
ZooKeeper Overview
ZooKeeper is an open‑source coordination service that offers high availability, strong consistency, and high performance for large‑scale distributed systems. It simplifies the development of distributed applications by providing simple APIs for tasks such as leader election, group membership, and metadata management.
Server Startup (Distributed Mode)
The distributed‑mode startup consists of several stages:
Parse the configuration file.
Recover data from snapshots and transaction logs.
Listen for client connections (without processing requests yet).
Bind the election port and listen for server‑to‑server connections.
Perform leader election.
Initialize ZooKeeperServer.
Synchronize data among servers.
After synchronization, enable client request handling.
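The entry point is QuorumPeerMain, which receives the path to the configuration file as a command‑line argument. When automatic purging is enabled (autopurge.purgeInterval greater than zero), a DatadirCleanupManager task periodically removes old snapshots and transaction logs to keep disk usage bounded.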
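The first startup stage can be illustrated with a minimal sketch, assuming a zoo.cfg‑style file whose cluster members are declared as server.N=host:quorumPort:electionPort. The class ConfigSketch and the type ServerEntry are hypothetical names used only for illustration. ZooKeeper's own parsing lives in QuorumPeerConfig and accepts more forms than this sketch handles (for example a role or client‑port suffix).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class ConfigSketch {
    /** Hypothetical holder for one server.N entry. */
    static final class ServerEntry {
        final String host;
        final int quorumPort;
        final int electionPort;

        ServerEntry(String host, int quorumPort, int electionPort) {
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }
    }

    /** Parses server.N=host:quorumPort:electionPort lines, keyed by sid (N). */
    static Map<Long, ServerEntry> parseServers(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Long, ServerEntry> servers = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue; // tickTime, dataDir, clientPort, ... are not cluster members
            }
            long sid = Long.parseLong(key.substring("server.".length()));
            String[] parts = props.getProperty(key).split(":");
            if (parts.length < 3) {
                throw new IllegalArgumentException("bad server entry: " + key);
            }
            // Assumption: only the plain host:port:port form is handled here.
            servers.put(sid, new ServerEntry(
                    parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return servers;
    }

    public static void main(String[] args) {
        String cfg = "tickTime=2000\ndataDir=/tmp/zk\nclientPort=2181\n"
                + "server.1=zk1:2888:3888\nserver.2=zk2:2888:3888\nserver.3=zk3:2888:3888\n";
        Map<Long, ServerEntry> servers = parseServers(cfg);
        System.out.println(servers.size() + " servers; sid 2 listens for elections on port "
                + servers.get(2L).electionPort);
    }
}
```

The second port in each entry is the election port that the server binds in the fourth startup stage.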
Leader Election Mechanism
Leader election is driven by QuorumPeer.startLeaderElection() and involves two core components: QuorumCnxManager (network I/O management) and FastLeaderElection (the election algorithm).
During election, each server creates a QuorumCnxManager.Listener thread to accept connections from peers. FastLeaderElection broadcasts Notification messages containing the candidate’s sid, zxid, and epoch. The algorithm compares votes using the following rules:
Higher epoch wins (newer leader state).
If epochs are equal, the higher zxid wins (a zxid packs the epoch into its high 32 bits and a transaction counter into its low 32 bits).
If both are equal, the larger sid wins.
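The three comparison rules can be sketched as a single ordering function. This is a simplified illustration, not the project's code: the class VoteOrderSketch, the type Vote, and the method supersedes are hypothetical names. The real comparison is implemented in FastLeaderElection.totalOrderPredicate and handles further details such as server weights.

```java
final class VoteOrderSketch {
    /** Hypothetical value type holding the three fields compared during election. */
    static final class Vote {
        final long sid;
        final long zxid;
        final long epoch;

        Vote(long sid, long zxid, long epoch) {
            this.sid = sid;
            this.zxid = zxid;
            this.epoch = epoch;
        }
    }

    /** Returns true when candidate should replace current: epoch first, then zxid, then sid. */
    static boolean supersedes(Vote candidate, Vote current) {
        if (candidate.epoch != current.epoch) {
            return candidate.epoch > current.epoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.sid > current.sid;
    }

    public static void main(String[] args) {
        // Higher epoch wins even with a lower zxid and sid.
        System.out.println(supersedes(new Vote(1, 5, 2), new Vote(3, 9, 1))); // true
        // Equal epochs: the higher zxid wins.
        System.out.println(supersedes(new Vote(1, 9, 1), new Vote(3, 5, 1))); // true
        // Equal epoch and zxid: the larger sid wins, so sid 1 loses to sid 3.
        System.out.println(supersedes(new Vote(1, 5, 1), new Vote(3, 5, 1))); // false
    }
}
```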
The election proceeds in three cases based on the received electionEpoch:
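If it equals the local election epoch (the logical clock), the vote is valid and replaces the local vote when the remote vote is superior.
If it is greater, the local election epoch is advanced to the received value, votes collected in the old round are cleared, and the better of the remote vote and the server's own initial vote is adopted.
If it is smaller, the remote vote is ignored.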
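The three electionEpoch cases can be sketched as a small state machine. This is an illustration under simplifying assumptions: ElectionRoundSketch, Vote, and onNotification are hypothetical names, networking and quorum detection are omitted, and the real logic lives in FastLeaderElection.lookForLeader().

```java
import java.util.HashMap;
import java.util.Map;

final class ElectionRoundSketch {
    /** Hypothetical vote type: candidate sid, its last zxid, and its epoch. */
    static final class Vote {
        final long sid;
        final long zxid;
        final long epoch;

        Vote(long sid, long zxid, long epoch) {
            this.sid = sid;
            this.zxid = zxid;
            this.epoch = epoch;
        }
    }

    private final Vote initialVote;                 // this server's vote for itself
    private Vote proposal;                          // the vote currently being advertised
    private long logicalClock;                      // local election epoch
    private final Map<Long, Vote> received = new HashMap<>();

    ElectionRoundSketch(Vote initialVote, long logicalClock) {
        this.initialVote = initialVote;
        this.proposal = initialVote;
        this.logicalClock = logicalClock;
    }

    static boolean supersedes(Vote a, Vote b) {
        if (a.epoch != b.epoch) return a.epoch > b.epoch;
        if (a.zxid != b.zxid) return a.zxid > b.zxid;
        return a.sid > b.sid;
    }

    /** Handles one notification; returns false when it was ignored as stale. */
    boolean onNotification(long fromSid, Vote remote, long electionEpoch) {
        if (electionEpoch < logicalClock) {
            return false;                           // older round: ignore
        }
        if (electionEpoch > logicalClock) {
            logicalClock = electionEpoch;           // join the newer round
            received.clear();                       // old-round votes no longer count
            proposal = supersedes(remote, initialVote) ? remote : initialVote;
        } else if (supersedes(remote, proposal)) {
            proposal = remote;                      // same round: adopt a superior vote
        }
        received.put(fromSid, remote);
        return true;
    }

    Vote proposal() { return proposal; }
    long logicalClock() { return logicalClock; }
    int receivedCount() { return received.size(); }

    public static void main(String[] args) {
        ElectionRoundSketch round = new ElectionRoundSketch(new Vote(1, 100, 1), 1);
        round.onNotification(2, new Vote(2, 100, 1), 1);
        System.out.println("proposal sid = " + round.proposal().sid);
    }
}
```

In the usage above, the remote vote ties on epoch and zxid but carries the larger sid, so the local proposal switches to sid 2.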
When a server becomes leader, it creates a Leader instance and starts the LearnerCnxAcceptor thread to accept connections from followers and observers.
Cluster Data Synchronization
After a leader is elected, the cluster synchronizes state through four possible actions:
DIFF : Learner lacks some transactions.
TRUNC : Learner has extra transactions.
DIFF+TRUNC : Learner both lacks and has extra data.
SNAP : Learner is far behind and receives a full snapshot.
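How a leader might pick among these four actions can be sketched by comparing the learner's last zxid with the range of recently committed transactions the leader keeps in memory. This is a deliberately simplified sketch: SyncModeSketch, Mode, and choose are hypothetical names, the committed log is modeled as a sorted set of zxids, and the real decision logic considers more inputs (such as replaying the on‑disk transaction log before falling back to a snapshot).

```java
import java.util.Arrays;
import java.util.TreeSet;

final class SyncModeSketch {
    /** TRUNC_THEN_DIFF corresponds to the DIFF+TRUNC case in the text. */
    enum Mode { DIFF, TRUNC, TRUNC_THEN_DIFF, SNAP }

    /**
     * Chooses a sync action from the learner's last zxid and the leader's
     * in-memory window of committed zxids.
     */
    static Mode choose(long learnerLastZxid, TreeSet<Long> committedLog) {
        if (committedLog.isEmpty()) {
            return Mode.SNAP;               // assumption: nothing to diff against
        }
        long min = committedLog.first();
        long max = committedLog.last();
        if (learnerLastZxid > max) {
            return Mode.TRUNC;              // learner has transactions the leader lacks
        }
        if (learnerLastZxid < min) {
            return Mode.SNAP;               // learner is behind the retained window
        }
        if (committedLog.contains(learnerLastZxid)) {
            return Mode.DIFF;               // learner is a clean prefix of the leader
        }
        // Inside the window but unknown to the leader: roll back, then catch up.
        return Mode.TRUNC_THEN_DIFF;
    }

    public static void main(String[] args) {
        TreeSet<Long> log = new TreeSet<>(Arrays.asList(10L, 11L, 12L, 15L));
        for (long zxid : new long[] {12L, 20L, 5L, 13L}) {
            System.out.println(zxid + " -> " + choose(zxid, log));
        }
    }
}
```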
The leader computes a new epoch, one greater than the highest epoch reported by the learners that connect to it, and sends it to them. Only after a quorum (more than half of the voting members, counting the leader itself) has acknowledged the new epoch does the leader proceed to data synchronization.
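The epoch arithmetic and the majority check can be sketched in a few lines. The names EpochQuorumSketch, newEpoch, hasQuorum, and firstZxidOf are hypothetical, and the sketch assumes a plain majority quorum and the usual zxid layout (epoch in the high 32 bits, counter in the low 32 bits).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class EpochQuorumSketch {
    /** New epoch: one greater than the highest epoch any participant has accepted. */
    static long newEpoch(Collection<Long> acceptedEpochs) {
        long max = 0;
        for (long e : acceptedEpochs) {
            max = Math.max(max, e);
        }
        return max + 1;
    }

    /** Simple majority: strictly more than half of the voting members. */
    static boolean hasQuorum(Set<Long> ackedSids, int votingMembers) {
        return ackedSids.size() > votingMembers / 2;
    }

    /** First zxid of an epoch: epoch in the high 32 bits, counter 0 in the low 32 bits. */
    static long firstZxidOf(long epoch) {
        return epoch << 32;
    }

    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    public static void main(String[] args) {
        long epoch = newEpoch(Arrays.asList(3L, 5L, 4L));
        Set<Long> acks = new HashSet<>(Arrays.asList(1L, 2L)); // the leader counts itself
        System.out.println("epoch " + epoch + ", quorum of 3 reached: " + hasQuorum(acks, 3));
        System.out.println("first zxid: 0x" + Long.toHexString(firstZxidOf(epoch)));
    }
}
```

With three voting members, two acknowledgements suffice; with four, three are needed, which is why even‑sized ensembles tolerate no more failures than the next smaller odd size.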
Server Roles and Request Processors
ZooKeeper defines three server roles:
Leader : Handles client writes, creates proposals, assigns zxid, and broadcasts them.
Follower : Receives proposals from the leader, acknowledges them, and applies them.
Observer : Receives updates but does not participate in quorum.
Each server runs a pipeline of RequestProcessor implementations:
PrepRequestProcessor : Consumes client requests, creates transactions for write operations, and enqueues them.
SyncRequestProcessor : Persists transactions to disk and creates snapshots.
FinalRequestProcessor : Applies transactions to the in‑memory data tree or reads data for read‑only requests.
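The leader’s pipeline additionally includes ProposalRequestProcessor, which broadcasts proposals to the followers, and CommitProcessor, which holds each write until a quorum has acknowledged it. Followers and observers start their pipelines with FollowerRequestProcessor and ObserverRequestProcessor respectively, which forward write requests to the leader.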
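The chaining idea behind the three basic processors can be sketched as follows. This is a synchronous toy model, not ZooKeeper's implementation: PipelineSketch, Request, and Processor are hypothetical names, and in the real server several processors run as separate threads that hand requests over through queues.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PipelineSketch {
    /** Hypothetical request: data == null means a read, otherwise a write. */
    static final class Request {
        final String path;
        final String data;
        long zxid = -1;     // assigned by the prep stage for writes only
        String result;      // filled in by the final stage

        Request(String path, String data) {
            this.path = path;
            this.data = data;
        }

        boolean isWrite() {
            return data != null;
        }
    }

    interface Processor {
        void process(Request request);
    }

    final Map<String, String> dataTree = new HashMap<>(); // in-memory state
    final List<Long> txnLog = new ArrayList<>();          // stands in for the on-disk log
    private long nextZxid = 1;

    /** Builds the chain back to front, so each stage knows its successor. */
    Processor build() {
        Processor fin = r -> {
            if (r.isWrite()) {
                dataTree.put(r.path, r.data);   // apply the transaction
            }
            r.result = dataTree.get(r.path);    // serve the read or echo the write
        };
        Processor sync = r -> {
            if (r.isWrite()) {
                txnLog.add(r.zxid);             // "persist" before applying
            }
            fin.process(r);
        };
        return r -> {                           // prep stage: first in the chain
            if (r.isWrite()) {
                r.zxid = nextZxid++;            // turn the write into a transaction
            }
            sync.process(r);
        };
    }

    public static void main(String[] args) {
        PipelineSketch server = new PipelineSketch();
        Processor first = server.build();
        first.process(new Request("/config", "v1"));
        Request read = new Request("/config", null);
        first.process(read);
        System.out.println("read " + read.result + ", logged txns: " + server.txnLog.size());
    }
}
```

Building the chain from the last stage backwards mirrors how each processor is handed its successor at construction time, so role‑specific stages can be spliced in without changing the others.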
Key Code Snippet
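In QuorumPeer.run(), a server in the LOOKING state starts an election round and records the resulting vote:
setCurrentVote(makeLEStrategy().lookForLeader());
References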
ZooKeeper election analysis: https://juejin.im/post/5cc2af405188252da4250047
Apache ZooKeeper official site: https://zookeeper.apache.org/
ZooKeeper GitHub repository: https://github.com/apache/zookeeper
"ZooKeeper: Distributed Process Coordination" by Flavio Junqueira and Mahadev Konar
Additional blog posts: https://blog.reactor.top/tags/Zookeeper/ and https://www.cnblogs.com/sunshine-2015/tag/zookeeper/
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
