How ZooKeeper Prevents Split‑Brain: Inside Quorum‑Based Leader Election

This article explains the split‑brain phenomenon in distributed clusters, uses ZooKeeper as a case study to illustrate how network partitions can create multiple leaders, and details ZooKeeper's majority‑quorum mechanism, node count considerations, and common strategies for avoiding split‑brain scenarios.

Senior Brother's Insights

Jul 28, 2021

What is split‑brain?

In a distributed system a single node (Master, Leader, etc.) is elected to coordinate the cluster. If a network partition isolates parts of the cluster, each partition may elect its own Leader. The result is two independent Leaders serving the same service, which can cause data inconsistency. This situation is called a split‑brain.

Split‑brain scenario in a ZooKeeper ensemble

Consider a ZooKeeper ensemble of six servers deployed in two data‑centers (three servers per site). Under normal operation a single Leader coordinates all write requests; if the Leader crashes, the remaining servers hold an election to choose a new Leader.

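If the network link between the two data‑centers fails, the side that can no longer reach the Leader perceives it as missing and starts a new election. Without a majority requirement, that side could elect a second Leader while the original one keeps serving, which is exactly the split‑brain case. Under ZooKeeper’s quorum rule (described below), each side sees only three of the six servers, so neither can gather a majority: the isolated side cannot elect a Leader, and the existing Leader loses its quorum and stops serving writes. The ensemble stays leader‑less until the link is restored, but no divergent state is created, and a single Leader is elected once the servers can reach each other again.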

ZooKeeper’s majority (quorum) rule

ZooKeeper prevents split‑brain by requiring a strict majority of voting servers before a node can become Leader. The rule is implemented in the QuorumMaj class, shown here in simplified form (QuorumVerifier is ZooKeeper’s own interface):

public class QuorumMaj implements QuorumVerifier {
    int half;
    // n is the number of voting servers (observers are excluded)
    public QuorumMaj(int n) {
        this.half = n / 2;
    }
    // Returns true if the given set of votes exceeds the half threshold
    public boolean containsQuorum(Set<Long> set) {
        return (set.size() > half);
    }
}

For a six‑node ensemble, half = 6 / 2 = 3, so at least four votes are required to elect a Leader. In a 3‑vs‑3 partition neither side can reach four votes, so the ensemble stays leader‑less.

For a Leader to be elected during a partition, one side must hold a strict majority of the voting nodes. For example, a 5‑node ensemble split 3:2 yields half = 5 / 2 = 2 (integer division); the side with three servers can gather three votes (including the candidate's own vote), which exceeds 2, while the side with two servers cannot. Consequently only one Leader is elected.

The majority rule by itself guarantees that at most one partition can hold a quorum, because two disjoint groups cannot both contain more than half of the voters. An odd total number of voting nodes additionally guarantees that any two‑way split leaves one side with a majority, so the ensemble can keep a Leader instead of stalling as in the 3‑vs‑3 case.
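The 3‑vs‑3 and 3:2 cases can be checked with a few lines of Java. This is only a sketch: PartitionQuorumCheck and its methods are hypothetical names, not ZooKeeper classes, and the code simply applies the same `size > n / 2` test as QuorumMaj to each partition.

```java
/** Hypothetical helper (not part of ZooKeeper): applies the majority rule to partition sizes. */
final class PartitionQuorumCheck {

    /** Same test as QuorumMaj: a group has a quorum when its size exceeds n / 2. */
    static boolean hasQuorum(int groupSize, int votingMembers) {
        return groupSize > votingMembers / 2;
    }

    /** Counts how many partitions of the ensemble are large enough to elect a Leader. */
    static int partitionsWithQuorum(int... partitionSizes) {
        int total = 0;
        for (int size : partitionSizes) {
            total += size;
        }
        int winners = 0;
        for (int size : partitionSizes) {
            if (hasQuorum(size, total)) {
                winners++;
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        System.out.println(partitionsWithQuorum(3, 3)); // prints 0: six nodes, no Leader
        System.out.println(partitionsWithQuorum(3, 2)); // prints 1: five nodes, one Leader
    }
}
```

Whatever sizes are passed in, the result is never greater than 1, which is the property that rules out two concurrent Leaders.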

Why ZooKeeper ensembles use an odd number of voting nodes

2 nodes – losing 1 node makes the ensemble unavailable (tolerance 0).

3 nodes – losing 1 node still leaves a majority (tolerance 1).

4 nodes – losing 1 node leaves a majority, but losing a second node breaks the quorum (tolerance 1).

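5 nodes – losing 2 nodes still leaves a majority (tolerance 2).

6 nodes – losing 2 nodes leaves a majority, but losing a third node breaks the quorum (tolerance 2).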

Thus an odd count (3, 5, 7 …) provides the same fault‑tolerance as the next even number while using fewer resources.
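The tolerance figures above follow directly from the majority rule. As a small sketch (FaultTolerance, quorumSize and tolerance are illustrative names, not ZooKeeper APIs), the smallest quorum is n / 2 + 1 and the tolerance is whatever remains:

```java
/** Illustrative arithmetic only: derives quorum size and fault tolerance from the majority rule. */
final class FaultTolerance {

    /** Smallest number of votes that satisfies size > n / 2 (integer division). */
    static int quorumSize(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    /** Number of servers that can fail while a quorum still remains. */
    static int tolerance(int votingMembers) {
        return votingMembers - quorumSize(votingMembers);
    }

    public static void main(String[] args) {
        // Prints one line per ensemble size from 2 to 7.
        for (int n = 2; n <= 7; n++) {
            System.out.printf("%d nodes: quorum %d, tolerance %d%n",
                    n, quorumSize(n), tolerance(n));
        }
    }
}
```

Each even size reports the same tolerance as the odd size just below it, which is why the extra server adds cost without adding resilience.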

Epoch handling and leader hand‑over

ZooKeeper maintains an epoch number that increments each time a new Leader is elected. Followers accept requests only from a Leader whose epoch is at least as high as the highest epoch they have seen. If a Leader merely “appears dead” (for example, during a long pause or a partition) and a new Leader is chosen, any later request from the old Leader carries the lower epoch and is rejected. This mechanism ensures that even if the old Leader recovers, it cannot cause split‑brain.
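The epoch check can be pictured with a deliberately simplified sketch. EpochGuard is a hypothetical class written for this article; ZooKeeper's actual follower logic is more involved, and only the comparison itself is taken from the description above.

```java
/** Hypothetical follower-side guard: rejects requests that carry a stale Leader epoch. */
final class EpochGuard {

    private long highestEpochSeen;

    EpochGuard(long initialEpoch) {
        this.highestEpochSeen = initialEpoch;
    }

    /** Accepts the request unless its epoch is lower than one already seen. */
    synchronized boolean accept(long leaderEpoch) {
        if (leaderEpoch < highestEpochSeen) {
            return false; // request from a superseded Leader
        }
        highestEpochSeen = leaderEpoch; // remember the newest epoch
        return true;
    }

    public static void main(String[] args) {
        EpochGuard follower = new EpochGuard(1);
        System.out.println(follower.accept(1)); // true: current Leader
        System.out.println(follower.accept(2)); // true: newly elected Leader
        System.out.println(follower.accept(1)); // false: old Leader has come back
    }
}
```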

Common techniques to avoid split‑brain

Quorum (majority) voting: Define a quorum size (e.g., 2 for a 3‑node cluster, 3 for a 4‑node cluster) so that a Leader can be elected only when a majority of nodes are reachable.

Multiple heartbeat channels: Deploy redundant network links or heartbeat paths so that the loss of a single channel does not isolate a node.

Disk‑based locking: The serving node holds a lock on a shared disk that only one server can acquire at a time. A plain lock has a drawback: if the holder crashes without releasing it, the standby can never take over. Smarter designs therefore engage the lock only after all heartbeat channels are lost and leave the disk unlocked during normal operation.

External arbitration service: Provide a well‑known arbitrator (e.g., a reference IP address). When a node loses contact with the Leader it pings the arbitrator; a node that cannot reach the arbitrator assumes the fault is on its own side and voluntarily steps down.

These methods can be combined for stronger protection, but extreme failures (e.g., loss of all arbitrators) still require manual intervention.
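As an illustration of the external‑arbitration technique, the decision a node makes can be reduced to one small function. Everything here is an assumption made for the sketch: ArbitrationCheck, Action and decide are invented names, and the arbitrator probe is passed in as a BooleanSupplier so that no real network call is implied.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical decision logic for a node that uses an external arbitrator. */
final class ArbitrationCheck {

    enum Action { CONTINUE, STEP_DOWN }

    /**
     * @param heartbeatAlive    whether the node still hears from its peer or Leader
     * @param arbitratorProbe   assumed probe that returns true if the arbitrator answers
     */
    static Action decide(boolean heartbeatAlive, BooleanSupplier arbitratorProbe) {
        if (heartbeatAlive) {
            return Action.CONTINUE; // no partition suspected, probe is not needed
        }
        // Heartbeat lost: only a node that can still reach the arbitrator may carry on.
        return arbitratorProbe.getAsBoolean() ? Action.CONTINUE : Action.STEP_DOWN;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, () -> false));  // CONTINUE
        System.out.println(decide(false, () -> true));  // CONTINUE
        System.out.println(decide(false, () -> false)); // STEP_DOWN
    }
}
```

Injecting the probe keeps the rule testable; a real deployment would replace it with whatever reachability check the cluster already uses.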

Conclusion

Understanding split‑brain and ZooKeeper’s quorum‑based election algorithm helps engineers design resilient distributed systems. By configuring an odd number of voting nodes, relying on majority voting, and optionally adding redundant heartbeats, disk locks, or external arbitration, a cluster can avoid the pitfalls of multiple concurrent Leaders.
