
Understanding the Elasticsearch Master Election Process

This article explains when Elasticsearch triggers a master election, describes each election stage—including active master and candidate selection, Bully algorithm comparison, and master node responsibilities—while providing code excerpts that illustrate the underlying implementation details.

Big Data Technology & Architecture

Elasticsearch initiates a master election in three situations: when the cluster starts, when the current master crashes, or when a node detects that the master is no longer acknowledged by a majority (n/2 + 1) of nodes.
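The majority figure can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that n counts master-eligible nodes and that the division is integer division; the class and method names are hypothetical and are not part of Elasticsearch.

```java
// Hypothetical helper for illustration only; not an Elasticsearch API.
final class QuorumSketch {
    private QuorumSketch() {}

    /** Returns n/2 + 1 (integer division) for n master-eligible nodes. */
    static int quorum(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    /** True when the reachable master-eligible nodes still form a majority. */
    static boolean hasQuorum(int reachable, int masterEligibleNodes) {
        return reachable >= quorum(masterEligibleNodes);
    }
}
```

With three master-eligible nodes the quorum is two, so a node that can reach only itself cannot form a majority.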

The election workflow is illustrated with a diagram and consists of several key phases.

1. Filtering the activeMasters list

Each node pings all known members, waits for discovery.zen.ping_timeout, and collects the responses. The activeMasters list holds every node that a response reports as the current master. The local node is never added to this list, even when another node reports it as master, to avoid split-brain scenarios.

List<DiscoveryNode> activeMasters = new ArrayList<>();
for (ZenPing.PingResponse pingResponse : pingResponses) {
    // The local node must never be added to the activeMasters list
    if (pingResponse.master() != null && !localNode.equals(pingResponse.master())) {
        activeMasters.add(pingResponse.master());
    }
}
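The excerpt above depends on Elasticsearch internals, so it cannot be run on its own. Here is a self-contained sketch of the same filtering step, assuming a simplified ping response that carries only node IDs; the Ping class and the method name are hypothetical stand-ins, not Elasticsearch types.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the activeMasters filter.
final class ActiveMastersSketch {
    private ActiveMastersSketch() {}

    /** Stand-in for a ping response: who answered and whom they consider master. */
    static final class Ping {
        final String nodeId;
        final String masterId; // null when the responder knows no master

        Ping(String nodeId, String masterId) {
            this.nodeId = nodeId;
            this.masterId = masterId;
        }
    }

    /** Collects every reported master except the local node itself. */
    static List<String> activeMasters(String localNodeId, List<Ping> pings) {
        List<String> result = new ArrayList<>();
        for (Ping ping : pings) {
            if (ping.masterId != null && !localNodeId.equals(ping.masterId)) {
                result.add(ping.masterId);
            }
        }
        return result;
    }
}
```

As in the original loop, the same master can appear more than once when several nodes report it; the later tie-break picks a single node from the list.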

2. Filtering the masterCandidates list

Only nodes configured with node.master: true are eligible to become master. Nodes configured with node.master: false are excluded from the candidate list.

# Configure a node so that it is not master-eligible
node.master: false

List<ElectMasterService.MasterCandidate> masterCandidates = new ArrayList<>();
for (ZenPing.PingResponse pingResponse : pingResponses) {
    if (pingResponse.node().isMasterNode()) {
        masterCandidates.add(new ElectMasterService.MasterCandidate(pingResponse.node(), pingResponse.getClusterStateVersion()));
    }
}

3. Selecting a master from activeMasters

If the activeMasters list is not empty, Elasticsearch applies the Bully algorithm, preferring nodes with master eligibility and then the smallest node ID.

private static int compareNodes(DiscoveryNode o1, DiscoveryNode o2) {
    if (o1.isMasterNode() && !o2.isMasterNode()) {
        return -1;
    }
    if (!o1.isMasterNode() && o2.isMasterNode()) {
        return 1;
    }
    return o1.getId().compareTo(o2.getId());
}

public DiscoveryNode tieBreakActiveMasters(Collection<DiscoveryNode> activeMasters) {
    return activeMasters.stream().min(ElectMasterService::compareNodes).get();
}
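To see how this ordering behaves on concrete input, the following sketch reproduces the comparison with a simplified node type. The Node class is a hypothetical stand-in for DiscoveryNode, and throwing on an empty collection is a choice made for this sketch rather than something the excerpt does.

```java
import java.util.Collection;

// Simplified, hypothetical model of the activeMasters tie-break.
final class TieBreakSketch {
    private TieBreakSketch() {}

    /** Stand-in for DiscoveryNode: an ID plus a master-eligibility flag. */
    static final class Node {
        final String id;
        final boolean masterEligible;

        Node(String id, boolean masterEligible) {
            this.id = id;
            this.masterEligible = masterEligible;
        }
    }

    /** Master-eligible nodes sort first; ties are broken by the smaller ID. */
    static int compareNodes(Node a, Node b) {
        if (a.masterEligible && !b.masterEligible) {
            return -1;
        }
        if (!a.masterEligible && b.masterEligible) {
            return 1;
        }
        return a.id.compareTo(b.id);
    }

    /** Picks the node that sorts first under compareNodes. */
    static Node tieBreak(Collection<Node> activeMasters) {
        return activeMasters.stream()
                .min(TieBreakSketch::compareNodes)
                .orElseThrow(() -> new IllegalArgumentException("activeMasters must not be empty"));
    }
}
```

Given nodes "a" (not eligible), "c" (eligible), and "b" (eligible), the tie-break returns "b": eligibility outranks the ID, and among eligible nodes the smaller ID wins.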

4. Selecting a master from masterCandidates

If activeMasters is empty, the master is chosen from masterCandidates. The number of candidates must be at least discovery.zen.minimum_master_nodes; otherwise the election cannot proceed. Candidates are compared first by cluster-state version (the newer version wins) and then by node ID.

public static int compare(MasterCandidate c1, MasterCandidate c2) {
    int ret = Long.compare(c2.clusterStateVersion, c1.clusterStateVersion);
    if (ret == 0) {
        ret = compareNodes(c1.getNode(), c2.getNode());
    }
    return ret;
}
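The candidate check and the comparison can be combined into one small, runnable sketch. It assumes every candidate is already master-eligible, so the node comparison reduces to comparing IDs; the Candidate class, the elect method, and the choice to return null when there are too few candidates are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of electing from masterCandidates.
final class CandidateElectionSketch {
    private CandidateElectionSketch() {}

    /** Stand-in for MasterCandidate: node ID plus the cluster-state version it holds. */
    static final class Candidate {
        final String nodeId;
        final long clusterStateVersion;

        Candidate(String nodeId, long clusterStateVersion) {
            this.nodeId = nodeId;
            this.clusterStateVersion = clusterStateVersion;
        }
    }

    /** Higher cluster-state version sorts first; ties fall back to the smaller node ID. */
    static int compare(Candidate c1, Candidate c2) {
        int ret = Long.compare(c2.clusterStateVersion, c1.clusterStateVersion);
        if (ret == 0) {
            ret = c1.nodeId.compareTo(c2.nodeId);
        }
        return ret;
    }

    /** Returns the winning candidate, or null when there are too few candidates. */
    static Candidate elect(List<Candidate> candidates, int minimumMasterNodes) {
        if (candidates.isEmpty() || candidates.size() < minimumMasterNodes) {
            return null;
        }
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(CandidateElectionSketch::compare);
        return sorted.get(0);
    }
}
```

With candidates (a, version 5), (b, version 7), and (c, version 7) and a minimum of two, the sketch elects "b": versions 7 beat version 5, and "b" sorts before "c".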

5. Local node becomes master

The elected node waits for join requests, which act as votes, from at least discovery.zen.minimum_master_nodes - 1 other nodes. If the required joins arrive before discovery.zen.master_election.wait_for_joins_timeout expires, the node assumes the master role and starts node fault detection. Otherwise it restarts the join process.

if (clusterService.localNode().equals(masterNode)) {
    // Required joins from other nodes: minimum_master_nodes - 1, never negative
    final int requiredJoins = Math.max(0, electMaster.minimumMasterNodes() - 1);
    nodeJoinController.waitToBeElectedAsMaster(requiredJoins, masterElectionWaitForJoinsTimeout,
            new NodeJoinController.ElectionCallback() {
                @Override
                public void onElectedAsMaster(ClusterState state) {
                    joinThreadControl.markThreadAsDone(currentThread);
                    nodesFD.updateNodesAndPing(state); // start the nodes FD
                }
                @Override
                public void onFailure(Throwable t) {
                    logger.trace("failed while waiting for nodes to join, rejoining", t);
                    joinThreadControl.markThreadAsDoneAndStartNew(currentThread);
                }
            }
    );
}
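The waiting step can be pictured as a countdown over the required joins. The sketch below uses a standard CountDownLatch to model it. This is a swapped-in technique for illustration, not how NodeJoinController is implemented, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "wait for enough joins or time out".
final class JoinWaitSketch {
    private final CountDownLatch pendingJoins;

    JoinWaitSketch(int minimumMasterNodes) {
        // Mirrors the excerpt: joins needed from other nodes, never negative.
        int requiredJoins = Math.max(0, minimumMasterNodes - 1);
        this.pendingJoins = new CountDownLatch(requiredJoins);
    }

    /** Called once for each join request received from another node. */
    void onJoin() {
        pendingJoins.countDown();
    }

    /** Returns true if enough joins arrived before the timeout, false otherwise. */
    boolean awaitElection(long timeout, TimeUnit unit) {
        try {
            return pendingJoins.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

With a minimum of three master nodes, two calls to onJoin() are enough for awaitElection to return true. With a minimum of one, no joins are needed and it returns true immediately.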

As master, the node runs node fault detection, removes unresponsive members from the cluster, and publishes the updated cluster state, which prompts shard reallocation and data replication.

6. Local node is not master

A node that is not elected master stops accepting join requests from other nodes, votes for the elected master, and monitors it via MasterFaultDetection. If the master becomes unreachable, or the cluster detects that many nodes cannot contact the master, a new election is triggered.
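One way to picture the monitoring step is a counter of consecutive failed pings that triggers a new election once a threshold is reached. The sketch below is a simplified model under that assumption; the class, its retry threshold, and its method names are hypothetical and do not reflect the internals of MasterFaultDetection.

```java
// Hypothetical model of master monitoring; not the MasterFaultDetection implementation.
final class MasterMonitorSketch {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures;

    MasterMonitorSketch(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /**
     * Records the outcome of one ping to the master.
     * Returns true when the failure threshold is reached and a new election should start.
     */
    boolean recordPing(boolean succeeded) {
        if (succeeded) {
            consecutiveFailures = 0; // any success resets the streak
            return false;
        }
        consecutiveFailures++;
        return consecutiveFailures >= maxConsecutiveFailures;
    }
}
```

Requiring several consecutive failures keeps a single dropped ping from triggering an election.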

The article concludes with a visual flowchart and emphasizes the importance of proper configuration to avoid split‑brain and ensure stable master election in Elasticsearch clusters.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
