
A New Era of Cluster Coordination in Elasticsearch 7.0

Elasticsearch 7.0 replaces Zen Discovery with a quorum-based cluster-coordination subsystem that elects a master from the master-eligible nodes, manages the voting configuration automatically, simplifies bootstrapping via cluster.initial_master_nodes, supports rolling upgrades from 6.7, and tolerates faults through a consensus protocol similar to Paxos or Raft.

Tencent Cloud Developer

May 7, 2019

Elasticsearch is widely popular in part because it scales easily: a small cluster with a few nodes can grow into a large cluster with hundreds of nodes while its core distributed coordination stays stable. Version 7.0 introduces a new cluster coordination subsystem that is faster, safer, and easier to use than the one in earlier versions.

Many tasks in an Elasticsearch cluster require several nodes to work together: searches must be routed to the correct shards, replicas must be updated when documents are indexed or deleted, and client requests must be forwarded to the nodes that can handle them. Nodes can cooperate this way because each one holds a view of the cluster state, which describes index mappings and settings, shard allocations, and which shard copies are in sync. The coordination subsystem keeps this cluster state consistent across nodes, which is crucial for features like sequence numbers and cross-cluster replication.

The subsystem elects one of the master-eligible nodes as master, and the master ensures that all nodes receive cluster-state updates. Election and publication must remain robust despite slow nodes, full GC pauses, power loss, network partitions, packet loss, high latency, and message reordering, so that nodes keep a consistent view even under these adverse conditions.

Every cluster-state update requires a quorum, that is, a majority of the master-eligible nodes. Three master-eligible nodes are recommended, so that if one fails the other two still form a quorum. A cluster with fewer than three cannot safely tolerate the loss of any of them, while a cluster with many more than three may see slower elections and cluster-state updates.
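The arithmetic behind the three-node recommendation can be sketched in a few lines of Python. This is only an illustration of the plain majority rule; the function names are made up for this example, and it ignores any automatic adjustments Elasticsearch makes to the voting configuration.

```python
def quorum_size(master_eligible: int) -> int:
    """Smallest strict majority of the given number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many nodes can be lost while a majority can still be formed."""
    return master_eligible - quorum_size(master_eligible)


# Columns: number of nodes, quorum size, failures tolerated
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With one or two nodes the tolerated-failure count is 0, and with three it becomes 1, which is why three is the smallest size that survives the loss of a node.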

Previously, Elasticsearch used Zen Discovery (discovery.zen.*) for coordination. Starting with 7.0, the new subsystem removes the discovery.zen.minimum_master_nodes setting and introduces an automatic voting configuration that the system manages.

Key configuration changes include:

cluster.initial_master_nodes – specifies the node names or IP addresses of the master-eligible nodes used for the first bootstrapping of a brand-new cluster.

Voting configuration – a set of current master‑eligible nodes; updates to the cluster state require a majority of this set.
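The majority rule over the voting configuration can be illustrated with a small Python sketch. It is a simplified model, not Elasticsearch's implementation; the function name and the node names are hypothetical.

```python
from typing import Iterable, Set


def has_quorum(voting_config: Set[str], responders: Iterable[str]) -> bool:
    """True if strictly more than half of the voting configuration responded."""
    # Nodes that are not part of the voting configuration do not count as votes.
    votes = voting_config & set(responders)
    return len(votes) * 2 > len(voting_config)


voting_config = {"master-a", "master-b", "master-c"}  # hypothetical node names

print(has_quorum(voting_config, ["master-a", "master-b"]))  # True
print(has_quorum(voting_config, ["master-a", "data-1"]))    # False
```

In the second call only one member of the voting configuration responds, so the update would not be committed even though two nodes answered.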

Example log when a node cannot discover a master:

master not discovered yet, this node has not previously joined a (v7+) cluster, and cluster.initial_master_nodes is empty on this node
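The message above points at an empty cluster.initial_master_nodes setting. As a rough sketch, the following Python helper renders the elasticsearch.yml fragment for that setting; the helper and the node names are hypothetical and serve only to show the shape of the configuration.

```python
from typing import List


def render_bootstrap_settings(master_names: List[str]) -> str:
    """Render an elasticsearch.yml fragment listing the initial master-eligible nodes."""
    if not master_names:
        raise ValueError("cluster.initial_master_nodes must not be empty when bootstrapping")
    lines = ["cluster.initial_master_nodes:"]
    lines += [f"  - {name}" for name in master_names]
    return "\n".join(lines)


# Hypothetical node names for a brand-new three-node cluster
print(render_bootstrap_settings(["master-a", "master-b", "master-c"]))
```

The setting is only needed for the first bootstrapping of a brand-new cluster, as described above.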

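Master-eligible nodes can be added or removed safely without manual reconfiguration, because the system updates the voting configuration automatically. The main caveat is that half or more of the master-eligible nodes must not be stopped at the same time; when that is unavoidable, the voting configuration exclusions API can be used to take nodes out of the voting configuration first.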

A 6.x cluster can be upgraded to 7.0 by a rolling upgrade (recommended) or a full-cluster restart. A rolling upgrade must start from 6.7 and needs no manual bootstrapping: the cluster bootstraps itself using the number of nodes and the existing minimum_master_nodes setting. A full-restart upgrade requires setting cluster.initial_master_nodes to the names or IP addresses of all master-eligible nodes before restarting.

Several legacy settings have been renamed (e.g., discovery.zen.ping.unicast.hosts → discovery.seed_hosts, discovery.zen.hosts_provider → discovery.seed_providers, discovery.zen.no_master_block → cluster.no_master_block).
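One way to apply these renames to an existing configuration is sketched below in Python. The mapping contains only the three renames listed above; the helper name, the flat-dictionary representation of the settings, and the example addresses are assumptions made for illustration.

```python
from typing import Any, Dict

# Legacy name -> 7.0 name, limited to the renames mentioned in this article
RENAMED_SETTINGS = {
    "discovery.zen.ping.unicast.hosts": "discovery.seed_hosts",
    "discovery.zen.hosts_provider": "discovery.seed_providers",
    "discovery.zen.no_master_block": "cluster.no_master_block",
}


def migrate_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a flat settings dict with the legacy keys renamed."""
    migrated = {}
    for key, value in settings.items():
        # Keys without a known rename are copied through unchanged.
        migrated[RENAMED_SETTINGS.get(key, key)] = value
    return migrated


old = {
    "discovery.zen.ping.unicast.hosts": ["10.0.0.1", "10.0.0.2"],  # example addresses
    "cluster.name": "demo",
}
print(migrate_settings(old))
```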

The fault-detection settings under discovery.zen.fd.* no longer have any effect in 7.0. Most users should keep the default fault-detection configuration; any necessary tuning is done through the cluster.fault_detection.* settings.

As a safety improvement, a cluster no longer recovers automatically when half or more of its master-eligible nodes are permanently lost. In that case administrators must bring the lost nodes back, restore from snapshots, or, as a last resort, use the elasticsearch-node tool to perform an unsafe recovery.

The new subsystem implements a distributed consensus protocol similar to Paxos, Raft, and Viewstamped Replication, using concepts like a “master term”. On top of the safety core sits a liveness layer that ensures a master is elected once the network is healthy and enough nodes are online. It publishes cluster-state updates incrementally, as diffs, and uses adaptive election scheduling to avoid an excess of contested elections.
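The role of a term can be illustrated with a deliberately small Python sketch. It shows the general idea found in Raft-style protocols, where a node ignores messages from a master of an older term; it is not Elasticsearch's actual implementation, and the class, function, and field names are hypothetical.

```python
class NodeState:
    """Hypothetical, heavily simplified record of what a node has accepted."""

    def __init__(self) -> None:
        self.current_term = 0
        self.accepted_version = 0


def accept_publication(node: NodeState, term: int, version: int) -> bool:
    """Accept a cluster-state publication only if it is not stale."""
    if term < node.current_term:
        return False  # sent by a master from an older term
    if term == node.current_term and version <= node.accepted_version:
        return False  # this state, or a newer one, was already accepted in this term
    node.current_term = term
    node.accepted_version = version
    return True


node = NodeState()
print(accept_publication(node, term=1, version=1))  # True
print(accept_publication(node, term=2, version=2))  # True: a newer master took over
print(accept_publication(node, term=1, version=3))  # False: stale master from term 1
```

Because each elected master gets a higher term than the previous one, a node can reject late messages from a former master without any extra coordination.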

Why not adopt Raft directly? The Elasticsearch team found that Raft’s log‑centric design does not align well with Elasticsearch’s state‑centric coordination, that Raft’s scaling and reconfiguration mechanisms would require extensive custom work, and that maintaining existing safety checks and rolling‑upgrade capabilities would be difficult.

Summary: Elasticsearch 7.0 introduces a faster, safer, and easier‑to‑use cluster coordination subsystem that supports rolling upgrades from 6.7, provides robust fault tolerance, and offers a solid foundation for future distributed features.


