Big Data 15 min read

How Elasticsearch’s Cluster Architecture Powers Scalable Search and Analytics

This article explains Elasticsearch’s distributed cluster design, covering core concepts such as nodes, indices, shards, and replicas, compares mixed and tiered deployment models, examines data‑layer storage options, and discusses two typical distributed system architectures with their trade‑offs.

IT Architects Alliance

Mar 23, 2022

How Elasticsearch’s Cluster Architecture Powers Scalable Search and Analytics

Core Concepts

Node : a running Elasticsearch process, typically one per machine.

Index : a logical collection of JSON documents; it contains mapping definitions and the inverted/forward index files.

Shard : a partition of an index. Each shard is assigned to a node and can be a primary or a replica .

Replica : a copy of a primary shard that provides redundancy and additional query capacity.

Indexing Flow

When a document is indexed, the routing logic selects the appropriate primary shard. The document is written to the primary shard first; only after the primary acknowledges success does Elasticsearch forward the write to all replica shards. The request returns to the client only after every replica confirms the write, guaranteeing strong consistency.

If a primary or replica shard becomes unavailable, the cluster enters a degraded state. Elasticsearch elects a surviving replica to become the new primary and creates a fresh replica on another node by copying the shard data from the remaining copies.

Role Deployment Models

Mixed (single‑node) Deployment

All roles—master, data, and transport—run in the same JVM process.

Simple to start; a single node can handle indexing, search, and cluster coordination.

Drawbacks: role interference, higher inter‑node connection count (each node maintains 13 TCP connections to every other node), limited horizontal scalability, and no support for rolling upgrades.

Tiered (separate‑role) Deployment

Transport nodes handle request routing, load balancing, and result merging.

Data nodes store shards and execute search/aggregation workloads.

Advantages: isolation of responsibilities reduces contention, connection count scales with the number of transport nodes only, and rolling upgrades are possible (upgrade data nodes first, then transport nodes).

Disadvantages: requires explicit configuration of transport node count and careful capacity planning.

Data‑Layer Architecture

Elasticsearch stores index files and cluster metadata on the local file system. Supported I/O implementations include niofs, mmap, simplefs, and smb. By default Elasticsearch auto‑selects the optimal implementation; mmap (memory‑mapped files) usually offers the best read performance because index segments are locked in RAM.

Replica shards improve reliability and read throughput but increase storage consumption, write latency (each write must be replicated), and the time required to add new replicas (full shard copy is needed).

Distributed System Design Options

1. Local‑disk based architecture

Each shard’s data resides on the node’s local disks. On node failure, Elasticsearch promotes a replica to primary and creates a new replica on another node by copying the full shard data. Example: copying a 200 GB shard over a 1 Gbps network takes roughly 1 600 seconds. Without replicas, the shard would be unavailable for the entire copy duration.

2. Shared distributed file‑system architecture (e.g., HDFS)

Data and compute are separated: shards contain only the computation logic, while the actual index files live in a shared storage system. Nodes attach to the shared files on demand, so a failed node can quickly reconnect to the same data without a full copy. Benefits include elastic scaling of storage and compute, finer‑grained resource management, and faster recovery from hotspots. The main drawback is that remote file‑system I/O can be slower than local disks, although modern user‑space protocols mitigate this gap.

Key Trade‑offs

Replica count adds storage cost; unnecessary replicas waste disk space and CPU.

Every write incurs additional latency because the primary must wait for all replicas to acknowledge.

Increasing replica count in a running cluster triggers full shard copy, which can be time‑consuming for large shards.

Illustrative Diagrams

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed systems Big Data Elasticsearch data storage replica Cluster Architecture Shard

Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.