Big Data 13 min read

Elasticsearch Cluster Architecture and Distributed Data System Design

This article explains Elasticsearch's cluster architecture, including nodes, indices, shards, replicas, deployment models, and data layer storage, and compares two types of distributed data system designs—local file‑system based and shared‑storage based—highlighting their advantages and trade‑offs.

Architecture Digest

Aug 19, 2019

Elasticsearch Cluster Architecture

Elasticsearch is a widely used open‑source search and analytics system that serves three main domains: search, JSON document storage, and time‑series data analysis.

Key concepts include:

Node: a running Elasticsearch instance, typically a process on a machine.

Index: a logical collection of documents, consisting of mappings and inverted index files.

Shard: a partition of an index that is managed by a node; shards can be primary or replica.

Replica: a copy of a shard that ensures data reliability and improves read scalability.

When indexing a document, it is first routed to the primary shard, indexed there, then replicated to its replica shards before the operation returns success.

Replica shards provide fault tolerance and enable query load‑balancing.

Role Deployment Models

Elasticsearch supports two deployment styles:

Mixed deployment (default): Data and Transport roles run on the same node, simplifying setup but causing resource contention and scaling limits.

Tiered deployment: Separate Transport nodes handle request routing and result merging, while dedicated Data nodes store and process data, offering better isolation, scalability, and hot‑upgrade capabilities.

Elasticsearch Data Layer Architecture

Indices and metadata are stored on the local file system using various loading methods (niofs, mmap, simplefs, smb), with mmap offering the best performance.

Replication is used to avoid data loss and to increase query capacity; each index can configure a replica count, creating additional shard copies distributed across nodes.

Replica Benefits and Drawbacks

Ensures service availability by routing traffic to remaining replicas during failures.

Improves data reliability; without replicas, a primary node failure would cause data loss.

Boosts query throughput by adding more replicas.

However, replicas increase resource consumption, reduce write performance, and make dynamic scaling slower.

Distributed System Types

1. Local File‑System Based Distributed System

Data resides on each node’s local disks; shards have primary and replica copies. Node failures require replica promotion and data copying, which can be time‑consuming.

2. Shared‑Storage Based Distributed System

Storage and compute are separated: nodes only run computation, while data lives in a distributed file system (e.g., HDFS). This enables elastic scaling of compute and storage independently, reduces data‑copy overhead, but may suffer from lower storage access performance.

Summary

Both architectures have distinct strengths and weaknesses; choosing the appropriate model depends on specific workload requirements and trade‑offs.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed systems Elasticsearch replication data storage Cluster Architecture

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.