
Elasticsearch Cluster Architecture and Data Layer Design

This article explains Elasticsearch's cluster architecture, including nodes, indices, shards, and replicas, compares mixed and tiered deployment models, discusses the data storage layer and replication trade‑offs, and presents two typical distributed data system designs with their advantages and drawbacks.

Top Architect

Elasticsearch Cluster Architecture

Elasticsearch is a widely used open‑source search and analytics engine that serves three main scenarios: full‑text search, JSON document storage, and time‑series data analysis. Its core concepts include Nodes (running instances of Elasticsearch), Indices (logical collections of documents together with their mappings and inverted indexes), Shards (partitions of an index's data, each managed by a node), and Replicas (copies of shards that provide reliability and scale out reads).
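As a minimal sketch of how these concepts fit together (the index name `my-index` and the settings values are illustrative), the shard and replica counts are set when an index is created; only the replica count can be changed afterwards:

```
PUT /my-index
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}
```

With these settings the index is split into 3 primary shards, and each primary gets 1 replica, so the cluster holds 6 shard copies in total.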

Indexing Process

When a document is indexed, it is first routed to its primary shard and indexed there, then replicated to that shard's replicas; by default the operation is acknowledged only after all in‑sync replica copies have persisted the write.
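The routing step above can be sketched in a few lines. This is an illustrative model, not Elasticsearch's implementation: `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch actually applies to the `_routing` value (the document `_id` by default).

```python
import zlib

def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a document's routing value to one of the primary shards.

    Sketch only: Elasticsearch uses Murmur3, not CRC32, but any stable
    hash demonstrates the idea.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# The same routing value always lands on the same shard, which is why
# the primary shard count cannot change after index creation: a new
# modulus would re-route existing documents to different shards.
assert route_to_shard("doc-42", 5) == route_to_shard("doc-42", 5)
```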

Role Deployment Methods

Elasticsearch supports two deployment styles:

Mixed Deployment: Data and Transport roles coexist on the same node. This simplifies setup but causes resource contention and limits scaling, especially in large clusters.

Tiered Deployment: dedicated Transport (coordinating) nodes handle request routing and result merging, while dedicated Data nodes store and process data. This isolates the two workloads, scales better, and allows each tier to be updated independently.
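A minimal sketch of how the tiered model above is expressed in configuration, assuming the Elasticsearch 7.9+ `node.roles` syntax:

```yaml
# elasticsearch.yml on a dedicated Data node
node.roles: [ data ]

# elasticsearch.yml on a coordinating-only ("Transport") node:
# an empty role list leaves only request routing and result merging
node.roles: [ ]
```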

Elasticsearch Data Layer Architecture

Indices and their metadata are stored on the local file system and loaded through one of several store types (niofs, mmapfs, hybridfs, and simplefs in older releases; an smb variant exists via a Windows plugin). By default Elasticsearch selects the best type for the platform, but users can override it via the index.store.type setting.
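For example, the default selection can be overridden with the `index.store.type` setting; a hedged sketch (`niofs` is just one possible value, and this is an expert setting that is rarely needed):

```yaml
# elasticsearch.yml: override the store type for all new indices
# (the default "fs" picks the best type for the current platform)
index.store.type: niofs
```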

Replica (Copy) Mechanism

Each index can be configured with a replica count. Replicas provide three main benefits: service availability during node failures, data reliability against hardware loss, and increased query capacity by distributing read traffic.

Problems of Replication

Additional cost and resource waste when replicas are not needed for performance.

Write throughput reduction because every write must be propagated to primary and all replicas.

Slow scaling when dynamically adding replicas, because each new replica requires a full copy of the shard's data, which is slow for large shards.
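The trade-off behind the first two problems can be put in back-of-envelope terms. This sketch assumes every write is applied on the primary and on each replica, and that reads can be served by any copy:

```python
def write_amplification(num_replicas: int) -> int:
    """Physical writes performed per logical indexing request."""
    return 1 + num_replicas  # primary plus each replica

def read_copies(num_replicas: int) -> int:
    """Shard copies that can serve a search request."""
    return 1 + num_replicas

# With two replicas, each document is written three times, but three
# copies can share the query load: replicas buy read capacity and
# reliability at the cost of write throughput and storage.
print(write_amplification(2), read_copies(2))  # 3 3
```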

Distributed System Types

1. Local File‑System Based Distributed System

Data is stored on each node's local disks, with a shard's copies (primary and replicas) placed on different machines. If a node fails, a replica is promoted to primary and the lost copies are rebuilt on another node, which can be time‑consuming for large datasets.

2. Distributed File‑System (Shared Storage) Based System

Computation and storage are separated: shards contain only compute logic and reference data held in a shared distributed file system (e.g., HDFS). This allows compute and storage to scale independently and reduces data‑copy overhead during failover, but may suffer lower I/O performance than local disks.

Conclusion

Both architectures have distinct strengths and weaknesses; choosing the appropriate model depends on workload characteristics, cost considerations, and scalability requirements. The article provides a foundational overview of distributed data system designs, focusing on Elasticsearch as a concrete example.

Tags: Distributed Systems, Big Data, Elasticsearch, sharding, data replication, Cluster Architecture
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
