Elasticsearch Cluster Architecture and Distributed Data System Design
This article explains Elasticsearch's cluster architecture, including nodes, indices, shards, replicas, deployment models, and data layer storage, and compares two types of distributed data system designs—local file‑system based and shared‑storage based—highlighting their advantages and trade‑offs.
Elasticsearch Cluster Architecture
Elasticsearch is a widely used open‑source search and analytics system that serves three main domains: search, JSON document storage, and time‑series data analysis.
Key concepts include:
Node: a running Elasticsearch instance, typically a process on a machine.
Index: a logical collection of documents, consisting of mappings and inverted index files.
Shard: a partition of an index that is managed by a node; shards can be primary or replica.
Replica: a copy of a shard that ensures data reliability and improves read scalability.
When a document is indexed, the request is first routed to the primary shard and indexed there; the operation is then replicated to that shard's replicas, and only after that does it return success.
Replica shards provide fault tolerance and enable query load‑balancing.
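The write path described above can be sketched in a few lines. This is a minimal illustration, not Elasticsearch's implementation: CRC32 stands in for the real routing hash, and `ShardCopy`, `route_to_shard`, and `index_document` are hypothetical names invented for this example.

```python
import zlib


def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard by hashing the routing key (CRC32 is a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


class ShardCopy:
    """Hypothetical in-memory shard copy, used for both primaries and replicas."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc


def index_document(doc_id, doc, primaries, replicas):
    """Write to the primary first, then to each of its replicas, then acknowledge."""
    shard_id = route_to_shard(doc_id, len(primaries))
    primaries[shard_id].index(doc_id, doc)
    for replica in replicas[shard_id]:
        replica.index(doc_id, doc)
    return {"shard": shard_id, "copies_written": 1 + len(replicas[shard_id])}


# Three primary shards, each with one replica.
primaries = [ShardCopy(f"p{i}") for i in range(3)]
replicas = [[ShardCopy(f"r{i}")] for i in range(3)]
result = index_document("user-42", {"name": "Ada"}, primaries, replicas)
```

Because the shard is derived from the routing key modulo the number of primary shards, the same document always lands on the same shard, which is also why the primary shard count cannot simply be changed after data has been written.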
Role Deployment Models
Elasticsearch supports two deployment styles:
Mixed deployment (default): Data and Transport roles run on the same node. This simplifies setup, but the two roles compete for the same resources and the cluster becomes harder to scale.
Tiered deployment: Separate Transport nodes handle request routing and result merging, while dedicated Data nodes store and process data, offering better isolation, scalability, and hot‑upgrade capabilities.
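The split of responsibilities in a tiered deployment can be illustrated with a toy scatter/gather search. This is only a sketch under simplifying assumptions: scoring is a plain term count rather than a real relevance function, and `search_data_node` and `coordinate_search` are hypothetical names.

```python
import heapq


def search_data_node(shard_docs, term, size):
    """Data-node side: return the local top hits for one shard as (score, doc_id)."""
    hits = [(doc["text"].count(term), doc_id) for doc_id, doc in shard_docs.items()]
    hits = [hit for hit in hits if hit[0] > 0]
    return heapq.nlargest(size, hits)


def coordinate_search(shards, term, size):
    """Transport-node side: fan out to every shard, then merge the partial results."""
    partial = [search_data_node(shard, term, size) for shard in shards]
    merged = heapq.nlargest(size, (hit for hits in partial for hit in hits))
    return [doc_id for _score, doc_id in merged]


# Two shards, each holding a couple of documents.
shards = [
    {"a": {"text": "shard shard replica"}, "b": {"text": "node"}},
    {"c": {"text": "shard"}, "d": {"text": "shard shard shard"}},
]
top = coordinate_search(shards, "shard", size=2)
print(top)  # ['d', 'a']
```

The merge step needs only CPU and memory for the partial results, whereas the per-shard search touches the index data, which is why the two roles can be scaled and upgraded separately.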
Elasticsearch Data Layer Architecture
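Indices and metadata are stored on the local file system. Index files can be accessed through several store types (niofs, mmapfs, simplefs, smb); memory-mapped access (mmapfs) generally offers the best read performance, provided the operating system has enough free memory to cache the index files.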
Replication is used to avoid data loss and to increase query capacity; each index can configure a replica count, creating additional shard copies distributed across nodes.
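As a rough illustration, the replica count is part of the index settings, and the resulting shard copies must be spread so that two copies of the same shard never share a node. The settings names below are the standard `number_of_shards` and `number_of_replicas`; the round-robin `allocate` function is a hypothetical simplification and not the allocator Elasticsearch actually uses.

```python
# Settings body as it might be sent when creating an index.
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
        }
    }
}


def allocate(num_shards, num_replicas, nodes):
    """Round-robin placement that keeps copies of the same shard on different nodes."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least one node per shard copy")
    placement = {node: [] for node in nodes}
    for shard in range(num_shards):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            placement[node].append((shard, role))
    return placement


idx = index_settings["settings"]["index"]
placement = allocate(
    idx["number_of_shards"],
    idx["number_of_replicas"],
    ["node-1", "node-2", "node-3"],
)
for node, copies in placement.items():
    print(node, copies)
```

With 3 primary shards and 1 replica each, the cluster holds 6 shard copies in total, two per node in this example.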
Replica Benefits and Drawbacks
Ensures service availability by routing traffic to remaining replicas during failures.
Improves data reliability; without replicas, losing the node that holds a primary shard can mean losing that shard's data.
Boosts query throughput by adding more replicas.
However, replicas increase resource consumption, reduce write performance, and make dynamic scaling slower.
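The trade-off can be made concrete with a back-of-the-envelope model. It assumes every replica stores a full copy and re-executes every indexing operation, and it ignores compression, segment merging, and caching; `replica_cost` and the sample figures are hypothetical.

```python
def replica_cost(primary_data_gb, docs_per_second, num_replicas):
    """Estimate cluster-wide cost of keeping num_replicas copies of every shard."""
    copies = 1 + num_replicas
    return {
        "total_storage_gb": primary_data_gb * copies,
        "index_ops_per_second": docs_per_second * copies,
        "copies_serving_reads": copies,
    }


for num_replicas in (0, 1, 2):
    print(num_replicas, replica_cost(500, 10_000, num_replicas))
```

Storage and indexing work grow linearly with the number of copies, while the read capacity gained depends on how evenly queries spread across them.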
Distributed System Types
1. Local File‑System Based Distributed System
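Data resides on each node's local disks, and each shard has a primary copy and replica copies. When a node fails, a surviving replica must be promoted to primary and the lost copies must be rebuilt by copying shard data to another node, which can be time-consuming for large shards.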
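A small simulation shows what a node failure involves in this model. The placement map and the `fail_node` helper are hypothetical; the point is only that promotion is a cheap role change, while every copy that lived on the failed node has to be rebuilt elsewhere by copying data.

```python
def fail_node(placement, failed):
    """Drop a node, promote surviving replicas, and list shards that need a new copy."""
    survivors = {n: list(copies) for n, copies in placement.items() if n != failed}
    to_rebuild = []
    for shard, role in placement[failed]:
        if role == "primary":
            # Promote the first surviving replica of this shard to primary.
            for copies in survivors.values():
                if (shard, "replica") in copies:
                    copies[copies.index((shard, "replica"))] = (shard, "primary")
                    break
            else:
                raise RuntimeError(f"shard {shard} lost: no replica to promote")
        # Whether primary or replica, one copy of this shard is now missing.
        to_rebuild.append(shard)
    return survivors, to_rebuild


placement = {
    "node-1": [(0, "primary"), (2, "replica")],
    "node-2": [(0, "replica"), (1, "primary")],
    "node-3": [(1, "replica"), (2, "primary")],
}
survivors, to_rebuild = fail_node(placement, "node-1")
print(survivors)
print(to_rebuild)  # [0, 2]
```

Each shard in `to_rebuild` implies a full data copy over the network, so recovery time grows with shard size.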
2. Shared‑Storage Based Distributed System
Storage and compute are separated: nodes only run computation, while data lives in a distributed file system (e.g., HDFS). Compute and storage can then be scaled independently, and moving a shard to another node no longer requires copying its data; the cost is that storage access may be slower than reading from a local disk.
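For contrast, here is a minimal sketch of the shared-storage model, in which failover only changes which compute node owns a shard. `SharedStorage` and `ComputeCluster` are hypothetical classes; a Python dictionary stands in for the distributed file system.

```python
class SharedStorage:
    """Stand-in for a distributed file system: shard data lives here, not on nodes."""

    def __init__(self):
        self.shard_data = {}


class ComputeCluster:
    def __init__(self, storage, nodes):
        self.storage = storage
        self.nodes = list(nodes)
        self.owner = {}  # shard id -> compute node currently serving it

    def assign(self, shard, node):
        self.owner[shard] = node

    def fail_node(self, failed):
        """Reassign shards to surviving nodes; only ownership metadata changes."""
        self.nodes.remove(failed)
        if not self.nodes:
            raise RuntimeError("no compute nodes left")
        moved = [s for s, n in self.owner.items() if n == failed]
        for i, shard in enumerate(moved):
            self.owner[shard] = self.nodes[i % len(self.nodes)]
        return moved

    def read(self, shard, doc_id):
        # Every read goes to shared storage, regardless of which node serves it.
        return self.storage.shard_data[shard][doc_id]


storage = SharedStorage()
storage.shard_data = {0: {"a": "doc a"}, 1: {"b": "doc b"}}
cluster = ComputeCluster(storage, ["c1", "c2"])
cluster.assign(0, "c1")
cluster.assign(1, "c2")
moved = cluster.fail_node("c1")
print(moved, cluster.owner)  # [0] {0: 'c2', 1: 'c2'}
```

No shard data moves during the failover, but every `read` pays the cost of going through the shared storage layer, which is the performance trade-off noted above.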
Summary
Both architectures have distinct strengths and weaknesses; choosing the appropriate model depends on specific workload requirements and trade‑offs.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
