Elasticsearch Cluster Architecture: Nodes, Shards, and Deployment Options
This article explains the core concepts of Elasticsearch’s distributed architecture—including nodes, indices, shards, replicas—and compares mixed and tiered deployment models, while also discussing data storage strategies, replica benefits, and the trade‑offs of local‑file versus shared‑storage distributed systems.
Elasticsearch Cluster Architecture
Elasticsearch is a widely used open‑source search and analytics system. It is applied in search, JSON document storage, and time‑series data analysis. The article first introduces key concepts of an Elasticsearch cluster.
Node : a running Elasticsearch instance, usually a process on a machine.
Index : a logical collection that includes mapping and inverted index files; data may be spread across one or many machines.
Shard : a partition of an index to support large data volumes; each shard is managed by a node. Shards can be primary or replica.
Replica : a copy of a shard that ensures strong or eventual consistency.
Index Process
When creating an index, a document is routed to the primary shard, indexed there, then sent to replica shards. The operation succeeds only after all replicas have indexed the document.
If a primary or replica shard is lost, the system re‑elects a new primary and copies data from existing replicas to a new shard on another node, during which the cluster operates in a degraded state.
Replicas improve data reliability, enable failover, and increase query capacity by distributing read traffic.
Role Deployment Methods
Elasticsearch supports two deployment styles:
Mixed Deployment (left diagram)
Default mode where data and transport roles share the same node.
Requests are routed randomly to any node, which holds a global routing table to forward the request to appropriate data nodes.
Simple to start—one node can provide all functions—but request types can interfere with each other, and the number of connections grows quadratically with node count, limiting cluster size.
Hot updates are not supported.
Tiered Deployment (right diagram)
Roles are isolated via configuration: dedicated transport nodes handle request forwarding and result merging, while data nodes focus on data processing.
Advantages: independent roles avoid mutual interference, transport traffic is evenly distributed, and data‑node failures affect only data processing.
Scalability: many data nodes can be added without increasing transport‑node connections, and transport nodes can be grouped to connect to specific data‑node sets, solving the connection‑limit issue.
Supports hot updates: data nodes can be upgraded one by one, followed by transport nodes, allowing seamless upgrades.
Elasticsearch Data Layer Architecture
Data Storage
Indexes and metadata are stored on the local file system, with loading options such as niofs, mmap, simplefs, and smb. The mmap method, which locks indexes in memory, offers the best performance. Elasticsearch automatically selects a loading method but can be configured manually.
Because data resides locally, node or disk failures can cause data loss; replicas are used to mitigate this risk.
Replica
Each index can configure a replica count. For example, a replica count of 2 results in three shards: one primary and two replicas, which the master node distributes across different machines or racks.
Ensures service availability: if a replica becomes unavailable, traffic is redirected to remaining replicas.
Guarantees data reliability: without replicas, a primary node failure would cause data loss.
Boosts query capacity: adding replicas linearly increases concurrent query handling.
Issues
Replica introduces cost overhead when extra shards are unnecessary.
Write performance degrades because each write must be propagated to primary and then to all replicas.
Scaling replicas is slow; adding a replica requires full data copy from existing shards.
Distributed Systems
Type 1: Local File‑System Based Distributed System
Each shard (primary + replica) stores data locally. If a node fails, the system elects a new primary from replicas and copies the missing shard to a new node, which can be time‑consuming for large datasets.
Type 2: Distributed File‑System Based Distributed System (Shared Storage)
This architecture separates storage and computation. Shards contain only computation logic and reference data stored in a shared distributed file system (e.g., HDFS). When a node fails, a new shard can quickly reconnect to the shared storage, reducing recovery time.
Advantages include elastic resource scaling, finer‑grained management, and better hotspot handling. The main drawback is that accessing a distributed file system can be slower than local disk access, though modern user‑space protocols have narrowed this gap.
Summary
Both architectures have trade‑offs; choosing the right one depends on specific requirements such as reliability, performance, and operational complexity.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
