
Understanding Elasticsearch Architecture: Clusters, Shards, Discovery, and Scaling

This article provides a comprehensive overview of Elasticsearch 2.x, covering its distributed architecture, core concepts such as clusters, nodes, indices, shards and replicas, the ZenDiscovery master‑election process, scaling mechanisms, recovery, query features, and the underlying system components like Guice, Netty, and thread‑pool designs.

Architecture Digest

Aug 8, 2016

Elasticsearch is an open‑source system that combines search‑engine capabilities with NoSQL database features, built on Java and Lucene. The article analyses its architecture based on the latest stable 2.x release.

The name reflects its elastic, distributed nature: nodes form a cluster, and Lucene provides the underlying storage engine while Elasticsearch adds indexing, querying, and distributed interfaces.

Key concepts:

Cluster – a group of nodes sharing a common cluster name.

Node – an individual Elasticsearch instance.

Index – a logical database; a cluster can contain many indices.

Primary shard – a subset of an index that is stored on a node.

Replica shard – a copy of a primary shard; each primary can have zero or more replicas.

Type – analogous to a table; an index may contain multiple types.

Mapping – the schema for a type, automatically created when needed.

Document – a row in a type.

Field – a column in a document.

Allocation – the process of assigning shards to nodes.

Distributed system and ZenDiscovery: Elasticsearch implements its own service-discovery and master-election mechanism, ZenDiscovery, rather than depending on an external coordinator such as ZooKeeper. Nodes ping each other and exchange master candidates, and the master-eligible node whose ID sorts first in lexical order is elected. An election only succeeds when at least discovery.zen.minimum_master_nodes master-eligible nodes are visible, a quorum that prevents split-brain scenarios.
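The election rule described above can be sketched in a few lines. This is a simplified model, not the ZenDiscovery implementation: the `Node` type, the node names, and both functions are hypothetical, and the majority formula is the commonly recommended value for discovery.zen.minimum_master_nodes rather than something the cluster computes for you.

```python
from typing import NamedTuple, Optional, Sequence


class Node(NamedTuple):
    node_id: str           # hypothetical IDs; real node IDs are generated strings
    master_eligible: bool


def minimum_master_nodes(master_eligible_count: int) -> int:
    """Majority quorum: floor(n / 2) + 1."""
    return master_eligible_count // 2 + 1


def elect_master(visible: Sequence[Node], quorum: int) -> Optional[Node]:
    """Return the elected master, or None if too few candidates are visible."""
    candidates = sorted(
        (n for n in visible if n.master_eligible), key=lambda n: n.node_id
    )
    if len(candidates) < quorum:
        return None  # refuse to elect: this side may be a minority partition
    return candidates[0]  # the ID that sorts first lexically wins


nodes = [
    Node("node-c", True),
    Node("node-a", False),  # not master-eligible, so it is ignored
    Node("node-b", True),
    Node("node-d", True),
]
quorum = minimum_master_nodes(3)  # three master-eligible nodes
full_view = elect_master(nodes, quorum)
partitioned_view = elect_master(nodes[:2], quorum)  # sees one candidate only
```

With the full view the lowest eligible ID is chosen; the partitioned side sees fewer candidates than the quorum and elects nobody, which is how the quorum prevents two masters from coexisting.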

Elastic scaling: Nodes can join or leave the cluster easily through the discovery mechanism and the allocation API. A joining node only needs the address of one reachable cluster member in discovery.zen.ping.unicast.hosts. When a node exits, its shards are reallocated to the remaining nodes automatically.
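To make reallocation concrete, here is a toy allocator. It is an assumption-laden sketch, not Elasticsearch's balancer: it recomputes the whole placement from scratch, whereas the real allocator moves only what it must. The one rule it does honour is that two copies of the same shard never share a node. The `allocate` function and node names are hypothetical.

```python
from typing import Dict, List, Tuple

ShardCopy = Tuple[int, str]  # (shard number, "p" for primary or "r" for replica)


def allocate(copies: List[ShardCopy], nodes: List[str]) -> Dict[str, List[ShardCopy]]:
    """Place each copy on the least-loaded node that holds no copy of that shard."""
    placement: Dict[str, List[ShardCopy]] = {n: [] for n in nodes}
    for shard, role in copies:
        eligible = [n for n in nodes if all(s != shard for s, _ in placement[n])]
        if not eligible:
            continue  # stays unassigned, as a replica would on a one-node cluster
        target = min(eligible, key=lambda n: len(placement[n]))
        placement[target].append((shard, role))
    return placement


# Three shards, each with one primary and one replica.
copies = [(s, role) for s in range(3) for role in ("p", "r")]
three_nodes = allocate(copies, ["node-1", "node-2", "node-3"])
after_exit = allocate(copies, ["node-1", "node-2"])  # node-3 has left
```

With three nodes each node ends up with two copies; after one node exits, the same six copies are spread over the two survivors.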

Shards and replicas: An index is divided into a fixed number of primary shards (5 by default); this number cannot be changed after the index is created, whereas the number of replicas, which provide redundancy, can be. A document is routed to a shard by hashing its ID (MurmurHash3 by default) modulo the number of primary shards; the routing parameter can supply a different value to hash. A primary and its replicas (copies of the same shard) are never placed on the same node.
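The routing rule also explains why the primary shard count is fixed. The sketch below uses `zlib.crc32` purely as a stand-in hash, since MurmurHash3 is not in the Python standard library, so the shard numbers it produces will not match a real cluster. The `shard_for` helper and the document IDs are hypothetical.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (the document ID by default) to a primary shard."""
    # Stand-in hash: zlib.crc32, NOT the MurmurHash3 that Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
with_5 = {d: shard_for(d, 5) for d in doc_ids}
with_6 = {d: shard_for(d, 6) for d in doc_ids}

# Changing the shard count changes the modulus, so most documents would be
# looked up on a shard that does not hold them.
moved = sum(1 for d in doc_ids if with_5[d] != with_6[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because the same formula is used for both indexing and retrieval, the modulus has to stay constant for the lifetime of the index.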

Recovery and fault tolerance: If a node fails, the master promotes replica shards to replace the primaries that were lost, and cluster health turns yellow because some replicas are now unassigned. After a configurable delay (index.unassigned.node_left.delayed_timeout), the missing replicas are allocated to other nodes and brought up to date with the help of the translog. If a lost primary has no replica, cluster health turns red, indicating possible data loss. The static settings gateway.recover_after_nodes, gateway.expected_nodes, and gateway.recover_after_time control when recovery starts after a full cluster restart.
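The green, yellow, and red states follow directly from which shard copies are still assigned. The model below is a minimal sketch under that assumption; `ShardGroup`, `fail_node`, and `health` are hypothetical names, and the delayed reallocation of replicas is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ShardGroup:
    primary: Optional[str]                             # node holding the primary, None if lost
    replicas: List[str] = field(default_factory=list)  # nodes holding assigned replicas
    wanted_replicas: int = 1                           # configured replica count


def fail_node(groups: Dict[int, ShardGroup], node: str) -> None:
    """Remove a node; promote a surviving replica wherever the primary was lost."""
    for group in groups.values():
        group.replicas = [n for n in group.replicas if n != node]
        if group.primary == node:
            group.primary = group.replicas.pop(0) if group.replicas else None


def health(groups: Dict[int, ShardGroup]) -> str:
    if any(g.primary is None for g in groups.values()):
        return "red"     # some primary is unassigned
    if any(len(g.replicas) < g.wanted_replicas for g in groups.values()):
        return "yellow"  # all primaries assigned, some replicas missing
    return "green"


groups = {
    0: ShardGroup(primary="node-1", replicas=["node-2"]),
    1: ShardGroup(primary="node-2", replicas=["node-3"]),
}
before = health(groups)
fail_node(groups, "node-1")   # shard 0 loses its primary; its replica is promoted
after_one = health(groups)
```

Losing a second node before the replicas have been rebuilt leaves shard 0 with no copy at all, which is the red case.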

Search engine extensions: Elasticsearch supports Groovy scripts for custom scoring, an automatically generated _all field for cross‑field search, and a suggester for autocomplete and spell‑checking.

NoSQL database features:

Raw data is stored in the index and can be retrieved directly.

The translog provides durability and real-time reads by ID: a GET returns the latest version of a document even if the index buffer has not yet been refreshed into a searchable segment.

Dynamic mapping creates schemas on‑the‑fly, offering a schema‑free experience while still allowing optimisations.

Index templates enable automatic configuration of new indices (e.g., daily log indices).

Rich Query DSL and aggregations provide SQL‑like querying and powerful analytics.
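Two of the features above lend themselves to a short illustration. The first part sketches how index templates could be matched against a new index name and merged by their order value; the second shows the shape of a Query DSL body with a terms aggregation. The template contents, field names, and the `settings_for` helper are hypothetical, and the matching uses Python's `fnmatch` as an approximation of Elasticsearch's wildcard matching.

```python
import json
from fnmatch import fnmatchcase
from typing import Any, Dict, List

# Hypothetical templates in the Elasticsearch 2.x shape, where the "template"
# key holds the index-name pattern.
TEMPLATES: List[Dict[str, Any]] = [
    {"template": "*", "order": 0,
     "settings": {"number_of_shards": 5, "number_of_replicas": 1}},
    {"template": "logs-*", "order": 1,
     "settings": {"number_of_shards": 2}},
]


def settings_for(index_name: str) -> Dict[str, Any]:
    """Merge settings of all matching templates; a higher order overrides a lower one."""
    merged: Dict[str, Any] = {}
    matching = [t for t in TEMPLATES if fnmatchcase(index_name, t["template"])]
    for template in sorted(matching, key=lambda t: t["order"]):
        merged.update(template["settings"])
    return merged


# Roughly "SELECT level, COUNT(*) ... WHERE message matches 'timeout' GROUP BY level".
query = {
    "query": {"match": {"message": "timeout"}},
    "aggs": {"per_level": {"terms": {"field": "level"}}},
    "size": 0,  # return only the aggregation, no hits
}

daily_log_settings = settings_for("logs-2016.08.08")
request_body = json.dumps(query)
```

A daily log index therefore picks up its settings automatically on creation, without anyone creating it by hand.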

System architecture: Dependency injection is handled by Guice, networking by Netty, and both HTTP REST and internal RPC protocols are exposed. The ClusterState object stores the entire cluster metadata and is diff‑updated across nodes.
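The idea of diff-updating the cluster state can be sketched with plain dictionaries. This is an illustrative model only: the real ClusterState is a structured Java object, and the key names, `make_diff`, and `apply_diff` here are hypothetical. The sketch assumes that a node holding a different version than the diff expects has to fall back to the full state.

```python
from typing import Any, Dict

State = Dict[str, Any]


def make_diff(old: State, new: State) -> Dict[str, Any]:
    """Record only what changed between two versions of a flat state dictionary."""
    return {
        "from_version": old["version"],
        "to_version": new["version"],
        "upserts": {
            k: v for k, v in new.items()
            if k != "version" and (k not in old or old[k] != v)
        },
        "deletes": [k for k in old if k != "version" and k not in new],
    }


def apply_diff(state: State, diff: Dict[str, Any]) -> State:
    """Apply a diff to the version it was computed against."""
    if state["version"] != diff["from_version"]:
        raise ValueError("version mismatch: full cluster state required")
    updated = {k: v for k, v in state.items() if k not in diff["deletes"]}
    updated.update(diff["upserts"])
    updated["version"] = diff["to_version"]
    return updated


v1 = {"version": 1, "nodes": ["node-1", "node-2"], "index:logs": {"shards": 5}}
v2 = {"version": 2, "nodes": ["node-1", "node-2", "node-3"], "index:logs": {"shards": 5}}
diff = make_diff(v1, v2)       # only the "nodes" entry is carried
rebuilt = apply_diff(v1, diff)
```

Sending only the changed entries keeps publication cheap when the state is large and a single node joins or a single index changes.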

REST and RPC flow: An HTTP request arrives on port 9200 and is routed to a RestAction, which converts it into an RPC TransportRequest and forwards it through NodeClient. This separation allows HTTP-facing nodes and data nodes to be scaled independently.

Network layer: Elasticsearch abstracts communication through a Transport layer with multiple channel types (recovery, bulk, regular, state, ping) to prioritize traffic and avoid congestion.

Thread‑pool design: Four pool types are provided – CACHED (unbounded, no queue), DIRECT (runs in caller thread), FIXED (bounded with queue), and SCALING (grows to a max before queuing). Each kind of operation is assigned a pool according to its priority, which keeps the total number of threads under control.
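The FIXED behaviour, a bounded number of workers plus a bounded queue, can be approximated in Python as follows. This is a sketch of the concept rather than a port of Elasticsearch's executor classes: `FixedPool` and `RejectedExecution` are hypothetical names, and the bound is enforced with a semaphore around the standard `ThreadPoolExecutor`, whose own queue is unbounded.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class RejectedExecution(Exception):
    """Raised when both the workers and the queue are full."""


class FixedPool:
    """FIXED-style pool: `size` worker threads plus at most `queue_size` waiting tasks."""

    def __init__(self, size: int, queue_size: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=size)
        self._slots = threading.BoundedSemaphore(size + queue_size)

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        if not self._slots.acquire(blocking=False):
            raise RejectedExecution("pool and queue are full")
        try:
            future = self._executor.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot once the task finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


gate = threading.Event()
pool = FixedPool(size=1, queue_size=1)
running = pool.submit(gate.wait)   # occupies the single worker
queued = pool.submit(gate.wait)    # occupies the single queue slot
try:
    pool.submit(gate.wait)         # no worker and no queue slot left
    rejected = False
except RejectedExecution:
    rejected = True
gate.set()                         # let the blocked tasks finish
results = (running.result(), queued.result())
pool.shutdown()
```

Rejecting work when the queue is full pushes back on callers instead of letting threads or queued tasks grow without limit.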

Brainstorm ideas (not fully analysed): shard splitting, built‑in MapReduce, multimedia (voice/image) search plugins, and replacing the current pools with ForkJoinPool and CompletableFuture.

Open‑source product insights: Elasticsearch emerged despite early competition from Solr, gaining traction through a simple distributed model, a vibrant ecosystem, and a shift from site search to log and monitoring (ELK) use cases. It now competes with Solr in search and with MongoDB in NoSQL document storage, and leads in time‑series log analytics.
