Mastering Elasticsearch: Core Concepts, Architecture, and Performance Tips
This comprehensive guide explains what Elasticsearch does, its underlying Lucene technology, core concepts such as clusters, shards, replicas, mapping, indexing and storage mechanisms, and provides practical performance‑tuning advice for building and operating a robust distributed search engine.
Data in Everyday Life
Data can be divided into structured (row‑based, stored in relational databases) and unstructured (full‑text, documents, images, video, etc.). Correspondingly, searches are either structured‑data search or unstructured‑data (full‑text) search.
From Lucene to Elasticsearch
Lucene is an open-source library that provides inverted-index based full-text search. Elasticsearch builds on Lucene, adding a RESTful API, distributed capabilities, and easy installation. Solr is another Lucene-based engine; its distributed mode (SolrCloud) depends on an external ZooKeeper ensemble, whereas Elasticsearch has clustering built in.
Inverted Index Basics
An inverted index lists each unique term and the documents in which it appears. Example:
Term         Doc_1  Doc_2  Doc_3
--------------------------------
Java         X      -      -
is           X      X      X
the          X      X      X
best         X      X      X
programming  X      X      X
language     X      X      X
PHP          -      X      -
Javascript   -      -      X
Key terms: Term, Term Dictionary, Posting List, Inverted File.
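A minimal sketch of how such an index could be built. The three document texts are assumptions chosen to match the terms in the table (they are not given in the original), and the whitespace tokenizer stands in for a real analyzer.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # The sorted keys act as the term dictionary;
    # each value is that term's posting list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Hypothetical documents, assumed for illustration only.
docs = {
    1: "Java is the best programming language",
    2: "PHP is the best programming language",
    3: "Javascript is the best programming language",
}

index = build_inverted_index(docs)
print(index["java"])  # [1]
print(index["best"])  # [1, 2, 3]
```

A query for a term then becomes a dictionary lookup followed by a scan of a short posting list, instead of a scan over every document.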
Elasticsearch Core Concepts
A distributed, near‑real‑time document store where every field can be indexed and searched.
Scalable to hundreds of nodes and petabytes of data.
Cluster
A cluster consists of one or more nodes sharing the same cluster.name. Nodes can be master-eligible, data, or coordinating. In versions before 7.0, Zen Discovery handles node discovery and master election.
Discovery Mechanism
Zen Discovery uses unicast or file‑based discovery. The discovery.zen.ping.unicast.hosts setting lists seed hosts.
Node Roles
Nodes can be master-eligible (node.master: true) and/or data nodes (node.data: true). Separating these roles improves stability.
Split‑Brain
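A network partition can leave two groups of nodes that each elect their own master. Requiring a quorum (a strict majority of master-eligible nodes, configured via discovery.zen.minimum_master_nodes) prevents the minority side from electing a master.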
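As a rough illustration, the quorum is usually computed as half the master-eligible nodes, rounded down, plus one. The sketch below derives that value and renders the two Zen settings named in this article as elasticsearch.yml lines. The helper names and the host addresses are hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def zen_settings(seed_hosts, master_eligible):
    """Render elasticsearch.yml lines for the two Zen settings (pre-7.0 only)."""
    hosts = ", ".join(f'"{h}"' for h in seed_hosts)
    quorum = minimum_master_nodes(master_eligible)
    return [
        f"discovery.zen.ping.unicast.hosts: [{hosts}]",
        f"discovery.zen.minimum_master_nodes: {quorum}",
    ]


# Hypothetical seed hosts for a three-node cluster.
for line in zen_settings(["10.0.0.1", "10.0.0.2", "10.0.0.3"], master_eligible=3):
    print(line)
```

With two master-eligible nodes the quorum is also two, so losing either node blocks master election; this is why three master-eligible nodes is the usual minimum.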
Shards and Replicas
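An index is split horizontally into primary shards, and each primary can have one or more replica shards for high availability and read throughput. The number of primary shards is fixed at index creation; the number of replicas can be changed at any time.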
Mapping
Mapping defines field types (e.g., text, keyword, date) and analysis. You can use dynamic mapping or explicit mapping when creating an index.
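A sketch of what an explicit mapping might look like when sent as the body of an index-creation request (PUT /<index>). The field names are hypothetical. The body uses the typeless form accepted by Elasticsearch 7.x and later; in 6.x the properties block sits under a mapping type name instead.

```python
import json


def explicit_mapping(shards: int, replicas: int) -> dict:
    """Build an index-creation body with settings and an explicit mapping."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                # Hypothetical fields, one per type named in the text.
                "title": {"type": "text"},         # analyzed for full-text search
                "status": {"type": "keyword"},     # exact-match value, not analyzed
                "published_at": {"type": "date"},
            }
        },
    }


body = explicit_mapping(shards=3, replicas=1)
print(json.dumps(body, indent=2))
```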
Basic Usage
Download, unzip, and start Elasticsearch with bin/elasticsearch. The REST API listens on port 9200 by default; a GET request to the root path returns basic node information:
{
"name": "node1",
"cluster_name": "elasticsearch",
"version": { "number": "6.8.1" },
"tagline": "You Know, for Search"
}
Check cluster health via GET /_cluster/health, which returns green, yellow, or red.
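The three statuses can be interpreted with a small helper. This is a sketch: describe_health and fetch_health are hypothetical names, the base URL assumes a local node on the default port, and fetch_health is defined but not called because it needs a reachable cluster.

```python
import json
from urllib.request import urlopen

HEALTH_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, at least one replica is not",
    "red": "at least one primary shard is not allocated",
}


def describe_health(payload: dict) -> str:
    """Turn a /_cluster/health response into a one-line summary."""
    status = payload["status"]
    return f"{status}: {HEALTH_MEANING[status]}"


def fetch_health(base_url: str = "http://localhost:9200", timeout: float = 5.0) -> dict:
    """GET /_cluster/health from a running node (assumed reachable)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


# Sample payload, trimmed to the fields used here.
print(describe_health({"cluster_name": "elasticsearch", "status": "yellow"}))
```

A single-node cluster with the default of one replica typically reports yellow, because a replica is never placed on the same node as its primary.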
Write Path
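Documents are routed to a primary shard using shard = hash(routing) % number_of_primary_shards, where routing defaults to the document _id. The coordinating node forwards the request to the node holding that primary; the primary indexes the document, forwards it to its replicas in parallel, and acknowledges the write once they respond.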
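The routing formula is easy to try out. In this sketch zlib.crc32 is only a deterministic stand-in for the hash (Elasticsearch uses its own Murmur3-based function), and the document ids are made up. It also hints at why the primary shard count cannot change after creation: a different modulus sends existing documents to different shards.

```python
import zlib


def route(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards."""
    # Stand-in hash, NOT the one Elasticsearch uses; it only needs to be stable.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Hypothetical document ids used as routing values.
ids = [f"doc-{i}" for i in range(10)]

with_5 = {doc_id: route(doc_id, 5) for doc_id in ids}
with_6 = {doc_id: route(doc_id, 6) for doc_id in ids}

# Documents whose target shard changes when the shard count changes.
moved = [doc_id for doc_id in ids if with_5[doc_id] != with_6[doc_id]]
print(f"{len(moved)} of {len(ids)} documents would map to a different shard")
```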
Storage Mechanics
Data is stored in immutable segments on disk. A new document is first added to an in-memory buffer and appended to the translog; a refresh (by default every second) turns the buffer into a new searchable segment, and a flush later creates a commit point.
Refresh and Flush
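Refresh writes the in-memory buffer to a new segment in the file-system cache, which makes it searchable without an fsync. Flush fsyncs the segments to disk, writes a commit point, and clears the translog; it is triggered when the translog reaches 512 MB or after 30 minutes.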
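The relationship between indexing, refresh, and flush can be modelled with a toy class. This is a deliberately simplified sketch under the assumption that a shard keeps an in-memory buffer, a translog, and a list of segments; ToyShard is a hypothetical name and none of this reflects Lucene's actual data structures.

```python
class ToyShard:
    """Toy model of one shard's write path."""

    def __init__(self):
        self.buffer = []      # indexed but not yet searchable
        self.translog = []    # record of operations since the last commit
        self.segments = []    # immutable, searchable segments
        self.committed = 0    # segments covered by the last commit point

    def index(self, doc):
        # Every write goes to both the buffer and the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new immutable segment; the translog is kept.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Refresh, record a commit point, then drop the translog.
        self.refresh()
        self.committed = len(self.segments)
        self.translog.clear()

    def search(self, doc):
        return any(doc in segment for segment in self.segments)


shard = ToyShard()
shard.index("a")
print(shard.search("a"))                     # False: still only in the buffer
shard.refresh()
print(shard.search("a"))                     # True: refresh made it searchable
print(len(shard.translog))                   # 1: not committed yet
shard.flush()
print(len(shard.translog), shard.committed)  # 0 1
```

The gap between index() and refresh() is what makes search "near real-time" rather than real-time.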
Segment Merging
Background merges combine small segments into larger ones, reclaiming space from deleted documents.
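A simplified picture of a merge, assuming each segment is a dict of document id to source and deletions are tracked as a set of ids. Lucene tracks deletions per segment and picks merge candidates with a merge policy, so this sketch only illustrates why merging reclaims space.

```python
def merge_segments(segments, deleted_ids):
    """Combine segments into one, skipping documents marked as deleted."""
    merged = {}
    for segment in segments:  # oldest segment first
        for doc_id, source in segment.items():
            if doc_id not in deleted_ids:
                merged[doc_id] = source
    return merged


# Three small hypothetical segments; documents 2 and 5 were deleted earlier.
small = [{1: "a", 2: "b"}, {3: "c"}, {4: "d", 5: "e"}]
merged = merge_segments(small, deleted_ids={2, 5})
print(merged)  # {1: 'a', 3: 'c', 4: 'd'}
```

Because segments are immutable, a delete only marks a document; the space is actually freed when a merge writes a new segment without it.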
Performance Optimizations
Use SSDs and avoid remote mounts.
Configure multiple path.data directories to spread shards across disks; all files of a given shard stay on one path.
Rely on Lucene's FST (finite state transducer) compression of the in-memory term index to keep term lookups memory-efficient.
During bulk indexing, raise index.refresh_interval (or set it to -1 to disable refresh) and set number_of_replicas to 0, then restore both once the load finishes.
Prefer keyword over text when analysis isn’t needed.
Use routing values to target specific shards.
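The bulk-indexing tip in the list above could be applied by sending two index-settings updates, one before and one after the load. The sketch only builds the request bodies. The index name my-index and the "normal" values of 1s and 1 replica are assumptions, so substitute whatever the index used before.

```python
import json


def bulk_load_settings(normal_refresh: str = "1s", normal_replicas: int = 1):
    """Return (before, after) bodies for PUT /<index>/_settings."""
    # Before the load: no periodic refresh, no replicas to copy to.
    before = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
    # After the load: restore the values the index normally runs with.
    after = {
        "index": {
            "refresh_interval": normal_refresh,
            "number_of_replicas": normal_replicas,
        }
    }
    return before, after


before, after = bulk_load_settings()
# "my-index" is a hypothetical index name.
print("PUT /my-index/_settings", json.dumps(before))
print("PUT /my-index/_settings", json.dumps(after))
```

Running without replicas means a node failure during the load can lose data, so this suits loads that can be replayed from the source.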
JVM Tuning
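Set -Xms and -Xmx to the same value, no more than 50 % of RAM and below 32 GB so the JVM can keep using compressed object pointers. Leave the remaining memory to the operating system's file-system cache, which Lucene relies on for fast segment access. Consider G1 GC.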
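The heap rule can be written as a small calculation. The 31 GiB cap is an assumption, chosen as a conservative value under the 32 GB limit; the exact compressed-pointer threshold depends on the JVM. The function names are hypothetical.

```python
GIB = 1024 ** 3


def suggested_heap_bytes(ram_bytes: int, cap_bytes: int = 31 * GIB) -> int:
    """Half of RAM, capped below the 32 GB limit (cap value is an assumption)."""
    return min(ram_bytes // 2, cap_bytes)


def jvm_options(ram_bytes: int):
    """Render matching -Xms/-Xmx flags in whole gigabytes (at least 1g)."""
    heap_gb = max(1, suggested_heap_bytes(ram_bytes) // GIB)
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


print(jvm_options(16 * GIB))   # ['-Xms8g', '-Xmx8g']
print(jvm_options(128 * GIB))  # ['-Xms31g', '-Xmx31g']
```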
21CTO
