Master Elasticsearch: From Basics to Advanced Performance Tuning
This article walks through Elasticsearch’s licensing history, version selection, installation, cluster health monitoring, shard routing, storage mechanisms, refresh and translog processes, segment merging, and practical performance optimizations such as disk choices, index settings, and JVM tuning.
Elasticsearch Overview
On January 15, 2021, Elastic CEO Shay Banon announced that the license for Elasticsearch and Kibana would change from Apache 2.0 to dual licensing under SSPL and the Elastic License. About three years later, in 2024, Elastic announced a return to open source by adding AGPL as an additional license option alongside ELv2 and SSPL.
1. Basic Usage
When choosing a version, the commonly used stable major releases are 2.x, 5.x, 6.x, and 7.x (current). Versions 3.x and 4.x were skipped to keep the ELK stack (Elasticsearch, Logstash, Kibana) versioned consistently.
Elasticsearch is built with Java, so the JDK version must also match the Elasticsearch version; for example, 7.2 supports JDK 11.
Installation
Download and unzip Elasticsearch, then start it with bin/elasticsearch. It runs on port 9200 by default; accessing http://localhost:9200 returns a JSON object with node, cluster, and version information.
<code>{
"name" : "U7fp3O9",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "-Rj8jGQvRIelGd9ckicUOA",
"version" : {
"number" : "6.8.1",
"build_flavor" : "default",
"build_type" : "zip",
"build_hash" : "1fad4e1",
"build_date" : "2019-06-18T13:16:52.517138Z",
"build_snapshot" : false,
"lucene_version" : "7.7.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}</code>
Cluster Health
Cluster health can be checked via Kibana or APIs such as GET /_cluster/health, which returns a JSON status (green, yellow, red) and node statistics.
<code>{
"cluster_name" : "lzj",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 9,
"active_shards" : 9,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 64.28571428571429
}</code>
Health colors:
Green: all primary and replica shards are active.
Yellow: all primary shards are active but at least one replica is not.
Red: some primary shards are missing, leading to potential data loss.
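As a rough illustration of how the numbers in the health response relate to each other, the sketch below recomputes the active-shard percentage from the sample output. It assumes the percentage is active shards divided by active plus initializing plus unassigned shards, which matches the sample (9 of 14); the `summarize` helper and `STATUS_MEANING` table are hypothetical names, not part of any Elasticsearch client.

```python
import json

# Trimmed copy of the sample /_cluster/health response shown above.
SAMPLE_HEALTH = """
{
  "status": "yellow",
  "active_primary_shards": 9,
  "active_shards": 9,
  "initializing_shards": 0,
  "unassigned_shards": 5
}
"""

STATUS_MEANING = {
    "green": "all primary and replica shards are active",
    "yellow": "all primaries are active, at least one replica is not",
    "red": "at least one primary shard is missing",
}


def summarize(health: dict) -> dict:
    """Summarize a health response (hypothetical helper for illustration)."""
    active = health["active_shards"]
    # Assumption: total = active + initializing + unassigned shards.
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    percent = 100.0 * active / total if total else 100.0
    return {
        "status": health["status"],
        "meaning": STATUS_MEANING[health["status"]],
        "active_percent": percent,
    }


summary = summarize(json.loads(SAMPLE_HEALTH))
print(summary["status"], "-", summary["meaning"])
print(round(summary["active_percent"], 2))  # 64.29
```

On a single-node cluster like the one in the sample, replicas cannot be allocated on the same node as their primary, which is why the status is yellow rather than green.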
2. Elasticsearch Mechanisms
When creating an index you must decide the number of primary shards. This number cannot be changed in place afterward, because shard routing depends on it; changing it means creating a new index (for example by reindexing).
Shard Routing
Routing determines the target primary shard using the formula:
<code>shard = hash(routing) % number_of_primary_shards</code>
The default routing value is the document _id, but it can be customized. The coordinating node calculates the target shard and forwards the request to the appropriate primary shard, which then replicates to its replicas.
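The routing formula can be sketched in a few lines of Python. This is only an illustration: `zlib.crc32` stands in for Elasticsearch's internal hash function, and `pick_shard` is a hypothetical helper. The second half shows why the primary shard count is fixed: changing the divisor sends most documents to a different shard.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Apply shard = hash(routing) % number_of_primary_shards.

    crc32 is a stand-in hash chosen for this sketch, not the hash
    Elasticsearch itself uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# By default the routing value is the document _id.
doc_ids = [f"doc-{i}" for i in range(1000)]

# Same documents, routed with 5 and then 6 primary shards.
before = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
after = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}

moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because the same routing value always yields the same shard for a fixed shard count, a lookup by _id only needs to contact one shard.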
Storage Model
Index data is stored in immutable segments on disk. Each segment is a self‑contained inverted index. Incoming operations are recorded in the translog and held in an in‑memory buffer, then written out as new segments.
The data and log directories are configured in elasticsearch.yml:
<code>path.data: /path/to/data   # index data
path.logs: /path/to/logs   # log files</code>
Segments are never modified in place; deletions are recorded in a .del file, and updates are performed as a delete followed by an add.
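The delete-plus-add behavior can be illustrated with a toy model. This is a sketch of the idea only, not how Lucene lays out data: `Segment` and `ToyShard` are hypothetical classes, each added document gets its own segment for simplicity, and the `deleted` set plays the role of the deletion file.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """An immutable batch of (doc_id, source) pairs."""
    docs: tuple


@dataclass
class ToyShard:
    segments: list = field(default_factory=list)
    # Tombstones: (segment index, doc_id) pairs marked as deleted.
    deleted: set = field(default_factory=set)

    def add(self, doc_id, source):
        # New data always goes into a new segment.
        self.segments.append(Segment(docs=((doc_id, source),)))

    def delete(self, doc_id):
        # Existing segments are untouched; only a marker is recorded.
        for index, segment in enumerate(self.segments):
            for existing_id, _ in segment.docs:
                if existing_id == doc_id:
                    self.deleted.add((index, existing_id))

    def update(self, doc_id, source):
        # An update is a delete of the old copy plus an add of the new one.
        self.delete(doc_id)
        self.add(doc_id, source)

    def get(self, doc_id):
        for index in reversed(range(len(self.segments))):
            for existing_id, source in self.segments[index].docs:
                if existing_id == doc_id and (index, existing_id) not in self.deleted:
                    return source
        return None


shard = ToyShard()
shard.add("1", {"team": "Lakers"})
shard.update("1", {"team": "Celtics"})
print(shard.get("1"))        # the new version is returned
print(len(shard.segments))   # the old copy still occupies a segment
```

The old copy stays on disk until a merge rewrites the segment without it, which is why heavy update workloads increase merge activity.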
Refresh and Translog
Elasticsearch refreshes each shard roughly every second, making newly indexed documents searchable within a second. The refresh creates a new segment in the file‑system cache without writing to disk immediately.
Manual refresh can be triggered:
<code># refresh all indices
POST /_refresh
# refresh a specific index
POST /nba/_refresh</code>
Tip: Frequent manual refreshes can impact performance; use them sparingly in production.
To avoid data loss, Elasticsearch writes every operation to a transaction log (translog) before it is persisted to a segment. When the translog reaches 512 MB or 30 minutes have passed, a flush occurs: the in‑memory data is written to a new segment, segments are synced to disk, and the translog is cleared.
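The interplay between indexing, refresh, and flush can be sketched with a small in-memory model. It is a simplification under stated assumptions: `ToyShardWriter` is a hypothetical class, only the size trigger for flush is modeled (not the time trigger), and "committed" merely counts segments rather than performing a real fsync.

```python
class ToyShardWriter:
    def __init__(self, flush_threshold_bytes=512 * 1024 * 1024):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.buffer = []        # indexed but not yet searchable
        self.searchable = []    # segments visible to search
        self.committed = 0      # how many segments count as synced to disk
        self.translog = []      # operations since the last flush
        self.translog_bytes = 0

    def index(self, doc: str):
        # Every operation is logged before it becomes part of a segment.
        self.translog.append(doc)
        self.translog_bytes += len(doc.encode("utf-8"))
        self.buffer.append(doc)
        if self.translog_bytes >= self.flush_threshold_bytes:
            self.flush()

    def refresh(self):
        # Turn the buffer into a new searchable segment; the translog is kept.
        if self.buffer:
            self.searchable.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        # Make everything searchable, mark it committed, clear the translog.
        self.refresh()
        self.committed = len(self.searchable)
        self.translog = []
        self.translog_bytes = 0

    def search(self):
        return [doc for segment in self.searchable for doc in segment]


writer = ToyShardWriter(flush_threshold_bytes=10)  # tiny threshold for the demo
writer.index("abc")
print(writer.search())    # [] because nothing has been refreshed yet
writer.refresh()
print(writer.search())    # ['abc'], searchable but not yet committed
writer.index("defghij")   # translog reaches 10 bytes, so a flush runs
print(writer.committed, writer.translog)
```

Until a flush happens, the translog is the only durable record of recent operations, so it is what would be replayed after a crash.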
Segment Merging
Because each refresh creates a new segment, the number of segments can grow quickly. Background merge processes combine small segments into larger ones, discarding deleted documents and reducing file‑handle and CPU overhead.
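A merge can be pictured as copying the live documents of several small segments into one larger segment and leaving deleted documents behind. The following is a minimal sketch of that idea; `merge_segments` and its list-based segment representation are assumptions made for illustration, not Lucene's merge policy.

```python
def merge_segments(segments, deleted_ids):
    """Combine segments into one, skipping documents marked as deleted.

    segments:    list of segments, each a list of (doc_id, source) pairs
    deleted_ids: one set per segment holding the doc ids deleted from it
    """
    merged = []
    for docs, deleted in zip(segments, deleted_ids):
        for doc_id, source in docs:
            if doc_id not in deleted:
                merged.append((doc_id, source))
    return merged


# Three small segments; doc "1" was updated, so its old copy in the
# first segment is marked deleted and the new copy lives in the second.
small_segments = [
    [("1", "v1"), ("2", "v1")],
    [("1", "v2")],
    [("3", "v1")],
]
deleted_per_segment = [{"1"}, set(), set()]

big_segment = merge_segments(small_segments, deleted_per_segment)
print(big_segment)
print(len(small_segments), "segments ->", 1)
```

After the merge, one segment replaces three and the stale copy of document "1" no longer takes up space.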
3. Performance Optimization
Storage Devices
Use SSDs for lower latency.
Prefer RAID10/RAID5 for better I/O and reliability.
Avoid remote mounts like NFS or SMB.
On cloud instances, be cautious with EBS performance.
Index Settings
Use sequential, compressible IDs instead of random UUIDs.
Disable doc values on fields that don’t need sorting or aggregations.
Prefer keyword over text for exact‑match fields.
Increase index.refresh_interval (e.g., to 30s) if near‑real‑time visibility isn’t required.
During bulk loads, set index.refresh_interval=-1 and index.number_of_replicas=0, then restore them after the load.
Use scroll for deep pagination instead of large from+size queries.
Specify routing values to target specific shards when possible.
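For the bulk-load tip above, the sketch below builds the request bodies that could be sent to the index settings endpoint before and after a load. It only prints the requests and performs no network calls. The index name `nba` is borrowed from the refresh example, and the restore values (`1s`, one replica) are assumptions; use whatever the index had before the load.

```python
import json


def bulk_load_settings() -> dict:
    """Settings to apply before a large bulk load."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval: str = "1s", number_of_replicas: int = 1) -> dict:
    """Settings to put back afterwards (defaults here are assumed values)."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": number_of_replicas,
        }
    }


def as_request(index: str, body: dict) -> str:
    """Render a console-style request for the update index settings API."""
    return f"PUT /{index}/_settings\n{json.dumps(body, indent=2)}"


print(as_request("nba", bulk_load_settings()))
print(as_request("nba", restore_settings()))
```

Disabling refresh and replicas avoids creating many tiny segments and indexing every document twice during the load; the replicas are then built once by copying the finished shards.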
JVM Tuning
Set the heap min (-Xms) and max (-Xmx) to the same value, not exceeding 50% of physical RAM and 32 GB.
Consider using the G1 garbage collector instead of CMS.
Allocate sufficient memory for the filesystem cache to speed up searches.
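The heap sizing rule can be turned into a small calculation. This is a sketch under assumptions: the 31 GB cap is a conservative value picked here to stay under the 32 GB limit mentioned above, and `recommended_heap_gb` and `jvm_options` are hypothetical helper names.

```python
def recommended_heap_gb(physical_ram_gb: float, cap_gb: float = 31.0) -> int:
    """Half of physical RAM, capped below 32 GB, in whole gigabytes.

    The 31 GB default cap is an assumption chosen for this sketch.
    """
    return max(1, int(min(physical_ram_gb * 0.5, cap_gb)))


def jvm_options(physical_ram_gb: float) -> list:
    """Return matching -Xms/-Xmx flags so the heap never resizes."""
    heap = recommended_heap_gb(physical_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (16, 64, 128):
    print(ram, "GB RAM ->", " ".join(jvm_options(ram)))
```

Whatever RAM is not given to the heap remains available to the operating system's filesystem cache, which is what serves most segment reads during searches.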