
Mastering Elasticsearch Cluster Planning, Configuration, and Monitoring

This article, based on Xu Peng’s Gdevops 2017 talk, details the rationale for choosing Elasticsearch, outlines the overall architecture, provides step‑by‑step OS, JVM, and index parameter settings, and explains comprehensive monitoring strategies to ensure high‑availability and performance of large‑scale ES clusters.

dbaplus Community

1. Overall Architecture

Elasticsearch is chosen as the search engine because it enables fast, scalable querying of massive datasets. The architecture separates data ingestion, processing, and storage using Kafka for decoupling, ETL pipelines, and tiered storage: cold data in HDFS, warm data in databases or caches, and hot data directly indexed in Elasticsearch.

2. Cluster Planning

The cluster has three node roles: query entry (coordinating) nodes that hold no data, data nodes that store and search the indices, and master nodes that manage cluster metadata such as node information and index settings. The diagram (ES 5.x) also includes an optional ingest node for preprocessing documents before indexing.

3. Cluster Configuration

3.1 OS Parameter Settings

Key Linux settings include raising the maximum number of open files to 65535 and tuning virtual memory parameters. Because Elasticsearch uses memory‑mapped files, vm.max_map_count must be raised so the process can create enough memory mappings. Separately, vm.dirty_background_ratio and vm.dirty_ratio control when dirty pages are flushed to disk; tuning them prevents long write stalls that resemble Java GC pauses.

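Swap can be disabled outright (swapoff -a) or kept to a minimum with a low vm.swappiness value. Setting vm.swappiness=0 tells the kernel to avoid swapping almost entirely, which on some kernel versions can trigger the OOM killer under memory pressure; a minimal value such as 1 still lets the kernel swap in an emergency.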
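
The OS checks above can be sketched as a small validation script. This is a minimal sketch, not part of the original talk: the function names are hypothetical, 65535 comes from the text, 262144 is the vm.max_map_count value suggested in the Elasticsearch documentation, and the swappiness ceiling of 1 is an assumption.

```python
# Thresholds: 65535 is taken from the article; 262144 is the value the
# Elasticsearch documentation suggests; MAX_SWAPPINESS is an assumption.
MIN_OPEN_FILES = 65535
MIN_MAX_MAP_COUNT = 262144
MAX_SWAPPINESS = 1


def parse_sysctl(text):
    """Parse `sysctl -a` style lines ("key = value") into a dict of ints."""
    values = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep:
            continue  # not a key/value line
        raw = raw.strip()
        if raw.isdigit():
            values[key.strip()] = int(raw)
    return values


def check_os_settings(sysctl, open_files_limit):
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []
    if open_files_limit < MIN_OPEN_FILES:
        problems.append(
            f"open files limit {open_files_limit} is below {MIN_OPEN_FILES}"
        )
    max_map_count = sysctl.get("vm.max_map_count", 0)
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            f"vm.max_map_count {max_map_count} is below {MIN_MAX_MAP_COUNT}"
        )
    # 60 is the usual Linux default when the key is missing from the input.
    swappiness = sysctl.get("vm.swappiness", 60)
    if swappiness > MAX_SWAPPINESS:
        problems.append(
            f"vm.swappiness {swappiness} is above {MAX_SWAPPINESS}"
        )
    return problems
```

On a live host, the text could come from the output of `sysctl -a`, and the open files limit from `resource.getrlimit(resource.RLIMIT_NOFILE)` in the Python standard library.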

3.2 Elasticsearch JVM Settings

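Regardless of physical RAM, keep the Elasticsearch JVM heap below about 32 GB. Beyond that threshold the JVM can no longer use compressed object pointers (compressed oops), so every object reference grows from 32 to 64 bits and the extra heap is largely wasted. Enable -XX:+ExitOnOutOfMemoryError (requires JDK 1.8.0_92 or newer) so the process terminates cleanly on OOM, allowing external monitors to restart it.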

When upgrading the JDK is not possible, add the following option to the JVM launch parameters to kill the process on OOM:

-XX:OnOutOfMemoryError="kill -9 %p"
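
The JVM rules in this section can be sketched as a helper that derives launch options from the host's RAM and JDK version. This is an illustrative sketch with hypothetical function names. Two points are assumptions rather than statements from the talk: sizing the heap at half of physical RAM is a common rule of thumb, and the 31 GB cap is chosen to stay safely under the compressed-pointer threshold. HotSpot spells the boolean flag -XX:+ExitOnOutOfMemoryError.

```python
# Assumption: cap at 31 GB to stay safely below the ~32 GB threshold.
HEAP_CEILING_GB = 31


def heap_size_gb(physical_ram_gb):
    """Half of RAM (assumed rule of thumb), capped below the 32 GB threshold."""
    return max(1, min(physical_ram_gb // 2, HEAP_CEILING_GB))


def oom_flag(java_major, java_update):
    """Pick the OOM-handling flag that the given JDK version supports."""
    # ExitOnOutOfMemoryError is available from JDK 8u92 onwards.
    if java_major > 8 or (java_major == 8 and java_update >= 92):
        return "-XX:+ExitOnOutOfMemoryError"
    # Older JDKs: fall back to killing the process from an OOM hook.
    return '-XX:OnOutOfMemoryError="kill -9 %p"'


def jvm_options(physical_ram_gb, java_major, java_update):
    """Return JVM options with equal min and max heap plus an OOM flag."""
    heap = heap_size_gb(physical_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g", oom_flag(java_major, java_update)]
```

For a 128 GB host on JDK 8u144 this yields a 31 GB heap with the exit flag; on JDK 8u60 it falls back to the kill hook shown above.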

3.3 Index Parameter Settings

Important index‑level settings include:

refresh_interval : controls how quickly newly indexed documents become searchable; shorter intervals increase I/O, so for heavy bulk loads a larger value (e.g., 90‑100 s) is recommended.

number_of_shards : set based on expected data volume; cannot be changed after index creation, so plan ahead.

number_of_replicas : can be set to 0 during bulk ingestion and increased later.

merge scheduler : index.merge.scheduler.max_thread_count sets the number of merge threads; use 1 for spinning disks, and keep the higher, CPU‑derived default for SSDs.

cluster.routing.allocation.balance.shard : a cluster‑level setting (default 0.45); raising it strengthens the tendency to equalize the number of shards on each node.

Segment merging is essential: too many small segments waste file handles and degrade query performance. The translog flush threshold (index.translog.flush_threshold_size) can also be raised for large batches.
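
The index settings listed above can be sketched as two request bodies: one applied at index creation for a bulk load, one applied afterwards to restore normal operation. This is a hedged sketch, not the speaker's configuration. The function names are hypothetical, the 30 GB target shard size is an assumption for illustration, and 1s is Elasticsearch's default refresh interval.

```python
import json
import math


def plan_shards(expected_gb, target_shard_gb=30):
    """Estimate a shard count from expected data volume.

    target_shard_gb is an assumed sizing target, not a value from the talk.
    """
    return max(1, math.ceil(expected_gb / target_shard_gb))


def create_index_body(expected_gb, spinning_disk=True):
    """Settings for index creation, tuned for a heavy bulk load."""
    settings = {
        "index.number_of_shards": plan_shards(expected_gb),  # fixed at creation
        "index.number_of_replicas": 0,      # no replicas while loading
        "index.refresh_interval": "90s",    # fewer refreshes, less I/O
    }
    if spinning_disk:
        # A single merge thread avoids seek contention on spinning disks.
        settings["index.merge.scheduler.max_thread_count"] = 1
    return {"settings": settings}


def after_bulk_load_body(replicas=1):
    """Dynamic settings to apply once the bulk load has finished."""
    return {
        "index.number_of_replicas": replicas,
        "index.refresh_interval": "1s",
    }


if __name__ == "__main__":
    print(json.dumps(create_index_body(300), indent=2))
    print(json.dumps(after_bulk_load_body(), indent=2))
```

The first body would be sent with the create-index request and the second to the index's _settings endpoint; only the second contains settings that can change on a live index.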

Dynamic templates can map short strings (< 10 KB) as keyword. Very large fields that are only stored and never searched should be defined as type: object with enabled: false, so that Elasticsearch skips parsing their contents.
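
A mapping along these lines might look like the sketch below. The template name, the raw_payload field name, and the use of ignore_above to express the size limit are assumptions for illustration. Note that ignore_above counts characters rather than bytes, so 10240 only approximates the 10 KB mentioned above, and where this body nests in the create-index request depends on the Elasticsearch version (5.x places it under a mapping type name).

```python
import json


def build_mapping(keyword_limit=10240, raw_field="raw_payload"):
    """Build a mapping body; raw_field is a hypothetical field name."""
    return {
        "dynamic_templates": [
            {
                "short_strings_as_keyword": {
                    # Applies to every newly seen string field.
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "keyword",
                        # Longer values are kept in _source but not indexed.
                        "ignore_above": keyword_limit,
                    },
                }
            }
        ],
        "properties": {
            # Stored in _source only; contents are never parsed or indexed.
            raw_field: {"type": "object", "enabled": False},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```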

4. Cluster Monitoring

4.1 Monitoring Content

Effective monitoring covers both OS metrics (CPU, memory) and Elasticsearch‑specific metrics (shard distribution, field data memory, index size, query load). Uneven shard allocation or excessive field count can cause performance bottlenecks.
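
As one example of such a check, uneven shard allocation can be detected from per-shard rows like those returned by the _cat/shards API in JSON format (each row carries a node and a state field). The sketch below is illustrative only: the function names are hypothetical and the 1.5x tolerance is an arbitrary assumption.

```python
from collections import Counter


def shards_per_node(cat_shards_rows):
    """Count started shards per node from _cat/shards-style rows."""
    counts = Counter()
    for row in cat_shards_rows:
        node = row.get("node")
        # Unassigned shards have no node and are skipped.
        if row.get("state") == "STARTED" and node:
            counts[node] += 1
    return dict(counts)


def overloaded_nodes(counts, tolerance=1.5):
    """Return nodes holding more than tolerance x the mean shard count."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(node for node, n in counts.items() if n > mean * tolerance)
```

A node that this flags repeatedly is a candidate for rebalancing or for a closer look at allocation settings.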

4.2 Monitoring Tools

The team built a custom dashboard called eyeones because the built‑in X‑Pack (ES 5.x) and Marvel (ES 1.x) lacked many useful metrics. The dashboard displays per‑node load, memory usage, number of indices, query rate, and shard recovery status.

Detailed index‑level metrics are also available; clicking an index reveals its specific statistics. The monitoring source code is hosted on GitHub for community contributions.
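
The per-node figures such a dashboard shows can be derived from the nodes stats API. The sketch below is not the eyeones implementation; it only illustrates, with hypothetical function names, how one node's entry in a nodes stats response might be reduced to a few metrics and how a query rate follows from two samples. The field paths match the ES 5.x response layout and may differ in other versions.

```python
def summarize_node(node_stats):
    """Extract a few headline metrics from one node's stats entry."""
    return {
        "name": node_stats["name"],
        "cpu_percent": node_stats["os"]["cpu"]["percent"],
        "heap_used_percent": node_stats["jvm"]["mem"]["heap_used_percent"],
        "fielddata_bytes": node_stats["indices"]["fielddata"]["memory_size_in_bytes"],
        # Cumulative counter since node start; see query_rate below.
        "query_total": node_stats["indices"]["search"]["query_total"],
    }


def query_rate(previous, current, interval_seconds):
    """Queries per second between two summaries taken interval_seconds apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current["query_total"] - previous["query_total"]
    # The counter resets when a node restarts; treat that as zero.
    return max(delta, 0) / interval_seconds
```

Because query_total is cumulative, a rate only makes sense across two samples, which is why the dashboard-style figure needs periodic polling rather than a single request.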
