
Master Elasticsearch Performance: Practical Tuning Tips for Faster Clusters

This guide consolidates everyday Elasticsearch tuning techniques—covering configuration file tweaks, system‑level settings, and usage‑level optimizations such as memory locking, discovery settings, fault detection, queue sizing, translog handling, bulk indexing, shard management, and disk I/O—to help you build a stable, high‑throughput search cluster.

Java Interview Crash Guide

Jan 9, 2021

1. Configuration File Tuning

Memory lock : set bootstrap.memory_lock: true to allow the JVM to lock memory and prevent swapping.

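Zen discovery : use unicast discovery in production; list all node addresses in discovery.zen.ping.unicast.hosts, and adjust discovery.zen.ping_timeout, discovery.zen.join_timeout, and discovery.zen.minimum_master_nodes as needed. These discovery.zen.* settings apply to Elasticsearch 6.x and earlier; 7.x replaces them with discovery.seed_hosts and cluster.initial_master_nodes.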
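A common rule of thumb for discovery.zen.minimum_master_nodes is a strict majority of the master-eligible nodes. The sketch below only does that arithmetic; the function name is made up for illustration and is not part of any Elasticsearch client.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return a strict majority (quorum) of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Quorum for a few typical cluster sizes.
for n in (1, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

With three master-eligible nodes the value is 2, so a single isolated node cannot elect itself master.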

Fault detection : configure discovery.zen.fd.ping_interval (default 1s), discovery.zen.fd.ping_timeout (default 30s), and discovery.zen.fd.ping_retries (default 3) to monitor master and data node health.

Queue size: increase thread_pool queue only when GET /_cat/thread_pool shows persistent queue rejections.
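To decide whether a queue is really under pressure, the rejected column of the cat thread pool output can be scanned. This is a minimal sketch that parses text as returned by a request like GET /_cat/thread_pool?v&h=node_name,name,active,queue,rejected; the sample rows are invented, and the parser assumes node names contain no spaces.

```python
from typing import List, Tuple


def pools_with_rejections(cat_output: str) -> List[Tuple[str, str, int]]:
    """Return (node, pool, rejected) for every row whose rejected count is > 0."""
    lines = [ln for ln in cat_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    node_i = header.index("node_name")
    name_i = header.index("name")
    rej_i = header.index("rejected")
    hits = []
    for line in lines[1:]:
        cols = line.split()
        rejected = int(cols[rej_i])
        if rejected > 0:
            hits.append((cols[node_i], cols[name_i], rejected))
    return hits


# Invented sample output, for illustration only.
SAMPLE = """\
node_name name   active queue rejected
node-1    search 3      0     0
node-1    write  8      200   57
node-2    write  2      0     0
"""

print(pools_with_rejections(SAMPLE))
```

Rejections are cumulative since node start, so comparing two snapshots taken some minutes apart shows whether they are still growing.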

Circuit breaker settings (indices.breaker.*), shown with example values:

indices.breaker.total.limit: 50% (default: 70% of JVM heap).

indices.breaker.request.limit: 10% (default: 60% of JVM heap).

indices.breaker.fielddata.limit: 10% (default: 60% of JVM heap).
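Besides the configuration file, the breaker limits can be changed at runtime through the cluster settings API. The sketch below only builds a request body for PUT /_cluster/settings using the example values above; the helper names are hypothetical, and the check that child limits stay within the total limit is a sanity check added here, not something Elasticsearch enforces.

```python
import json


def _pct(value: str) -> float:
    """Parse a percentage string such as '50%' into a float."""
    return float(value.rstrip("%"))


def breaker_settings(total: str = "50%", request: str = "10%", fielddata: str = "10%") -> dict:
    """Build a persistent cluster-settings body for the three breaker limits."""
    if max(_pct(request), _pct(fielddata)) > _pct(total):
        raise ValueError("child breaker limits should not exceed the total limit")
    return {
        "persistent": {
            "indices.breaker.total.limit": total,
            "indices.breaker.request.limit": request,
            "indices.breaker.fielddata.limit": fielddata,
        }
    }


# Body to send with: PUT /_cluster/settings
print(json.dumps(breaker_settings(), indent=2))
```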

Adjust query cache size with indices.queries.cache.size (e.g., 5%).

2. System Level Tuning

JDK version : follow the official recommendation for the matching Elasticsearch version.

JVM heap : set -Xms and -Xmx to the same value, preferably slightly less than half of the physical RAM when the machine has less than 64 GB.

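Keep the heap size below 32 GB so the JVM can keep using compressed object pointers (compressed oops); above that threshold pointers double in size and the extra heap is largely wasted.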
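The two heap rules can be combined into a small calculation. This is a rough sketch: it caps the heap at half of physical RAM and at an assumed 31 GB ceiling (the exact compressed-pointer threshold depends on the JVM), and the function names are illustrative only.

```python
from typing import List


def recommended_heap_gb(physical_ram_gb: float, ceiling_gb: float = 31.0) -> float:
    """Upper bound for the heap: half of RAM, never above the assumed ceiling."""
    return min(physical_ram_gb / 2, ceiling_gb)


def jvm_options(heap_gb: float) -> List[str]:
    """Return matching -Xms/-Xmx flags so the heap never resizes."""
    heap_mb = int(heap_gb * 1024)
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


for ram in (16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> {heap} GB heap -> {jvm_options(heap)}")
```

The remaining RAM is left to the operating system's file cache, which Lucene relies on heavily.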

Swap : disable swap (e.g., swapoff -a) to prevent performance degradation.

File descriptors : increase the limit (e.g., ulimit -n 65536) because Elasticsearch and Lucene open many files and sockets.

mmap : ensure sufficient virtual memory for memory‑mapped files, e.g., sysctl -w vm.max_map_count=262144 or edit /etc/sysctl.conf accordingly.
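The file descriptor and mmap limits above can be verified before starting a node. The following is a sketch for Linux only: the thresholds are the example values from the text, the function names are made up, and read_current_limits() assumes /proc/sys/vm/max_map_count exists.

```python
import sys
from typing import List, Tuple

MIN_NOFILE = 65536            # example value from `ulimit -n 65536`
MIN_MAX_MAP_COUNT = 262144    # example value for vm.max_map_count


def limit_warnings(nofile_soft: int, max_map_count: int) -> List[str]:
    """Return a human-readable warning for each limit below its threshold."""
    warnings = []
    if nofile_soft < MIN_NOFILE:
        warnings.append(f"open files limit {nofile_soft} < {MIN_NOFILE}")
    if max_map_count < MIN_MAX_MAP_COUNT:
        warnings.append(f"vm.max_map_count {max_map_count} < {MIN_MAX_MAP_COUNT}")
    return warnings


def read_current_limits() -> Tuple[int, int]:
    """Read the current process limits (Linux only)."""
    import resource  # not available on Windows

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = sys.maxsize
    with open("/proc/sys/vm/max_map_count") as fh:
        max_map_count = int(fh.read().strip())
    return soft, max_map_count


# Typical out-of-the-box values on many distributions, used here as sample input.
print(limit_warnings(1024, 65530))
```

On a real host, limit_warnings(*read_current_limits()) checks the limits of the current shell; the Elasticsearch process may run under different limits if it is started by a service manager.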

Disk : for SSDs use the deadline or noop I/O scheduler; for HDDs the default cfq is appropriate. Mount data directories with options such as noatime,data=writeback,barrier=0,nobh and avoid remote mounts (NFS/SMB).

Prefer RAID 0 for performance, but be aware of the loss of redundancy.

3. Elasticsearch Usage Tuning

Hot threads : run GET /_nodes/hot_threads?interval=30s to identify resource‑heavy threads (e.g., bulk, search, merge).

Pending tasks : use GET /_cluster/pending_tasks to spot metadata‑changing operations that may queue up.

Field storage (enable only what each use case needs):

doc_values : column‑store, suitable for non‑analyzed fields; saves memory.

fielddata : loads field into heap for fast access; disable if not needed.

_source : stores the original JSON document; can be disabled when update and reindex operations are not needed.

_all : deprecated in 6.x; disable to save space.

norms : scoring metadata; disable for log‑type data.
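The field storage options above are set per field in the index mapping. Below is a minimal sketch of an index-creation body for log data, assuming the typeless mapping format of Elasticsearch 7.x; the field names are hypothetical.

```python
import json

# Body for: PUT /<index>  (field names are placeholders)
log_index_body = {
    "mappings": {
        "properties": {
            # Full-text field: scoring metadata is not needed for logs.
            "message": {"type": "text", "norms": False},
            # Used for sorting and aggregations: keep doc_values (on by default).
            "status": {"type": "keyword"},
            # Only looked up by exact match: doc_values can be switched off.
            "trace_id": {"type": "keyword", "doc_values": False},
        }
    }
}

print(json.dumps(log_index_body, indent=2))
```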

Translog settings:

index.translog.durability: set to async for higher throughput when occasional data loss is acceptable.

index.translog.sync_interval: fsync interval in async mode (default 5 s).

index.translog.flush_threshold_size: maximum translog size before a flush is triggered (default 512 MB).

Refresh interval : default 1 s; set to -1 to disable or increase for write‑heavy workloads.
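The translog and refresh settings can be supplied together when an index is created. This sketch builds such a body; the 30s refresh interval is an arbitrary example for a write-heavy index, the other values are the defaults quoted above, and the function name is illustrative.

```python
import json


def write_heavy_index_body(refresh_interval: str = "30s") -> dict:
    """Build an index-creation body that trades durability and freshness for throughput."""
    return {
        "settings": {
            "index.translog.durability": "async",
            "index.translog.sync_interval": "5s",
            "index.translog.flush_threshold_size": "512mb",
            "index.refresh_interval": refresh_interval,
        }
    }


# Body for: PUT /<index>
print(json.dumps(write_heavy_index_body(), indent=2))

# During a one-off bulk load, refresh can be disabled entirely and restored afterwards.
print(json.dumps(write_heavy_index_body(refresh_interval="-1"), indent=2))
```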

Dynamic mapping : set dynamic: false or strict to avoid uncontrolled metadata changes.

Bulk indexing : aim for 5–15 MB of raw request size per bulk; monitor with iostat, top, ps. Increase concurrency until a bottleneck appears (e.g., EsRejectedExecutionException).
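One way to stay within the suggested bulk size is to batch documents by serialized bytes rather than by count. The generator below is a sketch that yields newline-delimited bodies for POST /<index>/_bulk; it omits _id so Elasticsearch generates IDs, and the 10 MB default is simply a point inside the 5 to 15 MB range.

```python
import json
from typing import Dict, Iterable, Iterator


def bulk_bodies(docs: Iterable[Dict], max_bytes: int = 10 * 1024 * 1024) -> Iterator[str]:
    """Yield NDJSON bulk bodies, each at most max_bytes unless a single doc is larger."""
    action = json.dumps({"index": {}})  # no _id: let Elasticsearch generate one
    lines, size = [], 0
    for doc in docs:
        pair = action + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Flush the current batch before it would exceed the limit.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)


# Tiny limit so the batching is visible with sample data.
sample_docs = [{"n": i} for i in range(10)]
for body in bulk_bodies(sample_docs, max_bytes=60):
    print(len(body.encode("utf-8")), "bytes,", body.count("\n") // 2, "docs")
```

Each body ends with a newline, which the bulk API requires.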

Shard and index sizing : keep shard size ≤ 50 GB for logs, ≤ 20 GB for business data; use shrink and rollover APIs; name indices by time period (e.g., test-YYYYMMDD).
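The sizing rule translates into simple arithmetic when planning a new index. The sketch below estimates a primary shard count from an expected index size and builds a dated index name in the test-YYYYMMDD style; both function names are hypothetical.

```python
import math
from datetime import date


def primary_shards(expected_gb: float, max_shard_gb: float) -> int:
    """Smallest shard count that keeps each shard at or below max_shard_gb."""
    return max(1, math.ceil(expected_gb / max_shard_gb))


def daily_index_name(prefix: str, day: date) -> str:
    """Build a time-based index name such as test-20210109."""
    return f"{prefix}-{day:%Y%m%d}"


print(primary_shards(180, 50))                    # log data, 50 GB per shard
print(primary_shards(180, 20))                    # business data, 20 GB per shard
print(daily_index_name("test", date(2021, 1, 9)))
```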

Segment merge : limit merge threads with index.merge.scheduler.max_thread_count; for SSDs the default is usually fine, while on spinning disks a value of 1 avoids concurrent merges competing for I/O.

Auto‑generated IDs : avoid specifying IDs unless required; let Elasticsearch generate them for better write performance.

Routing : specify a routing key on write to restrict queries to a single shard, reducing coordination overhead.
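Routing only helps when the same key is passed on both the write and the search. The sketch below builds the request paths with a routing query parameter, assuming the _doc endpoint of Elasticsearch 6.x and later; the index name, document ID, and routing key are placeholders.

```python
from urllib.parse import quote, urlencode


def index_path(index: str, doc_id: str, routing: str) -> str:
    """Path for indexing one document with an explicit routing key."""
    return f"/{index}/_doc/{quote(doc_id, safe='')}?{urlencode({'routing': routing})}"


def search_path(index: str, routing: str) -> str:
    """Path for a search limited to the shard that owns the routing key."""
    return f"/{index}/_search?{urlencode({'routing': routing})}"


print("PUT", index_path("orders", "42", "user-7"))
print("GET", search_path("orders", "user-7"))
```

A skewed routing key, such as one very large customer, can concentrate data on a single shard, so key choice matters.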

Alias : expose an alias instead of the concrete index name to enable seamless reindexing or index swaps.
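An index swap behind an alias is a single request to the aliases API with a remove and an add action, applied together. The sketch below builds that body; the alias and index names are placeholders.

```python
import json


def alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build a body that moves an alias from old_index to new_index in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Body for: POST /_aliases
print(json.dumps(alias_swap("products", "products_v1", "products_v2"), indent=2))
```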

Avoid wide tables : limit total fields (default index.mapping.total_fields.limit = 1000) to prevent memory exhaustion.

Avoid sparse fields : fields that are present in only a few documents make document ID delta encoding less efficient and increase storage size; keep mappings tight.

Source: https://elasticsearch.cn/article/6202
