Understanding Elasticsearch Segment Merging: When and How to Use Force Merge
This article explains what Elasticsearch segments are, why they are immutable, how segment merging works, and how merging affects resources and search performance. It also gives practical configuration tips, such as using the _forcemerge API, adjusting refresh_interval, and setting the merge thread count.
Background
The user wants to merge read‑only index segments in Elasticsearch 6.7+ and asks three questions:
Is it best to force the index into a single segment (max_num_segments=1)?
Will a force‑merge consume all node resources and make the service unavailable?
Which index.merge settings should be tuned for Elasticsearch 6.7 and later?
POST /my_index/_forcemerge?max_num_segments=1
What is a segment?
In an Elasticsearch cluster each node holds one or more indices. An index is split into primary (and replica) shards; each shard is a Lucene index instance. A shard stores its data in multiple segments, which are immutable inverted‑index files. During a search the results from all segments of a shard are merged to produce the final shard result.
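To make the shard-to-segment relationship concrete, the sketch below counts segments per shard from the kind of rows that GET /_cat/segments/my_index?format=json returns. The sample rows are invented for illustration, and the column names (index, shard, prirep, segment, docs.deleted) are assumed to match the cat segments API of your version; nothing here contacts a cluster.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the JSON output of the cat segments API.
# Values are strings because the cat APIs return text columns.
SAMPLE_ROWS = [
    {"index": "my_index", "shard": "0", "prirep": "p", "segment": "_0", "docs.deleted": "30"},
    {"index": "my_index", "shard": "0", "prirep": "p", "segment": "_1", "docs.deleted": "0"},
    {"index": "my_index", "shard": "1", "prirep": "p", "segment": "_0", "docs.deleted": "12"},
]


def summarize_segments(rows):
    """Return {(index, shard, prirep): {"segments": n, "deleted": m}}."""
    summary = defaultdict(lambda: {"segments": 0, "deleted": 0})
    for row in rows:
        key = (row["index"], row["shard"], row["prirep"])
        summary[key]["segments"] += 1
        summary[key]["deleted"] += int(row["docs.deleted"])
    return dict(summary)


for key, stats in sorted(summarize_segments(SAMPLE_ROWS).items()):
    print(key, stats)
```

A shard that reports many segments or a large deleted-document count is the kind of shard that merging helps most.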
Why are segments immutable?
Lucene writes new documents to a brand‑new segment and never modifies existing segment files. This design avoids costly file‑level updates, enables high indexing throughput, and makes each segment read‑only on disk. Subsequent writes always create additional segments.
What is segment merging?
Each refresh, which by default runs every second (index.refresh_interval, default 1s), writes newly indexed documents into a new segment. Over time many small segments accumulate, which leads to:
Higher resource consumption: each segment consumes file handles, memory, and CPU cycles.
Slower search: every query must visit every segment.
The background merge process selects a few similarly sized segments and rewrites them into a larger segment, discarding deleted documents in the process.
What happens during a merge?
Deleted or superseded documents are omitted from the new segment, freeing space.
The merge runs concurrently with indexing and searching; it does not block either operation.
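The merge behaviour described above can be modelled in a few lines. This is a toy sketch, not Lucene's implementation: the Segment class and merge function are hypothetical names, and real merges operate on index files rather than tuples of ids. It only illustrates that inputs are never modified and that deleted documents are left out of the new segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen mirrors the immutability of segments
class Segment:
    doc_ids: tuple
    deleted: frozenset = frozenset()  # ids marked as deleted, still on disk

    def live(self):
        """Documents that are still visible to searches."""
        return tuple(d for d in self.doc_ids if d not in self.deleted)


def merge(segments):
    """Write a new, larger segment containing only the live documents."""
    merged = []
    for seg in segments:
        merged.extend(seg.live())
    return Segment(doc_ids=tuple(merged))


small = [
    Segment((1, 2, 3), frozenset({2})),
    Segment((4, 5)),
    Segment((6,), frozenset({6})),
]
big = merge(small)
print(big.doc_ids)  # documents 2 and 6 are gone from the merged segment
```

Because the inputs stay untouched until the new segment is complete, searches can keep reading the old segments while the merge runs.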
Benefits of merging
Fewer segments improve query latency and reduce memory/file‑handle pressure.
Disk usage shrinks because documents that were only marked as deleted are physically dropped from the merged segment.
Potential drawbacks
Merge operations generate heavy disk I/O, which can affect overall node performance.
On machines with limited I/O bandwidth, merges may become a noticeable bottleneck.
Force merge size (Answer 1)
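The old _optimize API, since replaced by _forcemerge, performed the same operation: it forces each shard of the index to contain at most max_num_segments segments. The value is typically set to one to obtain the best possible search performance, and for an index that no longer receives writes a single segment is a reasonable target.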
Resource impact of force merge (Answer 2)
Official guidance recommends running _forcemerge only after indexing has stopped. A force merge can create very large segments (often larger than 5 GB, the default size cap for automatically merged segments). This has two consequences:
Disk I/O spikes because the node rewrites many small segments into a few huge ones.
If indexing continues, the automatic merge policy will ignore those huge segments until they consist mostly of deleted documents, causing prolonged high disk usage and degraded search speed.
In extreme cases the merge can saturate all I/O capacity on a node, making the cluster unresponsive. Best practice is to schedule force merges during low‑traffic windows (e.g., overnight) and, if possible, move the target index to a dedicated node before merging.
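One way to follow the "only after indexing has stopped" guidance is to block writes on the index before merging. The sketch below only builds the sequence of requests and sends nothing; forcemerge_plan is a hypothetical helper, my_index is the example index name used earlier, and using index.blocks.write to stop indexing is an assumption about how you want to enforce the read-only state.

```python
import json


def forcemerge_plan(index, max_num_segments=1):
    """Return (method, path, body) steps for a cautious force merge."""
    block_writes = json.dumps({"index": {"blocks": {"write": True}}})
    return [
        # 1. Reject further indexing so no new segments appear mid-merge.
        ("PUT", f"/{index}/_settings", block_writes),
        # 2. Run the merge itself (ideally in a low-traffic window).
        ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}", None),
        # 3. Inspect the resulting segments.
        ("GET", f"/_cat/segments/{index}?v", None),
    ]


for method, path, body in forcemerge_plan("my_index"):
    print(method, path, body or "")
```

Keeping the plan as plain data makes it easy to review before handing it to whatever HTTP client or scheduler you use.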
Recommended index‑merge settings (Answer 3)
refresh_interval: Increase from the default 1s to a larger value (e.g., 30s) when real‑time freshness is not required. Fewer refreshes mean fewer new segments.
index.merge.scheduler.max_thread_count: The maximum number of threads a single shard may use for merging at once. The default is max(1, min(4, available processors / 2)), which suits SSDs; on spinning disks, set it to 1. Reducing the thread count can limit I/O contention on busy nodes.
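index.merge.policy.max_merge_at_once and index.merge.policy.segments_per_tier: Control how many segments are merged in a single operation and how many segments are allowed per size tier before a merge is triggered. Lower values cause more merging and leave fewer segments; higher values reduce merge I/O but leave more segments to search. These are expert settings, and the defaults are usually adequate.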
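As a rough illustration of the first two settings, the sketch below estimates how many refreshes fit into an hour and builds a JSON body that could be sent with PUT /my_index/_settings. The helper names are hypothetical, the values 30s and 1 are example choices rather than recommendations, and the refresh count is only an upper bound on new segments because a refresh with no new documents writes nothing. Whether a setting can be changed on a live index should be confirmed against the documentation for your version.

```python
import json


def max_refreshes(window_seconds, refresh_interval_seconds):
    """Upper bound on refreshes (and so on new segments) in a time window."""
    return window_seconds // refresh_interval_seconds


def merge_tuning_body(refresh_interval="30s", max_thread_count=1):
    """JSON body for an index settings update (example values only)."""
    settings = {
        "index": {
            "refresh_interval": refresh_interval,
            "merge": {"scheduler": {"max_thread_count": max_thread_count}},
        }
    }
    return json.dumps(settings, indent=2)


print(max_refreshes(3600, 1))   # 3600 refreshes per hour at 1s
print(max_refreshes(3600, 30))  # 120 refreshes per hour at 30s
print(merge_tuning_body())
```

Going from 1s to 30s cuts the ceiling on new segments per hour by a factor of thirty, which is why the refresh interval is usually the first setting to revisit.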
For a complete list of merge‑related parameters, see the Elasticsearch merge module documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html
Additional performance tips, including indexing best practices and segment management, are available in the official Elasticsearch guide:
https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html#segments-and-merging
