Elasticsearch Index Performance Optimization (Part 2)
To maximize Elasticsearch bulk-indexing speed, temporarily disable refreshes and replicas, tune merge throttling and scheduler threads, enlarge translog and index buffer thresholds, and adjust indexing and bulk thread-pool sizes, then restore defaults after the load completes.
This article, translated from the QBox official blog, is the second part of a three‑part series on maximizing Elasticsearch indexing performance. It focuses on configuration settings that improve indexing throughput and reduce management overhead.
Refresh Interval
The index.refresh_interval setting controls how often Elasticsearch refreshes a shard so that newly indexed documents become searchable. The default is 1s. Increasing this interval (or setting it to -1 to disable refresh temporarily) reduces the costly refresh operation and can dramatically boost bulk indexing speed. Example:
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
    "index" : {
        "refresh_interval" : "-1"
    }
}'

After bulk indexing, the setting should be restored, e.g.:

curl -XPUT 'localhost:9200/my_index/_settings' -d '{
    "index" : {
        "refresh_interval" : "1s"
    }
}'

Replica Count
During large bulk imports, setting index.number_of_replicas to 0 avoids the overhead of indexing documents on replica shards. Once the import finishes, replicas can be re‑enabled.
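The temporary changes to refresh interval and replica count are easy to forget to undo, so one option is to pair "apply" and "restore" in a single helper. The following is a minimal sketch, not part of the original article: `bulk_load_mode` and `settings_body` are hypothetical names, `put_settings` is a caller-supplied function that would send the body to PUT /<index>/_settings, and the restore values ("1s" and one replica) are assumed defaults that should be replaced by whatever the index used before.

```python
import json
from contextlib import contextmanager


def settings_body(refresh_interval, replicas):
    """Build the JSON body for PUT /<index>/_settings."""
    return json.dumps({
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    })


@contextmanager
def bulk_load_mode(put_settings, index, restore_refresh="1s", restore_replicas=1):
    """Disable refresh and replicas for a bulk load, then restore them.

    put_settings(index, body) is a hypothetical callable supplied by the
    caller; it is expected to issue the HTTP request shown in the curl
    examples. The restore values are assumptions, not read from the cluster.
    """
    put_settings(index, settings_body("-1", 0))
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so the index is not left
        # without refreshes or replicas.
        put_settings(index, settings_body(restore_refresh, restore_replicas))


if __name__ == "__main__":
    # Demo with a stand-in that only prints what would be sent.
    def fake_put(index, body):
        print("PUT /%s/_settings %s" % (index, body))

    with bulk_load_mode(fake_put, "my_index"):
        pass  # the bulk import would run here
```

Wrapping the load in a context manager means the restore call also runs when the import fails partway through.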
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
    "index" : {
        "number_of_replicas" : 0
    }
}'

Segment Merging and Throttling
Elasticsearch merges Lucene segments in the background, which is I/O‑intensive, so merge I/O is throttled. The default limit (indices.store.throttle.max_bytes_per_sec) is 20 MB/s, a reasonable value for spinning disks; for SSDs a higher limit (100–200 MB/s) is advisable. Throttling can be disabled entirely with:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "indices.store.throttle.type" : "none"
    }
}'

After the bulk operation, restore the setting to merge.

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "indices.store.throttle.type" : "merge"
    }
}'

Merge Scheduler Thread Count
The index.merge.scheduler.max_thread_count setting controls how many threads may run merges concurrently. The default is max(1, min(4, availableProcessors/2)). For HDD‑based nodes it is often set to 1; because Elasticsearch allows max_thread_count + 2 threads to operate on the disk at once, a value of 1 allows three threads in total. Example for a single index:
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
    "index.merge.scheduler.max_thread_count" : 1
}'

For all indices:

curl -XPUT 'localhost:9200/_settings' -d '{
    "index.merge.scheduler.max_thread_count" : 1
}'

Translog (Transaction Log) Management
Flushing the translog forces a Lucene commit, which is expensive. Settings that influence flush behavior include:
index.translog.flush_threshold_size (default 512 MB)
index.translog.flush_threshold_ops (default unlimited)
index.translog.flush_threshold_period (default 30 min)
index.translog.interval (default 5 s)
Increasing flush_threshold_size (e.g., to 1 GB) reduces the frequency of flushes and can improve indexing throughput, provided sufficient heap memory is available.
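To make the effect of the size threshold concrete, the sketch below models the flush decision and counts size-triggered flushes for a given volume of translog writes. It is a deliberately simplified model under stated assumptions, not Elasticsearch's actual implementation: `should_flush` and `size_triggered_flushes` are hypothetical helpers, the defaults mirror the values listed above, and the 10 GB workload is invented for illustration.

```python
MB = 1024 * 1024


def should_flush(size_bytes, ops, seconds_since_flush,
                 threshold_size=512 * MB,
                 threshold_ops=None,          # None models "unlimited"
                 threshold_period=30 * 60):
    """Simplified model: a flush is due once any threshold is exceeded."""
    if size_bytes > threshold_size:
        return True
    if threshold_ops is not None and ops > threshold_ops:
        return True
    return seconds_since_flush > threshold_period


def size_triggered_flushes(bytes_written, threshold_size):
    """Rough count of size-triggered flushes while writing bytes_written.

    Ignores the ops and period thresholds.
    """
    return bytes_written // threshold_size


if __name__ == "__main__":
    written = 10 * 1024 * MB  # hypothetical 10 GB of translog writes
    print(size_triggered_flushes(written, 512 * MB))   # 20
    print(size_triggered_flushes(written, 1024 * MB))  # 10
```

In this model, doubling the threshold from 512 MB to 1 GB halves the number of size-triggered flushes for the same write volume.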
Index Buffer Size
The index buffer stores newly indexed documents before they are written to disk. Its size is controlled by static settings that must be configured on every data node:
indices.memory.index_buffer_size – default 10 % of JVM heap
indices.memory.min_index_buffer_size – default 48 MB
indices.memory.max_index_buffer_size – unlimited by default
Increasing the buffer size can be beneficial for heavy indexing workloads.
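The interaction of the three settings can be worked through with a little arithmetic. This is a sketch assuming the percentage is applied to the JVM heap and the result is then clamped to the minimum and maximum; `index_buffer_bytes` is a hypothetical helper, and the heap sizes are example values only.

```python
MB = 1024 * 1024


def index_buffer_bytes(heap_bytes, percent=10, min_bytes=48 * MB, max_bytes=None):
    """Estimate the node-wide index buffer size in bytes.

    percent   models indices.memory.index_buffer_size (default 10 %)
    min_bytes models indices.memory.min_index_buffer_size (default 48 MB)
    max_bytes models indices.memory.max_index_buffer_size (None = unlimited)
    """
    size = heap_bytes * percent // 100
    size = max(size, min_bytes)
    if max_bytes is not None:
        size = min(size, max_bytes)
    return size


if __name__ == "__main__":
    # 4 GB heap: 10 % is about 409 MB, well above the 48 MB floor.
    print(index_buffer_bytes(4096 * MB) // MB)               # 409
    # 256 MB heap: 10 % would be about 25 MB, so the floor applies.
    print(index_buffer_bytes(256 * MB) // MB)                # 48
    # Raising the percentage to 20 on the 4 GB heap.
    print(index_buffer_bytes(4096 * MB, percent=20) // MB)   # 819
```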
Thread Pools for Indexing and Bulk Operations
Elasticsearch provides dedicated thread pools:
index – fixed size, default = number of CPU cores, queue size 200
bulk – fixed size, default = number of CPU cores, queue size 50
Adjusting these pools (and the index.index_concurrency setting that limits concurrent indexing per shard) can further improve performance, especially on nodes dedicated to a single shard.
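Why pool and queue sizes matter becomes clearer with a rough capacity model: a fixed pool with a bounded queue can hold at most pool size plus queue size requests at one moment, and anything beyond that is rejected. The sketch below assumes all requests arrive at the same instant and none finish in the meantime; `admit` is a hypothetical helper, and the 8-core node and 100 concurrent requests are made-up numbers.

```python
def admit(simultaneous_requests, pool_size, queue_size):
    """Return (accepted, rejected) for a burst hitting a fixed thread pool.

    Accepted requests are either running on a thread or waiting in the
    bounded queue; the rest are rejected.
    """
    capacity = pool_size + queue_size
    accepted = min(simultaneous_requests, capacity)
    return accepted, simultaneous_requests - accepted


if __name__ == "__main__":
    cores = 8  # hypothetical node
    # bulk pool defaults from the text: size = cores, queue size 50
    print(admit(100, cores, 50))    # (58, 42)
    # index pool defaults from the text: size = cores, queue size 200
    print(admit(100, cores, 200))   # (100, 0)
```

Under this model a larger queue absorbs bigger bursts, at the cost of holding more pending requests in memory, which is the trade-off to weigh before raising the defaults.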
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
