Elasticsearch Index Performance Optimization (Part 2)

To maximize Elasticsearch bulk-indexing speed, temporarily disable refreshes and replicas, tune merge throttling and scheduler threads, enlarge translog and index buffer thresholds, and adjust indexing and bulk thread-pool sizes, then restore defaults after the load completes.

vivo Internet Technology

Oct 14, 2017

This article, translated from the QBox official blog, is the second part of a three‑part series on maximizing Elasticsearch indexing performance. It focuses on configuration settings that improve indexing throughput and reduce management overhead.

Refresh Interval

The index.refresh_interval setting controls how often Elasticsearch refreshes a shard so that newly indexed documents become searchable. The default is 1s. Increasing this interval (or setting it to -1 to disable refresh temporarily) reduces the costly refresh operation and can dramatically boost bulk indexing speed. Example:

curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : {
    "refresh_interval" : "-1"
  }
}'

After bulk indexing, the setting should be restored, e.g.:

curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : {
    "refresh_interval" : "1s"
  }
}'
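The disable-then-restore pattern can be wrapped so that the restore step runs even if the bulk load fails. The Python below is a minimal sketch and not part of the original article: refresh_disabled and the put_settings callback are hypothetical names, and the callback is assumed to send an HTTP PUT to the index's _settings endpoint (for example via urllib.request or a client library).

```python
import json
from contextlib import contextmanager


def refresh_body(interval):
    """Build the JSON body used by the curl examples for refresh_interval."""
    return json.dumps({"index": {"refresh_interval": interval}})


@contextmanager
def refresh_disabled(put_settings, index, restore_to="1s"):
    """Disable refresh for the duration of a bulk load, then restore it.

    put_settings(path, body) is a caller-supplied (hypothetical) function
    that performs the HTTP PUT; it is injected so the sketch stays testable.
    """
    path = "/%s/_settings" % index
    put_settings(path, refresh_body("-1"))
    try:
        yield
    finally:
        # Runs on success and on error, so refreshes are always re-enabled.
        put_settings(path, refresh_body(restore_to))


if __name__ == "__main__":
    sent = []
    with refresh_disabled(lambda p, b: sent.append((p, b)), "my_index"):
        pass  # run the bulk import here
    for path, body in sent:
        print("PUT", path, body)
```

Using try/finally here is a design choice: an index left at -1 after a failed import would silently stop showing new documents in search results.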

Replica Count

During large bulk imports, setting index.number_of_replicas to 0 avoids the overhead of indexing documents on replica shards. Once the import finishes, replicas can be re‑enabled.

curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index" : {
    "number_of_replicas" : 0
  }
}'
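To see why dropping replicas helps, note that each replica shard indexes every document again, so the cluster performs one indexing operation per shard copy. The sketch below is illustrative arithmetic plus the request body for re-enabling replicas; the function names, the document count, and the target of one replica are assumptions, not values from the original article.

```python
import json


def indexing_operations(doc_count, replicas):
    """Total indexing operations: one on the primary plus one per replica."""
    if doc_count < 0 or replicas < 0:
        raise ValueError("doc_count and replicas must be non-negative")
    return doc_count * (1 + replicas)


def replica_settings_body(replicas):
    """JSON body for PUT /my_index/_settings, same shape as the curl example."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    docs = 10_000_000  # hypothetical import size
    print(indexing_operations(docs, 1))  # with one replica
    print(indexing_operations(docs, 0))  # replicas disabled during the import
    # Body to send once the import finishes (1 replica is an assumed target).
    print(replica_settings_body(1))
```

When replicas are re-enabled afterwards, the replica shards are typically built by copying the finished segments from the primary rather than by re-indexing each document.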

Segment Merging and Throttling

Elasticsearch merges Lucene segments in the background, which is I/O‑intensive, so it throttles merge I/O. The default throttle of 20 MB/s suits HDDs; for SSDs a higher limit (100–200 MB/s) is advisable. During a bulk import, throttling can be disabled entirely with:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "indices.store.throttle.type" : "none"
  }
}'

After the bulk operation, set the throttle type back to "merge":

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "indices.store.throttle.type" : "merge"
  }
}'
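Instead of disabling throttling, the limit itself can be raised. In the Elasticsearch releases that expose indices.store.throttle.type, the rate is set with indices.store.throttle.max_bytes_per_sec. The sketch below only builds the request body for that setting; the helper name, the disk-type labels, and the choice of 100mb for SSDs (the low end of the range mentioned above) are assumptions for illustration.

```python
import json

# Rates from the text: 20 MB/s default for HDDs, 100-200 MB/s for SSDs.
RATE_BY_DISK = {"hdd": "20mb", "ssd": "100mb"}


def throttle_settings_body(disk_type):
    """Body for PUT /_cluster/settings that sets the merge throttle rate."""
    try:
        rate = RATE_BY_DISK[disk_type]
    except KeyError:
        raise ValueError("unknown disk type: %r" % disk_type)
    return json.dumps(
        {"transient": {"indices.store.throttle.max_bytes_per_sec": rate}}
    )


if __name__ == "__main__":
    print(throttle_settings_body("ssd"))
```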

Merge Scheduler Thread Count

The index.merge.scheduler.max_thread_count setting controls how many threads may run merges concurrently. The default is max(1, min(4, availableProcessors / 2)). For HDD‑based nodes it is often set to 1, which still allows up to three threads (max_thread_count + 2) to operate on the disk at once. Example for a single index:

curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.merge.scheduler.max_thread_count" : 1
}'

For all indices:

curl -XPUT 'localhost:9200/_settings' -d '{
  "index.merge.scheduler.max_thread_count" : 1
}'
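The default formula is easier to judge with concrete core counts. The following sketch evaluates it in Python, assuming the division is integer division as it would be in Java; the function name is hypothetical.

```python
def default_merge_threads(available_processors):
    """Evaluate max(1, min(4, availableProcessors / 2)) with integer division."""
    if available_processors < 1:
        raise ValueError("available_processors must be at least 1")
    return max(1, min(4, available_processors // 2))


if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 32):
        print(cores, "cores ->", default_merge_threads(cores), "merge threads")
```

Under this formula any node with eight or more cores gets four merge threads by default, which is why HDD‑based nodes are usually lowered to 1 explicitly.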

Translog (Transaction Log) Management

Flushing the translog forces a Lucene commit, which is expensive. Settings that influence flush behavior include:
index.translog.flush_threshold_size (default 512 MB)
index.translog.flush_threshold_ops (default unlimited)
index.translog.flush_threshold_period (default 30 min)
index.translog.interval (default 5 s)

Increasing flush_threshold_size (e.g., to 1 GB) reduces the frequency of flushes and can improve indexing throughput, provided sufficient heap memory is available.
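As a rough illustration of the effect, the sketch below counts how many size-triggered flushes an import would cause at two thresholds and builds the request body for raising the threshold. It ignores flushes triggered by flush_threshold_period or flush_threshold_ops, and the 100 GB import volume and the function names are hypothetical.

```python
import json

MB = 1024 * 1024


def estimated_size_flushes(total_bytes, threshold_bytes):
    """Rough count of flushes triggered by translog size alone (ceiling division)."""
    if threshold_bytes <= 0:
        raise ValueError("threshold_bytes must be positive")
    return -(-total_bytes // threshold_bytes)


def translog_settings_body(size):
    """Body for PUT /my_index/_settings raising the flush threshold."""
    return json.dumps({"index.translog.flush_threshold_size": size})


if __name__ == "__main__":
    import_bytes = 100 * 1024 * MB  # hypothetical 100 GB of translog writes
    print(estimated_size_flushes(import_bytes, 512 * MB))   # default threshold
    print(estimated_size_flushes(import_bytes, 1024 * MB))  # raised to 1 GB
    print(translog_settings_body("1gb"))
```

Doubling the threshold halves the number of size-triggered flushes in this model, at the cost of a larger translog to replay if a node restarts mid-import.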

Index Buffer Size

The index buffer stores newly indexed documents before they are written to disk. Its size is controlled by static settings that must be configured on every data node:
indices.memory.index_buffer_size (default 10% of JVM heap)
indices.memory.min_index_buffer_size (default 48 MB)
indices.memory.max_index_buffer_size (default unlimited)

Increasing the buffer size can be beneficial for heavy indexing workloads.
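How the three settings interact can be sketched as a simple clamp: the percentage of heap is bounded below by the minimum and above by the maximum. The Python below is illustrative only; the function names are hypothetical, and the even split across actively indexing shards is a simplifying assumption about how the node-wide buffer is shared.

```python
def effective_index_buffer_mb(heap_mb, percent=10.0, min_mb=48.0, max_mb=None):
    """Clamp heap * percent to the configured min/max bounds (all in MB)."""
    size = heap_mb * percent / 100.0
    size = max(size, min_mb)
    if max_mb is not None:  # None models the "unlimited" default
        size = min(size, max_mb)
    return size


def per_shard_share_mb(buffer_mb, active_shards):
    """Even split across actively indexing shards (a simplification)."""
    if active_shards < 1:
        raise ValueError("active_shards must be at least 1")
    return buffer_mb / active_shards


if __name__ == "__main__":
    buf = effective_index_buffer_mb(8000)  # 8000 MB heap with default settings
    print(buf, per_shard_share_mb(buf, 4))
    print(effective_index_buffer_mb(256))  # small heap hits the 48 MB floor
```

In this model a node with many actively indexing shards leaves each shard only a small slice of the buffer, which is the situation where raising index_buffer_size is most likely to help.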

Thread Pools for Indexing and Bulk Operations

Elasticsearch provides dedicated thread pools:
index: fixed size, default = number of CPU cores, queue size 200
bulk: fixed size, default = number of CPU cores, queue size 50

Adjusting these pools (and the index.index_concurrency setting that limits concurrent indexing per shard) can further improve performance, especially on nodes dedicated to a single shard.
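The queue sizes matter because a fixed pool rejects work once every thread is busy and the queue is full. The sketch below models that limit under a deliberately simple assumption: all requests arrive at once and none completes in the meantime. The function names and the 8-core, 100-request scenario are hypothetical.

```python
def pool_defaults(cpu_cores):
    """Default pool sizes and queue lengths as listed in the text."""
    return {
        "index": {"size": cpu_cores, "queue_size": 200},
        "bulk": {"size": cpu_cores, "queue_size": 50},
    }


def rejected_requests(concurrent, size, queue_size):
    """Requests that find every thread busy and the queue already full."""
    return max(0, concurrent - size - queue_size)


if __name__ == "__main__":
    bulk = pool_defaults(8)["bulk"]
    # 8 running + 50 queued = 58 accepted; the remainder would be rejected.
    print(rejected_requests(100, bulk["size"], bulk["queue_size"]))
```

A client that sees rejections can either send fewer concurrent bulk requests or the queue can be enlarged; a larger queue trades rejections for more memory held by waiting requests.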
