
Boost Elasticsearch Performance: 24 Proven Tips and Pitfalls to Avoid

This article compiles practical Elasticsearch recommendations—covering query caching, aggregation design, pagination strategies, scripting, mapping choices, shard and replica settings, and bulk indexing—to help engineers write faster, more reliable search queries while avoiding common performance traps.

dbaplus Community

Dec 27, 2023

Introduction

The author shares practical Elasticsearch advice, explaining the rationale behind each recommendation rather than merely listing conclusions, and invites feedback on any inaccuracies.

Query‑related Recommendations

1. Leverage Shard Request Cache

Implemented in IndicesRequestCache, this cache stores the complete response to a client request on each shard. It mainly benefits aggregations, hits.total, and suggestions. By default only requests with size=0 are cached. Requests that use scroll, enable profiling, use a search type other than QUERY_THEN_FETCH, or set requestCache=false are excluded. Cached entries are invalidated when the shard refreshes and its data has changed.
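The exclusion rules above can be read as a small decision function. The sketch below is a simplified model of those rules, not the actual IndicesRequestCache code; the class ShardRequestCacheRules and its Request holder are hypothetical names used only for illustration. It also assumes that an explicit requestCache=true opts a request in even when size is greater than 0, which is how the Elasticsearch documentation describes the request_cache parameter.

```java
/** Simplified model of the rules listed above; not the real IndicesRequestCache code. */
final class ShardRequestCacheRules {

    /** Hypothetical holder for the request properties that the rules mention. */
    static final class Request {
        final int size;
        final boolean scroll;
        final boolean profile;
        final String searchType;    // for example "QUERY_THEN_FETCH"
        final Boolean requestCache; // null means the flag was not set on the request

        Request(int size, boolean scroll, boolean profile, String searchType, Boolean requestCache) {
            this.size = size;
            this.scroll = scroll;
            this.profile = profile;
            this.searchType = searchType;
            this.requestCache = requestCache;
        }
    }

    static boolean isCacheable(Request r) {
        if (Boolean.FALSE.equals(r.requestCache)) {
            return false; // caching explicitly disabled for this request
        }
        if (r.scroll || r.profile) {
            return false; // scroll and profiled requests are never cached
        }
        if (!"QUERY_THEN_FETCH".equals(r.searchType)) {
            return false; // other search types are excluded
        }
        if (Boolean.TRUE.equals(r.requestCache)) {
            return true;  // assumption: an explicit opt-in also covers size > 0
        }
        return r.size == 0; // default: only requests that return no hits
    }
}
```

A dashboard request that only needs aggregation results would therefore set size to 0 and avoid scroll and profiling to stay eligible.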

2. Use Node Query/Filter Cache

Implemented in LRUQueryCache and enabled by default, this cache stores the results of filter sub‑queries per segment. Only segments that hold more than 10 000 documents and more than 3 % of the shard's documents are eligible. Entries for a segment are dropped when that segment is merged away.
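As a rough illustration of the two thresholds, the following sketch checks whether a segment would be eligible. It models only the rule as stated above, using the figures from the text; QueryCacheEligibility and its method are hypothetical names, and the real policy in Lucene may differ in detail.

```java
/** Models the segment eligibility rule as described in the text; not Lucene's own code. */
final class QueryCacheEligibility {

    static final long MIN_SEGMENT_DOCS = 10_000;  // threshold quoted in the text
    static final double MIN_SHARD_RATIO = 0.03;   // 3 % of the shard's documents

    static boolean segmentQualifies(long segmentDocs, long shardDocs) {
        if (shardDocs <= 0) {
            return false; // an empty shard has nothing to cache
        }
        double ratio = (double) segmentDocs / shardDocs;
        return segmentDocs > MIN_SEGMENT_DOCS && ratio > MIN_SHARD_RATIO;
    }
}
```

For a shard with 1 000 000 documents, a 50 000-document segment qualifies (5 %), while a 20 000-document segment does not (2 %).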

3. Prefer Filter Context over Query Context

Clauses in filter context are not scored, whereas clauses under must are.

Filter results can also be cached, which speeds up repeated queries.

// Create BoolQueryBuilder
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// Use filter context
boolQuery.filter(QueryBuilders.termQuery("field", "value"));

4. Set size=0 for aggregation‑only queries

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.aggregation(AggregationBuilders.terms("term_agg").field("field")
    .subAggregation(AggregationBuilders.sum("sum_agg").field("field")));
sourceBuilder.size(0);

5. Use absolute dates instead of now

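import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Resolve the bounds on the client so the request carries literal dates instead of "now"
LocalDate today = LocalDate.now();
String currentDate = today.format(DateTimeFormatter.ISO_DATE);
sourceBuilder.query(QueryBuilders.rangeQuery("date_field").gte("2022-01-01").lte(currentDate));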
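The article does not spell out the reason for this tip. One commonly cited motivation is caching: a request whose bounds change on every call never repeats exactly, so cached results are rarely reused. When a finer bound than the calendar day is needed, the timestamp can be rounded on the client. The sketch below assumes minute granularity is acceptable; StableRangeBounds is a hypothetical helper, not part of any Elasticsearch client.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Rounds a timestamp so that requests issued close together share identical bounds. */
final class StableRangeBounds {

    /** Truncates to the start of the minute and renders an ISO-8601 UTC string. */
    static String upperBound(Instant now) {
        return now.truncatedTo(ChronoUnit.MINUTES).toString();
    }
}
```

Two calls made 29 seconds apart within the same minute produce the same string, so the range clause in both requests is byte-for-byte identical.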

6. Avoid deep nested aggregations

Each intermediate and final aggregation result lives in memory; excessive nesting can exhaust memory.

// Anti-pattern: three levels of nested terms aggregations
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
TermsAggregationBuilder termAggBuilder1 = AggregationBuilders.terms("term_agg1").field("field_name1");
TermsAggregationBuilder termAggBuilder2 = AggregationBuilders.terms("term_agg2").field("field_name2");
termAggBuilder1.subAggregation(termAggBuilder2);
TermsAggregationBuilder termAggBuilder3 = AggregationBuilders.terms("term_agg3").field("field_name3");
termAggBuilder2.subAggregation(termAggBuilder3);
sourceBuilder.aggregation(termAggBuilder1);

7. Prefer Composite aggregation for multi‑dimensional group‑by

CompositeAggregationBuilder compositeAggregationBuilder = AggregationBuilders
    .composite("group_by_A_B_C")
    .sources(
        AggregationBuilders.terms("group_by_A").field("fieldA.keyword"),
        AggregationBuilders.terms("group_by_B").field("fieldB.keyword"),
        AggregationBuilders.terms("group_by_C").field("fieldC.keyword")
    );
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
    .query(QueryBuilders.matchAllQuery())
    .aggregation(compositeAggregationBuilder)
    .size(0);

8. Avoid large aggregations

Aggregations keep intermediate results in memory; very large result sets can cause OOM.

9. Use BFS for high‑cardinality nested aggregations

The default depth‑first mode builds the full aggregation tree before pruning. Breadth‑first mode processes the first level, prunes it, and only then descends into the surviving buckets, which reduces memory pressure when each bucket contains few docs.

searchSourceBuilder.aggregation(
    AggregationBuilders.terms("brandIds")
        .collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST)
        .field("brandId")
        .size(2000)
        .order(BucketOrder.key(true))
);

10. Do not aggregate on text fields

Use keyword instead to avoid heavy fielddata memory usage.

11. Avoid bucket_sort for deep pagination

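bucket_sort runs only after the parent aggregation has built every bucket, so sorting costs O(N log N) and memory grows with the number of buckets. Use a composite aggregation and page through it with its after key (aggregateAfter in the Java client) instead.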

// Composite aggregation for deep pagination
CompositeAggregationBuilder compositeBuilder = AggregationBuilders
    .composite("spuIdAgg", Collections.singletonList(
        new TermsValuesSourceBuilder("spuId").field("spuId").order("desc")))
    .aggregateAfter(ImmutableMap.of("spuId", "603030"))
    .size(20);
searchSourceBuilder.query(boolQuery).aggregation(compositeBuilder).size(0);

12. Prefer search_after over scroll for real‑time deep paging

Scroll keeps a snapshot of the index segments alive, consuming heap memory; search_after with a point in time (PIT) is more efficient and is the recommended way to page beyond 10 000 hits.

13. Sort by _doc when no custom ordering is needed

Sorting by _doc returns hits in index order, not by the _id field. It is the cheapest sort order because no scores or field values have to be compared.

14. Avoid wildcard‑in‑the‑middle queries

Wildcard patterns are compiled into DFAs that can be expensive. Use the dedicated wildcard field type introduced in ES 7.9 for faster infix matching.

15. Avoid inline scripts; store scripts instead

// Store script
POST _script/activity_discount_price
{
  "script": {"lang": "painless", "source": "doc.xxx.value * params.discount"}
}
// Use stored script
GET index/_search
{
  "script_fields": {"discount_price": {"script": {"id": "activity_discount_price", "params": {"discount": 0.8}}}}
}

16. Disable _all field

It concatenates the values of all fields into one large string, increasing CPU and storage usage. The field was deprecated in 6.0 and removed in 7.0, so this only applies to older clusters; keep it disabled there.

17. Prefer GET / MGET over search when fetching by ID

A lookup by ID is routed straight to the one shard that holds the document and skips the query phase, so it is faster than a search.

18. Limit multi‑index queries

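Querying index* fans out to every matching index and all of its shards, so name the indices you need explicitly. Note that action.destructive_requires_name only blocks wildcard patterns in destructive operations such as index deletion; it does not restrict searches.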

19. Avoid large from+size pagination

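Every request makes each shard collect and sort from + size hits again, so CPU and memory grow with page depth and very deep pages risk OOM. The index.max_result_window setting caps from + size at 10 000 by default.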
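A quick calculation shows how the cost grows. The sketch below is back-of-the-envelope arithmetic only; DeepPagingCost is a hypothetical helper, and the coordinator figure is an upper bound that assumes every shard has enough matching documents.

```java
/** Back-of-the-envelope cost of one from+size page. */
final class DeepPagingCost {

    /** Hits each shard must collect and sort for a single page. */
    static long hitsPerShard(long from, long size) {
        return from + size;
    }

    /** Upper bound on the entries the coordinating node merges for a single page. */
    static long hitsAtCoordinator(int shards, long from, long size) {
        return shards * hitsPerShard(from, size);
    }
}
```

With 5 shards, from=10000 and size=10, each shard collects 10 010 hits and the coordinating node merges up to 50 050 entries in order to return 10 documents.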

20. Use _source_includes / _source_excludes to limit returned fields

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
// Each fetchSource call replaces the previous one, so pass includes and excludes together
String[] includes = {"field1", "field2"};
String[] excludes = {"field3"};
sourceBuilder.fetchSource(includes, excludes);

Write‑related Recommendations

21. Avoid manual refresh calls; configure refresh_interval instead

Let Elasticsearch handle refreshes automatically.

22. Keep documents under the default http.max_content_length (100 MB)

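Elasticsearch rejects any HTTP request whose body exceeds this limit, including a bulk request made up of many smaller documents.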

23. Let ES generate document IDs

With an explicit ID, Elasticsearch must first check whether a document with that ID already exists in the shard, which slows writes.

24. Use Bulk API for large writes

Typical bulk size: 5–15 MB per request.

Set the request timeout above 60 s.

Spread bulk requests across the nodes instead of sending them all to one.
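One way to keep bulk requests within a size budget is to group documents by their encoded byte length before sending. The sketch below is a minimal illustration using only the JDK; BulkBatcher is a hypothetical helper, it assumes each document is already a single-line JSON string, and it does not send anything over the network.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Groups single-line JSON documents into NDJSON bulk bodies of bounded size. */
final class BulkBatcher {

    /**
     * Returns bulk bodies of at most maxBytes each. A single document larger
     * than maxBytes still gets a body of its own rather than being dropped.
     */
    static List<String> toBulkBodies(String index, List<String> jsonDocs, int maxBytes) {
        List<String> bodies = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        // Action line that precedes every document in the bulk format
        String action = "{\"index\":{\"_index\":\"" + index + "\"}}\n";
        for (String doc : jsonDocs) {
            String entry = action + doc + "\n";
            int entryBytes = entry.getBytes(StandardCharsets.UTF_8).length;
            if (currentBytes > 0 && currentBytes + entryBytes > maxBytes) {
                bodies.add(current.toString()); // flush the full batch
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(entry);
            currentBytes += entryBytes;
        }
        if (currentBytes > 0) {
            bodies.add(current.toString()); // flush the remainder
        }
        return bodies;
    }
}
```

In practice maxBytes would sit somewhere in the 5–15 MB range quoted above, and each returned body would be posted to the _bulk endpoint, ideally rotating across nodes.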

25. Increase refresh_interval and avoid setting replica count to 0 during massive writes

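A longer refresh_interval (or -1 to disable refreshes) produces fewer, larger segments while the load runs. After the load finishes, restore the original settings.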
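The change-and-restore cycle amounts to two calls to the index settings endpoint. The sketch below only builds the JSON bodies; RefreshIntervalSettings is a hypothetical helper, and the values "30s" and "1s" are illustrative (1s is the Elasticsearch default).

```java
/** Builds the body for PUT /<index>/_settings that changes index.refresh_interval. */
final class RefreshIntervalSettings {

    static String body(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(body("30s")); // before the bulk load
        System.out.println(body("1s"));  // after the load, back to the previous value
    }
}
```

Recording the previous value before changing it makes the restore step reliable when indices do not all use the default.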

Index‑creation Recommendations

26. Set replica count ≥ 1 (usually 1–2 per primary)

Provides high availability and modest search performance boost.

27. Keep primary shard count reasonable

Do not exceed three times the number of nodes; aim for 5–8 GB per shard (≈ 30–50 GB data). Total index size < 1 TB, docs < 1 billion per shard.
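These rules of thumb can be combined into a quick estimate. The sketch below is only an illustration: ShardPlanner is a hypothetical helper, the target shard size is a parameter because the text quotes more than one range, and the cap of three primaries per node comes from the guideline above.

```java
/** Rough primary shard estimate from index size, target shard size, and node count. */
final class ShardPlanner {

    static int primaryShards(double indexSizeGb, double targetShardSizeGb, int nodes) {
        int bySize = (int) Math.ceil(indexSizeGb / targetShardSizeGb); // shards needed to hit the target size
        int cap = 3 * nodes;                                           // guideline: at most 3x the node count
        return Math.max(1, Math.min(bySize, cap));
    }
}
```

For example, a 200 GB index with a 40 GB target on 3 nodes yields 5 primaries, while a 1000 GB index on the same cluster hits the cap of 9, which suggests adding nodes or splitting the data across indices.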

28. Disable dynamic mapping; define explicit field types, analyzers, and sub‑types as needed

29. Use keyword instead of text for non‑analyzed strings

30. Keep the total field count at or below 100 (the default limit is 1000)

31. Set index to false for fields that are not searched

{
  "mappings": {
    "properties": {
      "title": {"type": "text", "index": false},
      "content": {"type": "text"}
    }
  }
}

32. Avoid nested and parent/child relationships when possible

They increase document count and write cost; prefer flat documents or separate indices.

33. Disable norms for fields that are not scored

{"title": {"type": "text", "norms": false}}

34. Disable doc_values for fields not used in sorting, aggregations, or scripts

35. Choose field type wisely: use numeric for range queries, keyword for low‑cardinality terms, and enable eager_global_ordinals for high‑cardinality keyword fields that are heavily aggregated

PUT index
{
  "mappings": {
    "properties": {
      "foo": {"type": "keyword", "eager_global_ordinals": true}
    }
  }
}

Conclusion

Over the past decade Elasticsearch has become the most popular open‑source search engine. This article consolidates a comprehensive set of best practices and anti‑patterns for daily development, aiming to help practitioners write faster, more reliable queries and maintain healthy clusters.
