Boost Elasticsearch Performance: 24 Proven Tips and Pitfalls to Avoid

This article compiles practical Elasticsearch recommendations—covering query caching, aggregation design, pagination strategies, scripting, mapping choices, shard and replica settings, and bulk indexing—to help engineers write faster, more reliable search queries while avoiding common performance traps.

dbaplus Community

Introduction

The author shares practical Elasticsearch advice, explaining the rationale behind each recommendation rather than merely listing conclusions, and invites feedback on any inaccuracies.

Query‑related Recommendations

1. Leverage Shard Request Cache

Implemented in IndicesRequestCache, it caches the entire client request per shard. It is effective mainly for aggregations, hits.total, and suggestions. By default the cache is used only when the request sets size=0. Requests that use scroll, enable profiling, use a search type other than QUERY_THEN_FETCH, or set requestCache=false are excluded. Cached entries are invalidated when the shard refreshes and its data has changed.

2. Use Node Query/Filter Cache

Implemented in LRUQueryCache (enabled by default). It caches filter sub‑queries per segment. Only segments that hold more than 10,000 documents and more than 3% of the shard's documents are cached. The cache entries of a segment are dropped when that segment is merged away.
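
The segment-size rule above can be written as a small predicate. This is only a sketch of the rule as stated in this article, not Lucene's code; the class, constant, and method names are hypothetical.

```java
// Hypothetical model of the segment-size rule described above; not Lucene source code.
class SegmentCacheRule {
    static final long MIN_SEGMENT_DOCS = 10_000;  // document threshold quoted in the text
    static final double MIN_SHARD_RATIO = 0.03;   // 3% of the shard's documents

    // A segment qualifies only if it passes both the absolute and the relative threshold.
    static boolean isSegmentEligible(long segmentDocs, long shardDocs) {
        return segmentDocs >= MIN_SEGMENT_DOCS
            && segmentDocs >= MIN_SHARD_RATIO * shardDocs;
    }

    public static void main(String[] args) {
        System.out.println(isSegmentEligible(50_000, 1_000_000)); // 5% of the shard
        System.out.println(isSegmentEligible(20_000, 1_000_000)); // only 2% of the shard
        System.out.println(isSegmentEligible(5_000, 100_000));    // fewer than 10,000 docs
    }
}
```

In practice this means that small, freshly flushed segments usually bypass the cache until they are merged into larger ones.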

3. Prefer Filter Context over Query Context

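Clauses in filter context only decide whether a document matches and skip relevance scoring, whereas must clauses compute a score for every match.

Filter results are also cacheable, so repeated filters get faster.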

// Create BoolQueryBuilder
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// Use filter context
boolQuery.filter(QueryBuilders.termQuery("field", "value"));

4. Set size=0 for aggregation‑only queries

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.aggregation(AggregationBuilders.terms("term_agg").field("field")
    .subAggregation(AggregationBuilders.sum("sum_agg").field("field")));
sourceBuilder.size(0);

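5. Use absolute dates instead of now

A bound of now changes on every request, so requests that use it cannot be served from the caches. An absolute or rounded date keeps the request identical across calls.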

LocalDateTime now = LocalDateTime.now();
String currentDate = now.format(DateTimeFormatter.ISO_DATE);
sourceBuilder.query(QueryBuilders.rangeQuery("date_field").gte("2022-01-01").lte(currentDate));
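
One way to keep a range bound stable is to round the timestamp before formatting it. The following is a minimal sketch using only java.time, assuming minute granularity is acceptable for the query; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: produces the same bound for every call within one minute.
class StableDateBounds {
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    // Drop seconds and nanoseconds so the formatted bound only changes once per minute.
    static String roundedToMinute(LocalDateTime time) {
        return time.truncatedTo(ChronoUnit.MINUTES).format(FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2022, 1, 1, 10, 15, 7);
        LocalDateTime second = LocalDateTime.of(2022, 1, 1, 10, 15, 59);
        // Both calls yield the same string, so the resulting range query is identical.
        System.out.println(roundedToMinute(first));
        System.out.println(roundedToMinute(second));
    }
}
```

The rounded string can then be passed to lte(...) in place of a value derived from the raw current time.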

6. Avoid deep nested aggregations

Each intermediate and final aggregation result lives in memory; excessive nesting can exhaust memory.

// Anti-pattern: three levels of nested terms aggregations
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
TermsAggregationBuilder termAggBuilder1 = AggregationBuilders.terms("term_agg1").field("field_name1");
TermsAggregationBuilder termAggBuilder2 = AggregationBuilders.terms("term_agg2").field("field_name2");
termAggBuilder1.subAggregation(termAggBuilder2);
TermsAggregationBuilder termAggBuilder3 = AggregationBuilders.terms("term_agg3").field("field_name3");
termAggBuilder2.subAggregation(termAggBuilder3);
sourceBuilder.aggregation(termAggBuilder1);

7. Prefer Composite aggregation for multi‑dimensional group‑by

// Each grouping dimension is a composite values source, not a terms aggregation
List<CompositeValuesSourceBuilder<?>> sources = Arrays.asList(
    new TermsValuesSourceBuilder("group_by_A").field("fieldA.keyword"),
    new TermsValuesSourceBuilder("group_by_B").field("fieldB.keyword"),
    new TermsValuesSourceBuilder("group_by_C").field("fieldC.keyword")
);
CompositeAggregationBuilder compositeAggregationBuilder = AggregationBuilders
    .composite("group_by_A_B_C", sources);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
    .query(QueryBuilders.matchAllQuery())
    .aggregation(compositeAggregationBuilder)
    .size(0);

8. Avoid large aggregations

Aggregations keep intermediate results in memory; very large result sets can cause OOM.

9. Use BFS for high‑cardinality nested aggregations

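Depth‑first collection (DFS) builds the full bucket tree before pruning. Breadth‑first collection (BFS) computes the first level, prunes it to the requested size, and only then computes sub‑aggregations for the surviving buckets. This reduces memory pressure when the field has far more distinct values than the requested size and each bucket contains few docs.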

searchSourceBuilder.aggregation(
    AggregationBuilders.terms("brandIds")
        .collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST)
        .field("brandId")
        .size(2000)
        .order(BucketOrder.key(true))
);
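
A rough back-of-the-envelope comparison shows why the collection mode matters. The sketch below assumes a two-level terms aggregation in which every parent bucket has the same number of child buckets; that is a simplification, and the class and method names are hypothetical.

```java
// Hypothetical estimate of how many child buckets each collection mode has to build.
class CollectModeEstimate {
    // Depth-first: child buckets are built for every parent value before pruning.
    static long depthFirstChildBuckets(long parentCardinality, long childrenPerParent) {
        return parentCardinality * childrenPerParent;
    }

    // Breadth-first: parents are pruned to the requested size first,
    // then child buckets are built only for the survivors.
    static long breadthFirstChildBuckets(long parentCardinality, long parentSize,
                                         long childrenPerParent) {
        return Math.min(parentCardinality, parentSize) * childrenPerParent;
    }

    public static void main(String[] args) {
        long brands = 1_000_000;  // assumed number of distinct parent values
        long size = 2_000;        // requested size, as in the snippet above
        long children = 10;       // assumed child buckets per parent
        System.out.println(depthFirstChildBuckets(brands, children));
        System.out.println(breadthFirstChildBuckets(brands, size, children));
    }
}
```

With these assumed numbers, depth-first builds 10,000,000 child buckets while breadth-first builds 20,000. Real memory use depends on the data distribution and the sub-aggregations involved.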

10. Do not aggregate on text fields

Use keyword instead to avoid heavy fielddata memory usage.

11. Avoid bucket_sort for deep pagination

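bucket_sort must first build every bucket of its parent aggregation and then sort them, which leads to O(N log N) complexity and memory consumption proportional to the total bucket count. Use a Composite aggregation instead and page through it with its after key.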

// Composite aggregation for deep pagination
CompositeAggregationBuilder compositeBuilder = AggregationBuilders
    .composite("spuIdAgg", Collections.singletonList(
        new TermsValuesSourceBuilder("spuId").field("spuId").order("desc")))
    .aggregateAfter(ImmutableMap.of("spuId", "603030"))
    .size(20);
searchSourceBuilder.query(boolQuery).aggregation(compositeBuilder).size(0);
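
The paging loop around such a request follows a simple protocol: send the last key of the previous page as the after key, and stop when a page comes back empty. The sketch below simulates this with JDK collections only; the sorted set stands in for the index, and fetchPage is a hypothetical stand-in for one composite request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical simulation of after-key paging; no Elasticsearch calls are made.
class AfterKeyPaging {
    // Return up to pageSize keys that come strictly after afterKey in the set's order.
    static List<String> fetchPage(NavigableSet<String> keys, String afterKey, int pageSize) {
        NavigableSet<String> rest = (afterKey == null) ? keys : keys.tailSet(afterKey, false);
        List<String> page = new ArrayList<>();
        for (String key : rest) {
            if (page.size() == pageSize) break;
            page.add(key);
        }
        return page;
    }

    // Keep requesting pages, feeding the last key of each page back in as the after key.
    static List<String> fetchAll(NavigableSet<String> keys, int pageSize) {
        List<String> all = new ArrayList<>();
        String afterKey = null;
        while (true) {
            List<String> page = fetchPage(keys, afterKey, pageSize);
            if (page.isEmpty()) break;
            all.addAll(page);
            afterKey = page.get(page.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        // Descending order, matching order("desc") in the snippet above.
        NavigableSet<String> keys = new TreeSet<>(Comparator.reverseOrder());
        keys.addAll(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(fetchAll(keys, 2));
    }
}
```

Each request only needs the previous page's last key, so the cost per page stays flat no matter how deep the paging goes.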

12. Prefer search_after over scroll for real‑time deep paging

Scroll holds a snapshot of the index segments open for the lifetime of the scroll context, consuming heap memory. search_after combined with a point‑in‑time (PIT) is more efficient and is recommended for paging beyond the first 10,000 hits.

13. Sort by _doc when no custom ordering is needed

Sorting by _doc returns hits in index order (it is not the document _id). This is the cheapest sort order because no scores or field values have to be computed or compared.

14. Avoid wildcard‑in‑the‑middle queries

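Wildcard patterns are compiled into automata (DFAs) that must be run against the field's terms, which is expensive, especially when the pattern does not start with a literal prefix. For infix matching, use the dedicated wildcard field type introduced in ES 7.9.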

15. Avoid inline scripts; store scripts instead

// Store script
POST _script/activity_discount_price
{
  "script": {"lang": "painless", "source": "doc.xxx.value * params.discount"}
}
// Use stored script
GET index/_search
{
  "script_fields": {"discount_price": {"script": {"id": "activity_discount_price", "params": {"discount": 0.8}}}}
}

16. Disable _all field

It concatenates the values of all fields into one large string, increasing CPU and storage usage. It is disabled by default since ES 6.0 and was removed in 7.0; on older versions, keep it disabled.

17. Prefer GET / MGET over search when fetching by ID

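A GET or MGET by ID is routed straight to the shard that holds the document and fetches it directly, skipping the query phase that a search would run on every shard.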

18. Limit multi‑index queries

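Querying index* fans the search out to every matching index and all of their shards; name the target indices or an alias explicitly instead. Note that action.destructive_requires_name only blocks wildcard patterns in destructive operations such as index deletion, not in searches.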

19. Avoid large from+size pagination

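With from+size, every request is a full re‑search: each shard must collect and sort its top from+size hits, and the coordinating node must merge the results of all shards. CPU and memory cost therefore grow with page depth, up to the risk of OOM. By default, index.max_result_window caps from+size at 10,000.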
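
The cost can be estimated with simple arithmetic. The sketch below assumes every shard returns its top from+size entries to the coordinating node; the class and method names are hypothetical.

```java
// Hypothetical estimate of the work behind a single from+size page.
class DeepPagingCost {
    // Entries each shard has to collect and sort for one request.
    static long hitsPerShard(long from, long size) {
        return from + size;
    }

    // Entries the coordinating node has to merge to produce one page.
    static long hitsMergedByCoordinator(int shards, long from, long size) {
        return shards * hitsPerShard(from, size);
    }

    public static void main(String[] args) {
        // Page of 10 hits starting at offset 10,000 on an index with 5 shards.
        System.out.println(hitsMergedByCoordinator(5, 10_000, 10));
    }
}
```

For this example the coordinating node merges 50,050 entries to return 10 hits, and the same work is repeated for every subsequent page.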

20. Use _source_includes / _source_excludes to limit returned fields

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
// Pass includes and excludes in one call; a second fetchSource call would replace the first
String[] includes = {"field1", "field2"};
String[] excludes = {"field3"};
sourceBuilder.fetchSource(includes, excludes);

Write‑related Recommendations

21. Avoid manual refresh calls; configure refresh_interval instead

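Each refresh creates a new segment, so refreshing after every write produces many small segments and extra merge work. Set refresh_interval (default 1s) on the index and let Elasticsearch handle refreshes automatically.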

22. Keep documents under the default http.max_content_length (100 MB)

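Requests whose body exceeds this limit are rejected, so oversized docs can never be indexed; in practice, keep documents far smaller than the limit.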

23. Let ES generate document IDs

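With an explicit ID, Elasticsearch must first check whether a document with that ID already exists in the shard, which slows writes. Auto‑generated IDs skip this check.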

24. Use Bulk API for large writes

Typical bulk size: 5–15 MB.

Timeout > 60 s.

Distribute load across nodes.
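
Batches can be sized by bytes rather than by document count. The following is a minimal sketch that groups already serialized documents into batches under a byte limit; it ignores the per-document action lines of the bulk format, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that splits serialized documents into size-bounded batches.
class BulkBatcher {
    static List<List<String>> split(List<String> docs, long maxBatchBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            long docBytes = doc.getBytes(StandardCharsets.UTF_8).length;
            // Start a new batch if adding this document would exceed the limit.
            if (!current.isEmpty() && currentBytes + docBytes > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A single document larger than the limit still gets a batch of its own.
            current.add(doc);
            currentBytes += docBytes;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("aaaa", "bbbb", "cc", "dddddd");
        // Tiny limit for demonstration; a real limit would be in the 5-15 MB range.
        System.out.println(split(docs, 8));
    }
}
```

Each resulting batch would then be sent as one bulk request, ideally spread across several nodes.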

25. Increase refresh_interval and avoid setting replica count to 0 during massive writes

After the load finishes, restore original settings.

Index‑creation Recommendations

26. Set replica count ≥ 1 (usually 1–2 per primary)

Provides high availability and modest search performance boost.

27. Keep primary shard count reasonable

Do not exceed three times the number of nodes; aim for 5–8 GB per shard (≈ 30–50 GB data). Total index size < 1 TB, docs < 1 billion per shard.

28. Disable dynamic mapping; define explicit field types, analyzers, and sub‑types as needed

29. Use keyword instead of text for non‑analyzed strings

30. Keep the total field count at or below 100 (the index.mapping.total_fields.limit default is 1000)

31. Set index to false for fields that are not searched

{
  "mappings": {
    "properties": {
      "title": {"type": "text", "index": false},
      "content": {"type": "text"}
    }
  }
}

32. Avoid nested and parent/child relationships when possible

They increase document count and write cost; prefer flat documents or separate indices.

33. Disable norms for fields that are not scored

{"title": {"type": "text", "norms": false}}

34. Disable doc_values for fields not used in sorting, aggregations, or scripts

35. Choose field type wisely: use numeric for range queries, keyword for low‑cardinality terms, and enable eager_global_ordinals for high‑cardinality keyword fields that are heavily aggregated

PUT index
{
  "mappings": {
    "properties": {
      "foo": {"type": "keyword", "eager_global_ordinals": true}
    }
  }
}

Conclusion

Over the past decade Elasticsearch has become the most popular open‑source search engine. This article consolidates a comprehensive set of best practices and anti‑patterns for daily development, aiming to help practitioners write faster, more reliable queries and maintain healthy clusters.
