30 Essential Elasticsearch Tips to Boost Query Performance and Avoid Common Pitfalls

This article compiles practical Elasticsearch recommendations covering query caching, filter contexts, pagination, aggregation strategies, index mapping, shard design, and scripting best practices, providing developers with actionable insights to improve search performance, reduce resource consumption, and prevent common operational issues.

Sanyou's Java Diary

Jan 11, 2024

Preface

This article shares practical Elasticsearch usage suggestions, explaining the rationale behind each recommendation rather than merely presenting conclusions.

Query‑related Tips

1. Fully Utilize Caches

Shard Request Cache: caches per-shard results (aggregations, hits.total, suggestions) for requests with size=0. Entries are invalidated automatically when the shard refreshes and its data has changed.

Node Query/Filter Cache: implemented in Lucene (LRUQueryCache) and shared by all shards on a node. Only queries used in filter context are cached, and only on segments that hold more than 10,000 documents and more than 3% of the shard's documents.

2. Use Filter Context Instead of Query Context

Filters are not scored and can be cached, improving performance.

BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.filter(QueryBuilders.termQuery("field", "value"));

3. Set size=0 When Only Aggregations Are Needed

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.aggregation(AggregationBuilders.terms("term_agg").field("field"));
sourceBuilder.size(0);

4. Use Absolute Time Values for Date Range Queries

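Avoid now in range queries: it resolves to a different millisecond on every request, so cached results cannot be reused. Compute an absolute value on the client instead, rounded to the coarsest unit the business logic allows.

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
// Resolve the date on the client, rounded to the day, so repeated requests carry identical bounds
String currentDate = LocalDate.now().format(DateTimeFormatter.ISO_LOCAL_DATE);
sourceBuilder.query(QueryBuilders.rangeQuery("date_field").gte("2022-01-01").lte(currentDate));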
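
When the upper bound has to follow the clock more closely than once a day, one option is to round the timestamp down to a fixed window so that all requests issued within that window still send identical bounds. The following is a minimal sketch using only java.time; the class name CacheFriendlyBounds and the 5-minute window in the usage note are illustrative assumptions, not part of the original article.

```java
import java.time.Duration;
import java.time.Instant;

final class CacheFriendlyBounds {
    /**
     * Rounds an instant down to the start of its window (for example 1 minute or 1 hour).
     * The window must be a positive duration.
     */
    static Instant roundDown(Instant t, Duration window) {
        long w = window.toMillis();
        if (w <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        // floorDiv keeps the rounding direction correct for instants before the epoch
        return Instant.ofEpochMilli(Math.floorDiv(t.toEpochMilli(), w) * w);
    }
}
```

For example, roundDown(Instant.now(), Duration.ofMinutes(5)).toString() yields an ISO-8601 string that could be passed to lte(...) in place of currentDate, assuming date_field uses a format that accepts such values.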

5. Avoid Deeply Nested Aggregations

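Every level of nesting multiplies the bucket count: each bucket of the parent aggregation gets its own full set of child buckets, so a few levels over high-cardinality fields can exhaust heap memory. The example below nests a single level; keep both the depth and the size of each level as small as the use case allows.

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
TermsAggregationBuilder termAgg1 = AggregationBuilders.terms("term_agg1").field("field1");
TermsAggregationBuilder termAgg2 = AggregationBuilders.terms("term_agg2").field("field2");
// Every bucket of term_agg1 gets its own set of term_agg2 buckets
termAgg1.subAggregation(termAgg2);
sourceBuilder.aggregation(termAgg1);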
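
To see how quickly nesting adds up, the worst-case bucket count can be estimated from the size of each level. This is a rough sketch under the assumption that every parent bucket is fully populated with child buckets; the class name BucketEstimate is hypothetical.

```java
final class BucketEstimate {
    /**
     * Worst-case number of buckets for terms aggregations nested in the given order,
     * where sizes[i] is the "size" of the aggregation at depth i.
     */
    static long worstCaseBuckets(int... sizes) {
        long total = 0;
        long perLevel = 1;
        for (int size : sizes) {
            // Buckets at this depth = buckets at the parent depth * size of this level
            perLevel = Math.multiplyExact(perLevel, (long) size);
            total = Math.addExact(total, perLevel);
        }
        return total;
    }
}
```

With a size of 100 per level, one level produces 100 buckets, two levels 10,100, and three levels 1,010,100. It can be worth comparing such an estimate against the cluster's search.max_buckets setting before shipping a query.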

6. Prefer Composite Aggregation for Multi‑Dimensional Group‑by

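A composite aggregation combines the dimensions into a single key and returns the buckets page by page, instead of building a tree of nested buckets in memory.

// Each source is a CompositeValuesSourceBuilder; together they form the composite key
List<CompositeValuesSourceBuilder<?>> sources = new ArrayList<>();
sources.add(new TermsValuesSourceBuilder("group_by_A").field("fieldA.keyword"));
sources.add(new TermsValuesSourceBuilder("group_by_B").field("fieldB.keyword"));
sources.add(new TermsValuesSourceBuilder("group_by_C").field("fieldC.keyword"));
CompositeAggregationBuilder compositeAgg = AggregationBuilders.composite("group_by_A_B_C", sources);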
SearchSourceBuilder sb = new SearchSourceBuilder()
    .query(QueryBuilders.matchAllQuery())
    .aggregation(compositeAgg)
    .size(0);
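
Composite results are consumed page by page: each response carries an after_key, which is sent back as the after parameter of the next request until a page comes back empty. The loop can be sketched as follows. Everything here is a simplified in-memory stand-in: CompositePaging, Page, and fakeFetch are hypothetical names, and fakeFetch replaces the real search call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class CompositePaging {
    /** One page of bucket keys plus the key to resume from (null when exhausted). */
    static final class Page {
        final List<String> buckets;
        final String afterKey;

        Page(List<String> buckets, String afterKey) {
            this.buckets = buckets;
            this.afterKey = afterKey;
        }
    }

    /** Drains all pages; fetch stands in for a search call taking (afterKey, pageSize). */
    static List<String> drain(BiFunction<String, Integer, Page> fetch, int pageSize) {
        List<String> all = new ArrayList<>();
        String after = null; // the first request has no "after"
        while (true) {
            Page page = fetch.apply(after, pageSize);
            all.addAll(page.buckets);
            if (page.afterKey == null) {
                return all; // empty page: nothing left to read
            }
            after = page.afterKey;
        }
    }

    /** In-memory stand-in for the server; sortedKeys must be sorted ascending. */
    static Page fakeFetch(List<String> sortedKeys, String after, int size) {
        List<String> out = new ArrayList<>();
        for (String key : sortedKeys) {
            if (after != null && key.compareTo(after) <= 0) {
                continue; // skip everything up to and including the cursor
            }
            if (out.size() == size) {
                break;
            }
            out.add(key);
        }
        String next = out.isEmpty() ? null : out.get(out.size() - 1);
        return new Page(out, next);
    }
}
```

Because only one page of buckets is held at a time, memory use depends on the page size rather than on the total number of key combinations.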

7. Avoid Large Aggregations and High‑Cardinality Buckets

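Aggregations over high-cardinality fields, or with very large size values, build large intermediate results on every shard and again on the coordinating node, which consumes excessive heap memory.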

8. Use BFS Collection Mode for High‑Cardinality Aggregations

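Breadth-first mode computes and prunes the top-level buckets first and only then runs sub-aggregations on the surviving buckets, which keeps memory bounded when the parent field has many distinct values.

// BREADTH_FIRST only has an effect once sub-aggregations are attached to this terms aggregation
searchSourceBuilder.aggregation(
    AggregationBuilders.terms("brandIds")
        .collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST)
        .field("brandId")
        .size(2000)
        .order(BucketOrder.key(true))
);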

9. Do Not Aggregate on text Fields

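text fields have no doc values, so aggregating on them requires fielddata=true, which loads the analyzed terms into heap. Aggregate on a keyword field (or a keyword sub-field) instead, and enable fielddata only when there is no alternative.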

10. Avoid Deep Pagination with from+size

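With from+size, every request runs the full search again: each shard collects and sorts its top from+size hits and the coordinating node merges them all, so CPU and memory usage grow with page depth. By default, index.max_result_window rejects requests where from+size exceeds 10,000.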
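
The cost of a deep page can be put into numbers. This is a rough sketch that assumes every shard holds at least from+size matching documents; the class and method names are illustrative.

```java
final class DeepPagingCost {
    /** Hits each shard must collect and sort for a from+size request. */
    static long hitsPerShard(int from, int size) {
        return (long) from + size;
    }

    /** Upper bound on the entries the coordinating node merges across all shards. */
    static long hitsMergedByCoordinator(int shards, int from, int size) {
        return Math.multiplyExact((long) shards, hitsPerShard(from, size));
    }
}
```

With 5 shards, from=9990 and size=10, each shard collects 10,000 hits and the coordinating node merges up to 50,000 entries in order to return 10 documents.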

11. Prefer SearchAfter (or PIT) Over Scroll for Real‑Time Large Result Sets

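Scroll keeps a search context (a snapshot of the index) open for its whole lifetime, which holds on to memory and does not reflect later writes. SearchAfter only carries the sort values of the last hit from one page to the next, which makes it more efficient for deep pagination; add a PIT when the pages must come from a consistent view of the data.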

12. Ensure Sort Fields Are Unique When Using SearchAfter

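If several documents share the same sort values, the cursor cannot tell them apart, and documents can be skipped or returned twice across pages. Add a unique field as the last sort key to act as a tiebreaker.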
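
The effect of a missing tiebreaker can be reproduced with a small in-memory model of cursor-based paging. This is a simplified sketch, not how Elasticsearch is implemented: SearchAfterDemo, Doc, and pageAll are hypothetical names, and the model only captures the rule that a page contains documents sorting strictly after the cursor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SearchAfterDemo {
    static final class Doc {
        final long timestamp;
        final String id;

        Doc(long timestamp, String id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    /** Pages through docs; the cursor is (timestamp) alone, or (timestamp, id) when tiebreak is true. */
    static List<String> pageAll(List<Doc> docs, int pageSize, boolean tiebreak) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingLong((Doc d) -> d.timestamp).thenComparing(d -> d.id));
        List<String> seen = new ArrayList<>();
        Doc last = null;
        while (true) {
            List<Doc> page = new ArrayList<>();
            for (Doc d : sorted) {
                if (page.size() == pageSize) {
                    break;
                }
                if (last == null || isAfter(d, last, tiebreak)) {
                    page.add(d);
                }
            }
            if (page.isEmpty()) {
                return seen;
            }
            for (Doc d : page) {
                seen.add(d.id);
            }
            last = page.get(page.size() - 1); // cursor for the next page
        }
    }

    /** True when d sorts strictly after the cursor built from last. */
    static boolean isAfter(Doc d, Doc last, boolean tiebreak) {
        if (d.timestamp != last.timestamp) {
            return d.timestamp > last.timestamp;
        }
        // Equal timestamps: without a tiebreaker the document is indistinguishable from the cursor
        return tiebreak && d.id.compareTo(last.id) > 0;
    }
}
```

For documents with timestamps 1, 2, 2, 2, 3 and ids a to e, paging two at a time on the timestamp alone returns a, b, e and silently drops c and d; with the id as a tiebreaker all five documents are returned.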

13. Sort by Business Fields Instead of Default _score

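When relevance ranking is not needed, sort by a business field or by _doc rather than the default _score, which avoids the scoring overhead.

Sorting by _doc (index order, based on the internal document ID) is the cheapest option and is specially optimized in ES.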

Write‑related Tips

Avoid manual Refresh calls; configure refresh_interval instead.

Do not index overly large documents (http.max_content_length limits a request to 100 MB by default).

Let ES generate document IDs to avoid extra existence checks.

Use the Bulk API for large writes, tuning batch size (≈5‑15 MB) and timeout (>60 s).

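When bulk-loading, increase refresh_interval and restore it afterward. Avoid setting the replica count to 0: it speeds up the load but leaves the data unprotected if a node fails, so it is only an option when the data can be re-indexed from the source.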
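
Batching by payload size rather than by document count keeps bulk requests close to the target regardless of how large individual documents are. The following sketch assumes the documents are already serialized to JSON strings; BulkBatcher and batchBySize are hypothetical names, and the size estimate counts only the document lines, not the action line that precedes each document in a real bulk body.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BulkBatcher {
    /**
     * Splits serialized documents into batches whose UTF-8 size stays at or under maxBytes.
     * A single document larger than maxBytes still gets a batch of its own.
     */
    static List<List<String>> batchBySize(List<String> docs, long maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            // +1 for the newline that terminates each line of the bulk body
            long docBytes = doc.getBytes(StandardCharsets.UTF_8).length + 1;
            if (!current.isEmpty() && currentBytes + docBytes > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += docBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Calling batchBySize(docs, 10L * 1024 * 1024) would target roughly 10 MB per request, which falls inside the 5-15 MB range suggested above; each resulting batch would then be sent as one Bulk API call.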

Index Creation

Shard Design

Keep replica count between 1‑2 per primary shard for high availability.

Limit primary shard size to 30‑50 GB and total index size to ~1 TB.
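
The shard-size guideline translates directly into a primary shard count. This is a back-of-the-envelope sketch that assumes the expected index size is known in advance; ShardPlanner and its parameters are hypothetical names.

```java
final class ShardPlanner {
    /** Smallest number of primary shards that keeps each shard at or under targetShardGb. */
    static long primaryShards(long expectedIndexGb, long targetShardGb) {
        if (expectedIndexGb < 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division, with a floor of one shard for small indices
        long shards = (expectedIndexGb + targetShardGb - 1) / targetShardGb;
        return Math.max(1, shards);
    }
}
```

An index expected to reach 600 GB with a 40 GB target per shard would need 15 primary shards; a 10 GB index still gets one. Because the primary shard count cannot be changed without reindexing or using the split/shrink APIs, it is worth sizing for expected growth rather than for current volume.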

Mapping Design

Disable dynamic mapping; explicitly define field types, analyzers, and index settings.

Use keyword for non‑analyzed strings; reserve text for full‑text search.

Keep total field count below 100 to maintain indexing speed.

Set index=false for fields that do not need to be searchable.

Keyword vs Numeric Selection

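Choose keyword for fields that are matched exactly with term or terms queries, including numeric-looking identifiers such as status codes or IDs; use a numeric type when range queries are required.

If a field is only rarely range-queried, storing it as keyword is usually still the better choice, because term queries on keyword fields are faster than on numeric fields.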

Eager Global Ordinals

Enable eager_global_ordinals on high‑cardinality keyword fields to pre‑build global ordinals at refresh time, trading write speed for faster aggregations.

PUT index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}

Summary

Over the past decade, Elasticsearch has become one of the most widely used search engines. This guide consolidates essential development practices and common pitfalls, offering developers concrete techniques to optimize queries, mappings, sharding, and bulk operations for reliable, high-performance search services.
