30 Essential Elasticsearch Tips to Boost Query Performance and Avoid Common Pitfalls
This article compiles practical Elasticsearch recommendations covering query caching, filter contexts, pagination, aggregation strategies, index mapping, shard design, and scripting best practices, providing developers with actionable insights to improve search performance, reduce resource consumption, and prevent common operational issues.
Preface
This article shares practical Elasticsearch usage suggestions, explaining the rationale behind each recommendation rather than merely presenting conclusions.
Query‑related Tips
1. Fully Utilize Caches
Shard Request Cache: Caches per‑shard results (aggregations, hits.total, suggestions) for requests with size=0. Entries are invalidated when the shard refreshes and its data has actually changed.
Node Query/Filter Cache: Implemented in Lucene (LRUQueryCache). Filters are cached only on segments that hold at least 10,000 documents and at least 3% of the shard's documents.
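The segment‑size rule for the node query cache can be pictured with a small sketch. The class and method names are hypothetical and only mirror the two thresholds stated above; this is not Lucene's actual code.

```java
/** Illustrative only: mirrors the two segment-size thresholds described above. */
final class QueryCacheEligibility {
    static final int MIN_SEGMENT_DOCS = 10_000;   // minimum documents in the segment
    static final double MIN_SHARD_RATIO = 0.03;   // minimum share of the shard's documents

    /** Returns true when a segment is large enough for its filters to be cached. */
    static boolean isCacheable(long segmentDocs, long shardDocs) {
        if (segmentDocs < MIN_SEGMENT_DOCS) {
            return false;
        }
        return (double) segmentDocs / shardDocs >= MIN_SHARD_RATIO;
    }
}
```

Under these assumptions a 20,000‑document segment in a 100,000‑document shard qualifies, while the same segment in a 1,000,000‑document shard (2%) does not.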
2. Use Filter Context Instead of Query Context
Filters are not scored and can be cached, improving performance.
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.filter(QueryBuilders.termQuery("field", "value"));
3. Set size=0 When Only Aggregations Are Needed
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.aggregation(AggregationBuilders.terms("term_agg").field("field"));
sourceBuilder.size(0);
4. Use Absolute Time Values for Date Range Queries
Avoid now in range queries: its value changes on every request, so the results cannot be cached.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
LocalDateTime now = LocalDateTime.now();
String currentDate = now.format(DateTimeFormatter.ISO_DATE);
sourceBuilder.query(QueryBuilders.rangeQuery("date_field").gte("2022-01-01").lte(currentDate));
5. Avoid Deeply Nested Aggregations
Each nested aggregation creates new buckets, which can exhaust memory.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
TermsAggregationBuilder termAgg1 = AggregationBuilders.terms("term_agg1").field("field1");
TermsAggregationBuilder termAgg2 = AggregationBuilders.terms("term_agg2").field("field2");
termAgg1.subAggregation(termAgg2);
sourceBuilder.aggregation(termAgg1);
6. Prefer Composite Aggregation for Multi‑Dimensional Group‑by
// Composite sources are CompositeValuesSourceBuilder instances, not regular terms aggregations.
List<CompositeValuesSourceBuilder<?>> sources = Arrays.asList(
    new TermsValuesSourceBuilder("group_by_A").field("fieldA.keyword"),
    new TermsValuesSourceBuilder("group_by_B").field("fieldB.keyword"),
    new TermsValuesSourceBuilder("group_by_C").field("fieldC.keyword")
);
CompositeAggregationBuilder compositeAgg = AggregationBuilders.composite("group_by_A_B_C", sources);
SearchSourceBuilder sb = new SearchSourceBuilder()
    .query(QueryBuilders.matchAllQuery())
    .aggregation(compositeAgg)
    .size(0);
7. Avoid Large Aggregations and High‑Cardinality Buckets
Large intermediate results consume excessive heap memory.
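To get a feel for how quickly intermediate results grow, the following sketch estimates a worst‑case bucket count for nested terms aggregations. It assumes every parent bucket produces a full set of child buckets; the class and method names are hypothetical, and real counts depend on the data.

```java
/** Rough worst-case estimate; assumes every parent bucket is fully populated with children. */
final class BucketEstimate {
    /**
     * @param sizesPerLevel the "size" of each terms aggregation, outermost first
     * @return total buckets across all levels in the worst case
     */
    static long worstCaseBuckets(int... sizesPerLevel) {
        long total = 0;
        long bucketsAtLevel = 1;
        for (int size : sizesPerLevel) {
            // Each bucket of the previous level can hold up to `size` child buckets.
            bucketsAtLevel = Math.multiplyExact(bucketsAtLevel, (long) size);
            total = Math.addExact(total, bucketsAtLevel);
        }
        return total;
    }
}
```

For example, two nested levels of size 100 give at most 100 + 100 × 100 = 10,100 buckets, while three levels of size 1,000 already allow more than a billion.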
8. Use BFS Collection Mode for High‑Cardinality Aggregations
searchSourceBuilder.aggregation(
AggregationBuilders.terms("brandIds")
.collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST)
.field("brandId")
.size(2000)
.order(BucketOrder.key(true))
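);
9. Do Not Aggregate on text Fields
Aggregating on a text field requires fielddata, which loads the field's terms into heap memory. Enable fielddata only when necessary; otherwise aggregate on a keyword field or keyword sub‑field.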
10. Avoid Deep Pagination with from+size
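With from+size, every request makes each shard collect and sort its top from+size hits, and the coordinating node then merges the results from all shards, so CPU and memory usage grow with page depth. By default, index.max_result_window rejects requests where from+size exceeds 10,000.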
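The cost of a deep page can be made concrete with simple arithmetic. The sketch below uses hypothetical names and only counts the hits that have to be collected and merged; it ignores everything else a search does.

```java
/** Back-of-the-envelope cost of from+size pagination; names are illustrative. */
final class DeepPaginationCost {
    /** Hits each shard must collect and sort for one request. */
    static long hitsPerShard(int from, int size) {
        return (long) from + size;
    }

    /** Sort entries the coordinating node receives and merges across all shards. */
    static long hitsMergedByCoordinator(int from, int size, int shards) {
        return hitsPerShard(from, size) * shards;
    }
}
```

Fetching 10 hits at from=10,000 on a 5‑shard index means each shard collects 10,010 hits and the coordinating node merges 50,050 entries, only to return 10 of them.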
11. Prefer SearchAfter (or PIT) Over Scroll for Real‑Time Large Result Sets
Scroll holds a snapshot context and can exhaust memory; SearchAfter is more efficient for deep pagination.
12. Ensure Sort Fields Are Unique When Using SearchAfter
Non‑unique sort values may cause missing or duplicate results across pages; add a unique field to the sort as a tiebreaker.
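Why the tiebreaker matters can be shown with an in‑memory simulation of cursor pagination. This sketch does not call Elasticsearch; the Doc class, its fields, and the method names are hypothetical stand‑ins for a sorted index and the search_after cursor. pageSize is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory simulation of cursor pagination with a unique tiebreaker. */
final class SearchAfterSketch {
    /** Hypothetical document: a non-unique sort value plus a unique id. */
    static final class Doc {
        final long timestamp;
        final String id;
        Doc(long timestamp, String id) { this.timestamp = timestamp; this.id = id; }
    }

    /** Sort by timestamp, then by id so that no two documents compare as equal. */
    static final Comparator<Doc> ORDER =
        Comparator.comparingLong((Doc d) -> d.timestamp).thenComparing(d -> d.id);

    /** Returns up to pageSize documents that sort strictly after the cursor (null = first page). */
    static List<Doc> nextPage(List<Doc> sortedDocs, Doc after, int pageSize) {
        List<Doc> page = new ArrayList<>();
        for (Doc d : sortedDocs) {
            if (after == null || ORDER.compare(d, after) > 0) {
                page.add(d);
                if (page.size() == pageSize) break;
            }
        }
        return page;
    }

    /** Walks all pages, using the last hit of each page as the cursor for the next one. */
    static List<String> collectAllIds(List<Doc> docs, int pageSize) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(ORDER);
        List<String> ids = new ArrayList<>();
        Doc cursor = null;
        while (true) {
            List<Doc> page = nextPage(sorted, cursor, pageSize);
            if (page.isEmpty()) break;
            for (Doc d : page) ids.add(d.id);
            cursor = page.get(page.size() - 1);
        }
        return ids;
    }
}
```

With three documents sharing one timestamp and a page size of 2, the id tiebreaker lets the second page resume exactly after the last hit. If the cursor compared timestamps alone, the third document with that timestamp would not sort strictly after the cursor and would be skipped.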
13. Sort by Business Fields Instead of Default _score
When relevance ranking is not needed, sort by a business field or by _doc to skip the scoring overhead.
_doc is index order, which is the most efficient sort order in ES.
Write‑related Tips
Avoid manual Refresh calls; configure refresh_interval instead.
Do not index overly large documents (default limit 100 MB).
Let ES generate document IDs to avoid extra existence checks.
Use the Bulk API for large writes, tuning batch size (≈5‑15 MB) and timeout (>60 s).
When bulk‑loading, increase refresh_interval and avoid setting replica count to 0.
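Batching by payload size rather than by document count is one way to stay inside the suggested bulk size range. The sketch below is a minimal illustration with hypothetical names; it only partitions already serialized documents and leaves sending each batch through the Bulk API to the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Splits serialized documents into size-bounded batches for bulk requests. */
final class BulkBatcher {
    /**
     * Groups documents so that each batch's UTF-8 size stays within maxBatchBytes.
     * A single document larger than the budget is placed in a batch of its own.
     */
    static List<List<String>> partition(List<String> jsonDocs, long maxBatchBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : jsonDocs) {
            long docBytes = doc.getBytes(StandardCharsets.UTF_8).length;
            // Start a new batch when adding this document would exceed the budget.
            if (!current.isEmpty() && currentBytes + docBytes > maxBatchBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += docBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

A call such as BulkBatcher.partition(docs, 10L * 1024 * 1024) would target roughly 10 MB per bulk request; the right value depends on document shape and cluster capacity, so it is worth measuring.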
Index Creation
Shard Design
Keep replica count between 1‑2 per primary shard for high availability.
Limit primary shard size to 30‑50 GB and total index size to ~1 TB.
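The shard size guideline translates into a quick estimate of the primary shard count. The helper below is a hypothetical sketch that assumes the expected index size is known up front; it ignores growth, node count, and query load, all of which matter in practice.

```java
/** Estimates a primary shard count from an expected index size; names are illustrative. */
final class ShardPlanner {
    /**
     * @param expectedIndexGb expected size of the index's primary data, in GB
     * @param targetShardGb   desired size per primary shard, in GB (for example 30 to 50)
     * @return the smallest shard count that keeps every shard at or below the target
     */
    static int primaryShards(double expectedIndexGb, double targetShardGb) {
        if (expectedIndexGb <= 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) Math.max(1, Math.ceil(expectedIndexGb / targetShardGb));
    }
}
```

A 200 GB index with a 40 GB target yields 5 primary shards, and an index smaller than the target still gets 1.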
Mapping Design
Disable dynamic mapping; explicitly define field types, analyzers, and index settings.
Use keyword for non‑analyzed strings; reserve text for full‑text search.
Keep total field count below 100 to maintain indexing speed.
Set index=false for fields that do not need to be searchable.
Keyword vs Numeric Selection
Choose keyword for low‑cardinality exact matches; use numeric when range queries are required.
Map numeric identifiers that are rarely or never used in range queries as keyword, because term queries on keyword fields are typically faster than on numeric fields.
Eager Global Ordinals
Enable eager_global_ordinals on high‑cardinality keyword fields to pre‑build global ordinals at refresh time, trading write speed for faster aggregations.
PUT index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}
Summary
Over the past decade Elasticsearch has become the most popular open‑source search engine. This guide consolidates essential development practices and common pitfalls, offering developers concrete techniques to optimize queries, mappings, sharding, and bulk operations for reliable and high‑performance search services.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
