Boost Elasticsearch Performance: 24 Proven Tips and Pitfalls to Avoid
This article compiles practical Elasticsearch recommendations—covering query caching, aggregation design, pagination strategies, scripting, mapping choices, shard and replica settings, and bulk indexing—to help engineers write faster, more reliable search queries while avoiding common performance traps.
Introduction
The author shares practical Elasticsearch advice, explaining the rationale behind each recommendation rather than merely listing conclusions, and invites feedback on any inaccuracies.
Query‑related Recommendations
1. Leverage Shard Request Cache
Implemented in IndicesRequestCache, it caches the entire client request per shard. It is effective mainly for aggregations, hits.total, and suggestions. By default the cache is only used when the request includes size=0. Requests that use scroll, profiling, a search type other than QUERY_THEN_FETCH, or requestCache=false are excluded. Cached entries are invalidated when a refresh changes the shard's data.
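The eligibility rules above can be sketched as a small predicate. This is a toy model that encodes only the conditions listed in this tip; it is not Elasticsearch's actual implementation, and the ShardRequestCacheRules and Request names are made up for illustration.

```java
/** Toy model of the eligibility rules listed in this tip; not Elasticsearch code. */
final class ShardRequestCacheRules {

    /** Hypothetical value object holding the request properties that matter here. */
    static final class Request {
        final int size;
        final boolean scroll;
        final boolean profile;
        final String searchType;    // e.g. "query_then_fetch"
        final Boolean requestCache; // null means the client did not set it

        Request(int size, boolean scroll, boolean profile, String searchType, Boolean requestCache) {
            this.size = size;
            this.scroll = scroll;
            this.profile = profile;
            this.searchType = searchType;
            this.requestCache = requestCache;
        }
    }

    /** Returns true when the request passes every rule stated in the tip. */
    static boolean isEligible(Request r) {
        if (Boolean.FALSE.equals(r.requestCache)) return false;        // explicitly disabled
        if (r.scroll || r.profile) return false;                       // scroll and profiling are excluded
        if (!"query_then_fetch".equals(r.searchType)) return false;    // other search types are excluded
        return r.size == 0;                                            // only size=0 requests by default
    }
}
```

An aggregation-only request (size 0, no scroll, no profiling, default search type) passes; the same request with size 10 does not.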
2. Use Node Query/Filter Cache
Implemented in LRUQueryCache (default enabled). It caches filter sub‑queries per segment. Only filters on segments larger than 10 000 docs and occupying >3 % of the shard are cached. Cache is cleared when a segment is merged.
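The two thresholds above amount to a simple check per segment. The sketch below only restates those numbers as code; QueryCacheRules is a hypothetical name, and the real caching policy also considers how often a filter is used.

```java
/** Restates the segment thresholds from this tip; a sketch, not Elasticsearch code. */
final class QueryCacheRules {
    static final long MIN_SEGMENT_DOCS = 10_000;
    static final double MIN_SHARD_FRACTION = 0.03;

    /** True when a segment is large enough, by both rules, for its filters to be cached. */
    static boolean segmentQualifies(long segmentDocs, long shardDocs) {
        if (shardDocs <= 0) return false; // empty shard: nothing to cache
        double fraction = (double) segmentDocs / shardDocs;
        return segmentDocs > MIN_SEGMENT_DOCS && fraction > MIN_SHARD_FRACTION;
    }
}
```

For example, a 20 000-doc segment qualifies in a 100 000-doc shard (20 %) but not in a 1 000 000-doc shard (2 %).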
3. Prefer Filter Context over Query Context
Clauses in filter context skip relevance scoring, whereas must clauses compute a score.
Filters are cacheable, improving performance.
// Create BoolQueryBuilder
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// Use filter context
boolQuery.filter(QueryBuilders.termQuery("field", "value"));
4. Set size=0 for aggregation‑only queries
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.aggregation(AggregationBuilders.terms("term_agg").field("field")
.subAggregation(AggregationBuilders.sum("sum_agg").field("field")));
sourceBuilder.size(0);
5. Use absolute dates instead of now
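// A range bounded by now changes on every request, so its results are typically not cacheable; a fixed date string is.
LocalDateTime now = LocalDateTime.now();
String currentDate = now.format(DateTimeFormatter.ISO_DATE);
sourceBuilder.query(QueryBuilders.rangeQuery("date_field").gte("2022-01-01").lte(currentDate));
6. Avoid deep nested aggregations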
Each intermediate and final aggregation result lives in memory; excessive nesting can exhaust memory.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
TermsAggregationBuilder termAggBuilder1 = AggregationBuilders.terms("term_agg1").field("field_name1");
TermsAggregationBuilder termAggBuilder2 = AggregationBuilders.terms("term_agg2").field("field_name2");
termAggBuilder1.subAggregation(termAggBuilder2);
TermsAggregationBuilder termAggBuilder3 = AggregationBuilders.terms("term_agg3").field("field_name3");
termAggBuilder2.subAggregation(termAggBuilder3);
sourceBuilder.aggregation(termAggBuilder1);
7. Prefer Composite aggregation for multi‑dimensional group‑by
// Composite sources are values-source builders (not terms aggregations), passed to the factory as a list
List<CompositeValuesSourceBuilder<?>> sources = Arrays.asList(
    new TermsValuesSourceBuilder("group_by_A").field("fieldA.keyword"),
    new TermsValuesSourceBuilder("group_by_B").field("fieldB.keyword"),
    new TermsValuesSourceBuilder("group_by_C").field("fieldC.keyword")
);
CompositeAggregationBuilder compositeAggregationBuilder = AggregationBuilders
    .composite("group_by_A_B_C", sources);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
    .query(QueryBuilders.matchAllQuery())
    .aggregation(compositeAggregationBuilder)
    .size(0);
8. Avoid large aggregations
Aggregations keep intermediate results in memory; very large result sets can cause OOM.
9. Use BFS for high‑cardinality nested aggregations
The default depth‑first mode (DFS) builds the full bucket tree before pruning. Breadth‑first mode (BFS) processes the first level, prunes, then proceeds, reducing memory pressure when each bucket contains few docs.
searchSourceBuilder.aggregation(
AggregationBuilders.terms("brandIds")
.collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST)
.field("brandId")
.size(2000)
.order(BucketOrder.key(true))
);
10. Do not aggregate on text fields
Use keyword instead to avoid heavy fielddata memory usage.
11. Avoid bucket_sort for deep pagination
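bucket_sort is a pipeline aggregation: it runs only after every bucket of the parent aggregation has been built, then sorts and truncates them, which leads to O(N log N) complexity and high memory consumption for deep pages. Use a composite aggregation and page through it with its after key instead.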
// Composite aggregation for deep pagination
CompositeAggregationBuilder compositeBuilder = AggregationBuilders
.composite("spuIdAgg", Collections.singletonList(
new TermsValuesSourceBuilder("spuId").field("spuId").order("desc")))
.aggregateAfter(ImmutableMap.of("spuId", "603030"))
.size(20);
searchSourceBuilder.query(boolQuery).aggregation(compositeBuilder).size(0);
12. Prefer search_after over scroll for real‑time deep paging
Scroll keeps a snapshot of the index segment, consuming heap memory; search_after with point‑in‑time (PIT) is more efficient and recommended for pages beyond 10 000.
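As a rough illustration of search_after with PIT, the sketch below builds the JSON body for each page using only the standard library. It assumes Elasticsearch 7.10 or later (where PIT is available), that the PIT id was obtained beforehand, and that the caller copies the "sort" array of the last hit verbatim into the next request. SearchAfterBody, its parameters, and the one-minute keep_alive are illustrative choices, and no JSON escaping is performed.

```java
/** Builds search_after request bodies by hand; a sketch with no JSON escaping. */
final class SearchAfterBody {

    /**
     * @param pitId           id of a previously opened point in time
     * @param sortField       field to sort on (hypothetical, chosen by the caller)
     * @param size            page size
     * @param lastHitSortJson raw JSON "sort" array of the previous page's last hit,
     *                        or null for the first page
     */
    static String page(String pitId, String sortField, int size, String lastHitSortJson) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"size\":").append(size);
        sb.append(",\"pit\":{\"id\":\"").append(pitId).append("\",\"keep_alive\":\"1m\"}");
        sb.append(",\"sort\":[{\"").append(sortField).append("\":\"asc\"}]");
        if (lastHitSortJson != null) {
            // Resume right after the last hit of the previous page.
            sb.append(",\"search_after\":").append(lastHitSortJson);
        }
        return sb.append("}").toString();
    }
}
```

The body is sent to the _search endpoint without an index name in the path, because the PIT already identifies the target indices. Each page costs roughly the same regardless of depth, since only hits after the cursor are collected.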
13. Sort by _doc when no custom ordering is needed
Sorting by _doc returns hits in index order rather than by document ID. It is the cheapest sort order because no scores or field values need to be compared.
14. Avoid wildcard‑in‑the‑middle queries
Wildcard patterns are compiled into DFAs that can be expensive. Use the dedicated wildcard field type introduced in ES 7.9 for faster infix matching.
15. Avoid inline scripts; store scripts instead
// Store script
POST _script/activity_discount_price
{
"script": {"lang": "painless", "source": "doc.xxx.value * params.discount"}
}
// Use stored script
GET index/_search
{
"script_fields": {"discount_price": {"script": {"id": "activity_discount_price", "params": {"discount": 0.8}}}}
}
16. Disable _all field
It concatenates all fields into a large string, increasing CPU and storage usage. Keep it disabled: it is off by default for indices created in 6.x and no longer exists in 7.0 and later.
17. Prefer GET / MGET over search when fetching by ID
A GET or MGET by ID is routed straight to the shard that owns the document and skips the query phase, so it is faster than an equivalent search.
18. Limit multi‑index queries
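Querying index* fans out to every matching index and all of its shards; name the target indices explicitly or narrow the pattern. Note that action.destructive_requires_name only blocks wildcards in destructive operations such as index deletion; it does not restrict searches.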
19. Avoid large from+size pagination
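Deep paging triggers a full re‑search each time: every shard must collect from + size hits and the coordinating node must sort all of them, increasing CPU, memory, and risk of OOM. By default, index.max_result_window rejects requests where from + size exceeds 10 000.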
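A quick worked calculation shows why the cost grows with page depth. The sketch assumes the usual behaviour that each shard returns its top from + size hits to the coordinating node; DeepPagingCost is a hypothetical helper name.

```java
/** Back-of-the-envelope cost of from+size pagination; a sketch, not a benchmark. */
final class DeepPagingCost {

    /** Hits a single shard must collect and return for one page request. */
    static long perShard(int from, int size) {
        return (long) from + size;
    }

    /** Hits the coordinating node must merge and sort across all shards. */
    static long onCoordinator(int shards, int from, int size) {
        return shards * perShard(from, size);
    }
}
```

With 5 shards, from=10000 and size=10, the coordinating node merges 5 × 10 010 = 50 050 entries to return just 10 hits.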
20. Use _source_includes / _source_excludes to limit returned fields
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.matchAllQuery());
// Pass includes and excludes in one call; a second fetchSource call would replace the first.
String[] includes = {"field1", "field2"};
String[] excludes = {"field3"};
sourceBuilder.fetchSource(includes, excludes);
Write‑related Recommendations
21. Avoid manual refresh calls; configure refresh_interval instead
Let Elasticsearch handle refreshes automatically.
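One way to configure the interval is the index settings endpoint. The sketch below only constructs the request with the JDK's java.net.http classes (Java 11+) and does not send it; the base URL, index name, and RefreshIntervalRequest class are placeholders for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) a settings update for refresh_interval. */
final class RefreshIntervalRequest {

    /** JSON body for the index settings API, e.g. interval = "30s". */
    static String body(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    /** PUT {baseUrl}/{index}/_settings with the body above. */
    static HttpRequest build(String baseUrl, String index, String interval) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body(interval)))
                .build();
    }
}
```

A call such as RefreshIntervalRequest.build("http://localhost:9200", "my-index", "30s") yields a request that an HttpClient could send; the default interval is 1s, so larger values trade search freshness for indexing throughput.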
22. Keep documents under the default http.max_content_length (100 MB)
Oversized docs are rejected.
23. Let ES generate document IDs
Explicit IDs add an extra existence check per shard, slowing writes.
24. Use Bulk API for large writes
Typical bulk size: 5–15 MB.
Timeout > 60 s.
Distribute load across nodes.
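Keeping each bulk request within a byte budget can be sketched with the standard library alone. The code below assumes each document is already serialized as single-line JSON and that the target index is given in the request path, so an empty index action line suffices; BulkChunker and maxBytes are illustrative names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Splits documents into NDJSON bulk bodies under a byte budget; a sketch. */
final class BulkChunker {
    private static final String ACTION_LINE = "{\"index\":{}}\n";

    /**
     * Each body stays within maxBytes unless a single document alone exceeds it,
     * in which case that document is sent in a body of its own.
     */
    static List<String> chunk(List<String> jsonDocs, int maxBytes) {
        List<String> bodies = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String doc : jsonDocs) {
            String entry = ACTION_LINE + doc + "\n"; // action line, then source line
            int entryBytes = entry.getBytes(StandardCharsets.UTF_8).length;
            // Start a new body when adding this entry would exceed the budget.
            if (currentBytes > 0 && currentBytes + entryBytes > maxBytes) {
                bodies.add(current.toString());
                current = new StringBuilder();
                currentBytes = 0;
            }
            current.append(entry);
            currentBytes += entryBytes;
        }
        if (currentBytes > 0) {
            bodies.add(current.toString());
        }
        return bodies;
    }
}
```

For the 5–15 MB range mentioned above, maxBytes would be somewhere between 5 * 1024 * 1024 and 15 * 1024 * 1024; the resulting bodies can then be sent to different nodes in turn.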
25. Increase refresh_interval and avoid setting replica count to 0 during massive writes
After the load finishes, restore original settings.
Index‑creation Recommendations
26. Set replica count ≥ 1 (usually 1–2 per primary)
Provides high availability and modest search performance boost.
27. Keep primary shard count reasonable
Do not exceed three times the number of nodes; aim for 5–8 GB per shard (≈ 30–50 GB data). Total index size < 1 TB, docs < 1 billion per shard.
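The sizing rule can be turned into simple arithmetic. The sketch below takes the target shard size as a caller-supplied parameter (pick whichever figure from the guideline above you adopt) and applies the cap of three shards per node; ShardSizing and its parameter names are hypothetical.

```java
/** Rough primary shard estimate from data volume and node count; a sketch. */
final class ShardSizing {

    /**
     * @param totalDataGb   expected index size in GB
     * @param targetShardGb desired size of one primary shard in GB (caller's choice)
     * @param nodes         number of data nodes
     */
    static int primaryShards(double totalDataGb, double targetShardGb, int nodes) {
        if (targetShardGb <= 0 || nodes <= 0) {
            throw new IllegalArgumentException("targetShardGb and nodes must be positive");
        }
        int bySize = (int) Math.ceil(totalDataGb / targetShardGb); // shards needed to hit the target size
        int cap = 3 * nodes;                                       // no more than three per node
        return Math.max(1, Math.min(bySize, cap));
    }
}
```

For instance, 200 GB at a 40 GB target on 3 nodes gives 5 primaries, while 1000 GB with the same inputs is capped at 9, which signals that more nodes or larger shards are needed.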
28. Disable dynamic mapping; define explicit field types, analyzers, and sub‑types as needed
29. Use keyword instead of text for non‑analyzed strings
30. Limit total field count (default 1000) to ≤ 100
31. Set index to false for fields that are not searched
{
"mappings": {
"properties": {
"title": {"type": "text", "index": false},
"content": {"type": "text"}
}
}
}
32. Avoid nested and parent/child relationships when possible
They increase document count and write cost; prefer flat documents or separate indices.
33. Disable norms for fields that are not scored
{"title": {"type": "text", "norms": false}}
34. Disable doc_values for fields not used in sorting, aggregations, or scripts
35. Choose field type wisely: use numeric for range queries, keyword for low‑cardinality terms, and enable eager_global_ordinals for high‑cardinality keyword fields that are heavily aggregated
PUT index
{
  "mappings": {
    "properties": {
      "foo": {"type": "keyword", "eager_global_ordinals": true}
    }
  }
}
Conclusion
Over the past decade Elasticsearch has become the most popular open‑source search engine. This article consolidates a comprehensive set of best practices and anti‑patterns for daily development, aiming to help practitioners write faster, more reliable queries and maintain healthy clusters.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.