Elasticsearch Best Practices: Query, Index, and Performance Optimizations

The guide outlines production‑ready Elasticsearch best practices, covering query tuning such as using shard request cache, filter context, size‑0 aggregations and composite aggregations; write strategies like auto‑generated IDs, bulk API sizing and refresh handling; optimal shard counts, explicit mappings with disabled unnecessary features, and general advice to use explicit index names and stored scripts.

DeWu Technology

Dec 18, 2023

This article shares practical suggestions for using Elasticsearch in production, explaining the rationale behind each recommendation.

Query-related optimizations

1. Leverage the shard request cache for aggregations. By default only requests with size=0 are cached; scroll and profiled requests, among others, are not. Cached entries are invalidated when the shard refreshes and its data has changed.

2. Use filter context instead of query context to enable caching and avoid scoring.

// Clauses in filter context skip scoring and are eligible for caching.
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.filter(QueryBuilders.termQuery("field", "value"));

3. Set size=0 when only aggregation results are needed.

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.aggregation(...);
sourceBuilder.size(0); // return aggregation results only, no hits

4. Prefer absolute time values over now in range queries. A request that uses now resolves to a different value on every call, so its result cannot be reused from the cache.
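As a minimal sketch of point 4, the client can resolve the current time itself and round it down to a coarse boundary, so that repeated requests within the same minute send identical range bounds. The class name RangeBounds, the field name timestamp, and the one-minute granularity are assumptions made for illustration; they do not come from the original article. The helper only renders the JSON query body and uses nothing beyond the JDK.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: builds a range query body with absolute, rounded bounds. */
final class RangeBounds {
    /** Rounds the given instant down to the start of its minute. */
    static Instant floorToMinute(Instant t) {
        return t.truncatedTo(ChronoUnit.MINUTES);
    }

    /** Returns a JSON range query covering the last `hours` hours up to the rounded instant. */
    static String lastHoursQuery(String field, Instant now, long hours) {
        Instant upper = floorToMinute(now);
        Instant lower = upper.minus(hours, ChronoUnit.HOURS);
        // Instant.toString() yields ISO-8601 UTC, which Elasticsearch date fields accept by default.
        return "{\"range\":{\"" + field + "\":{\"gte\":\"" + lower
                + "\",\"lt\":\"" + upper + "\"}}}";
    }
}
```

Two calls made at 10:15:07 and 10:15:42 produce the same body, which is what makes the cached result reusable. The trade-off is that results can lag by up to the rounding interval.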

5. Avoid deeply nested aggregations; use a composite aggregation for multi‑dimensional group‑by.

CompositeAggregationBuilder compositeAggregationBuilder = AggregationBuilders.composite(...);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
    .query(QueryBuilders.matchAllQuery())
    .aggregation(compositeAggregationBuilder)
    .size(0);

6. Do not use bucket_sort for deep pagination; prefer a composite aggregation or point in time (PIT) with search_after.
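Paging through a composite aggregation works by sending the after_key from the previous response back in the after parameter of the next request. The sketch below only renders the request body, using nothing beyond the JDK. The class name CompositePage, the aggregation name groups, and the assumption that all sources are terms sources with plain string values (no JSON escaping is done) are illustrative choices, not the article's code.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: renders a composite aggregation body for one page. */
final class CompositePage {
    /**
     * @param sources  source name -> field to group by (iteration order is preserved)
     * @param pageSize number of buckets per page
     * @param afterKey the after_key of the previous response, or null for the first page
     */
    static String body(Map<String, String> sources, int pageSize, Map<String, String> afterKey) {
        StringJoiner src = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, String> e : sources.entrySet()) {
            src.add("{\"" + e.getKey() + "\":{\"terms\":{\"field\":\"" + e.getValue() + "\"}}}");
        }
        StringBuilder composite =
                new StringBuilder("{\"size\":" + pageSize + ",\"sources\":" + src);
        if (afterKey != null) {
            // Resume right after the last bucket of the previous page.
            StringJoiner after = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : afterKey.entrySet()) {
                after.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
            }
            composite.append(",\"after\":").append(after);
        }
        composite.append("}");
        // Top-level size 0: only buckets are needed, no hits.
        return "{\"size\":0,\"aggs\":{\"groups\":{\"composite\":" + composite + "}}}";
    }
}
```

A caller would pass null for the first page and then loop, feeding each response's after_key into the next call until the response contains no buckets. Passing an insertion-ordered map such as LinkedHashMap keeps the sources in a stable order between pages.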

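7. Sort by _doc for scroll requests when the order of results does not matter, since it is the cheapest sort order, and always clear scroll contexts once finished so their resources are released.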

Write‑related recommendations

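• Let Elasticsearch generate document IDs instead of specifying them; with an explicit ID, Elasticsearch must first check whether a document with that ID already exists in the shard.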

• Use the Bulk API for large writes, and tune both the batch size (roughly 5 to 15 MB per request) and the index refresh interval.

• Avoid manual refreshes during a bulk load; for a large initial load, the replica count can be set to zero temporarily and restored once the load finishes.
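The write recommendations above can be sketched as a small batching helper that groups bulk actions by payload size rather than by document count. This is a JDK-only illustration under stated assumptions: the class name BulkBatcher and its method names are hypothetical, and the byte budget is whatever value your own measurements suggest within the range mentioned above. The index action carries no _id, so Elasticsearch generates one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups NDJSON bulk actions into batches under a byte budget. */
final class BulkBatcher {
    /** Renders an index action without an _id so Elasticsearch generates one. */
    static String indexAction(String index, String sourceJson) {
        return "{\"index\":{\"_index\":\"" + index + "\"}}\n" + sourceJson + "\n";
    }

    /**
     * @param actions  one entry per document: action line plus source line, newline-terminated
     * @param maxBytes upper bound for one bulk request body
     * @return request bodies in order; an action larger than maxBytes gets a batch of its own
     */
    static List<String> split(List<String> actions, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String action : actions) {
            int size = action.getBytes(StandardCharsets.UTF_8).length;
            // Flush first if adding this action would push the batch over budget.
            if (currentBytes > 0 && currentBytes + size > maxBytes) {
                batches.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(action);
            currentBytes += size;
        }
        if (currentBytes > 0) {
            batches.add(current.toString());
        }
        return batches;
    }
}
```

Each returned string is a complete body for one _bulk request. Sizing by bytes keeps request size predictable when document sizes vary widely, which a fixed document count does not.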

Index creation and mapping design

• Choose an appropriate number of primary shards and replicas (usually 1 or 2 replicas per primary), and keep each shard below roughly 30 to 50 GB.

• Define explicit mappings rather than relying on dynamic mapping: use keyword for fields that need exact matching without analysis, and numeric types for fields used in range queries.

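• Disable norms on fields that are not used for scoring and doc_values on fields that are not used for sorting or aggregations, and leave fielddata disabled on text fields unless they must be sorted or aggregated on.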

• Consider eager global ordinals for high‑cardinality keyword fields used in aggregations.

PUT index
{
  "mappings": {
    "properties": {
      "foo": {"type": "keyword", "eager_global_ordinals": true}
    }
  }
}
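The shard sizing guideline earlier in this section comes down to simple arithmetic, sketched below. The class name ShardPlanner and the figures in the usage note are assumptions for illustration; the expected index size has to come from your own capacity estimate.

```java
/** Hypothetical helper: estimates shard counts from an expected index size. */
final class ShardPlanner {
    /** Smallest number of primaries that keeps each shard at or below targetShardGb. */
    static int primaryShards(long expectedIndexGb, long targetShardGb) {
        if (expectedIndexGb <= 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division without floating point.
        return (int) ((expectedIndexGb + targetShardGb - 1) / targetShardGb);
    }

    /** Total shard copies in the cluster: each primary plus its replicas. */
    static int totalShardCopies(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }
}
```

For example, an index expected to reach 200 GB with a 30 GB target per shard needs 7 primaries, and with 1 replica per primary the cluster holds 14 shard copies. Since the primary shard count is fixed at index creation, it is worth doing this estimate before the index is created.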

General advice

• Avoid querying all indices with wildcards; use explicit index names instead.

• Prefer stored scripts over inline scripts.

• Do not use the _all field; it was deprecated in Elasticsearch 6.0 and removed in 7.0.
