
10 Powerful Elasticsearch DSL Tricks to Solve Real‑World Performance Pain Points

This article presents ten practical Elasticsearch performance‑tuning techniques—including query DSL, deep pagination, mapping design, high‑cardinality aggregations, nested queries, script optimization, index templates, force‑merge, bulk writes, and profiling—each illustrated with concrete scenarios, code snippets, and step‑by‑step analysis to boost cluster speed and stability.

Mingyi World Elasticsearch

Feb 11, 2025

1. Query DSL Optimization: Precise Use of Filter and Query

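Problem scenario: Large bool queries that put exact-match conditions in must/should compute relevance scores nobody uses and cannot benefit from filter caching.

Optimization principle: Put non-scoring clauses (term, range, exists) in filter context, where results can be cached and no score is computed, and reserve must/should for the clauses that should influence _score.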

Example:

GET /logs/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"status": "error"}}, // precise filter, cacheable
        {"range": {"@timestamp": {"gte": "now-1d/d"}}}
      ],
      "must": [
        {"match": {"message": "timeout"}} // scoring part
      ]
    }
  }
}
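As a rough illustration of the principle above, the following Python sketch assembles the same request body from two lists, one for non-scoring conditions and one for scoring clauses. The helper name build_bool_query is hypothetical, and nothing is sent to a cluster; the code only builds the JSON body.

```python
import json


def build_bool_query(filters, scoring):
    """Build a bool query body: `filters` go to filter context, `scoring` to must."""
    bool_body = {}
    if filters:
        bool_body["filter"] = list(filters)  # no scoring, eligible for caching
    if scoring:
        bool_body["must"] = list(scoring)    # contributes to _score
    return {"query": {"bool": bool_body}}


body = build_bool_query(
    filters=[
        {"term": {"status": "error"}},
        {"range": {"@timestamp": {"gte": "now-1d/d"}}},
    ],
    scoring=[{"match": {"message": "timeout"}}],
)
print(json.dumps(body, indent=2))
```

Keeping the two lists separate in application code makes it harder to slip an exact-match condition into must by accident.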

2. Deep Pagination Trap and Search‑After Solution

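Problem scenario: By default from + size cannot exceed 10,000 (the index.max_result_window setting), and raising the limit is expensive: each shard must collect and sort from + size documents, and the coordinating node then merges all of them just to return one page.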

Optimization solution: Replace deep pagination with search_after, using the sort values of the last hit from the previous page.

Example:

// First page
GET /orders/_search
{
  "size": 100,
  "sort": [
    {"order_id": "asc"},
    "_doc" // ensure unique ordering
  ]
}

// Subsequent page: repeat the same sort and pass the last hit's sort values
GET /orders/_search
{
  "size": 100,
  "sort": [
    {"order_id": "asc"},
    "_doc"
  ],
  "search_after": ["12345", "65429"]
}
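The paging loop can be sketched in Python as follows. This is a minimal illustration, not a client implementation: fake_search is a hypothetical in-memory stand-in for a real search call, and the sketch assumes order_id is unique so a single sort key is enough.

```python
def paginate(search, page_size):
    """Yield every hit by feeding the last hit's sort values into search_after."""
    body = {"size": page_size, "sort": [{"order_id": "asc"}]}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Cursor for the next page: sort values of the last hit on this page.
        body["search_after"] = hits[-1]["sort"]


def fake_search(body):
    """In-memory stand-in for a search call (hypothetical, for illustration only)."""
    docs = [{"_id": str(i), "sort": [i]} for i in range(1, 8)]  # order_id 1..7
    after = body.get("search_after")
    if after is not None:
        docs = [d for d in docs if d["sort"] > after]
    return {"hits": {"hits": docs[: body["size"]]}}


ids = [hit["_id"] for hit in paginate(fake_search, page_size=3)]
print(ids)
```

The loop never uses from, so the cost of each page stays roughly constant no matter how deep the client has paged.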

3. Index Mapping Design: Disable Dynamic Field Explosion

Problem scenario: Log data with unrestricted dynamic mapping creates tens of thousands of fields.

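Defensive solution: Set "dynamic": "strict" to forbid automatic field addition. Documents that contain an undeclared field are then rejected at index time; if unknown fields should be silently ignored instead, "dynamic": false is the softer option.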

Example:

PUT /logstash-2023
{
  "mappings": {
    "dynamic": "strict", // prohibit auto‑added fields
    "properties": {
      "@timestamp": {"type": "date"},
      "message":   {"type": "text"},
      "level":     {"type": "keyword"}
    }
  }
}
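Because a strict mapping rejects the whole document, it can help to check documents on the client before sending them. The sketch below is one possible pre-check, assuming only top-level fields matter; ALLOWED_FIELDS and unknown_fields are hypothetical names that mirror the mapping above.

```python
ALLOWED_FIELDS = {"@timestamp", "message", "level"}  # mirrors the mapping above


def unknown_fields(doc, allowed=ALLOWED_FIELDS):
    """Return the sorted top-level field names that the mapping does not declare."""
    return sorted(set(doc) - set(allowed))


good = {"@timestamp": "2023-01-01T00:00:00Z", "message": "ok", "level": "info"}
bad = {"@timestamp": "2023-01-01T00:00:00Z", "message": "ok", "user_agent": "curl"}

print(unknown_fields(good))
print(unknown_fields(bad))
```

A non-empty result means the document would be rejected by the strict mapping, so the client can drop, rename, or log the extra fields first.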

4. Aggregation Performance: Choose Execution Hint

Problem scenario: High‑cardinality terms aggregation causes memory overflow.

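Optimization strategy: The default global_ordinals mode must build global ordinals for the field, which is expensive on high-cardinality fields. Setting "execution_hint": "map" aggregates directly on field values instead; it tends to pay off when the query matches only a small share of documents, so benchmark both modes on your own data.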

Example:

GET /sales/_search
{
  "aggs": {
    "products": {
      "terms": {
        "field": "product_id",
        "size": 100,
        "execution_hint": "map" // skips global ordinals; best when few docs match
      }
    }
  }
}

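map: aggregates directly on field values and skips building global ordinals; worth considering when few documents match the query.

global_ordinals: the default; usually the faster choice for low and medium cardinality fields.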

5. Nested Object Query Pitfalls

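Problem scenario: Each nested object is indexed as a separate hidden document, so nested queries and large nested arrays are noticeably more expensive than queries on flat fields.

Optimization steps:

Avoid overusing the nested type; use it only when fields inside the same array element must be matched together.

When the matching nested documents are needed, request inner_hits and keep its size small to limit what is returned.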

Example:

GET /products/_search
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": {"term": {"reviews.rating": 5}},
      "inner_hits": {"size": 2} // limit nested docs returned
    }
  }
}

6. Script Query Extreme Optimization

Problem scenario: Dynamic scripts cause CPU spikes.

Optimization tricks:

Prefer pre‑processing with an ingest pipeline to compute fields at index time.

If pre-processing is impossible, replace ad hoc scripts with runtime fields (runtime_mappings in the search request).

Use script_score only when necessary and ensure parameters are passed to avoid recompilation.

Solution 1: Ingest Pre‑processing

PUT _ingest/pipeline/preprocess_price
{
  "processors": [
    {"script": {"source": "ctx.adjusted_price = ctx.price * params.factor;", "params": {"factor": 1.2}}}
  ]
}

PUT /inventory/_doc/1?pipeline=preprocess_price
{
  "price": 100
}

Idea: compute adjusted_price during write, then filter on it without running a script at query time.

Solution 2: Runtime Fields Instead of Script

GET /inventory/_search
{
  "runtime_mappings": {
    "adjusted_price": {
      "type": "double",
      "script": {"source": "emit(doc['price'].value * params.factor);", "params": {"factor": 1.2}}
    }
  },
  "query": {"range": {"adjusted_price": {"gte": 120}}}
}

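Idea: compute the field on the fly during search, suitable when the value cannot be pre-computed. The script still runs for each candidate document at query time, so this costs more than filtering on an indexed field.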

Solution 3: Optimized script_score Query

GET /inventory/_search
{
  "query": {
    "script_score": {
      "query": {"match_all": {}},
      // script_score must not return a negative score; 1 + keeps the log non-negative
      "script": {"source": "Math.log(1 + doc['price'].value * params.factor)", "params": {"factor": 1.2}}
    }
  }
}

Use only when neither ingest nor runtime fields are applicable, and keep the script parameterized.
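To see why parameterization matters, compare the script source text that a client would send for different factor values. Scripts are compiled and cached by their source, so a source string that changes with every value forces a new compilation each time. The two helper functions below are hypothetical and only build request fragments.

```python
def script_inlined(factor):
    """Anti-pattern: the value is baked into the source, so the source text changes."""
    return {"source": f"doc['price'].value * {factor}"}


def script_parameterized(factor):
    """Preferred: constant source text, the value travels in params."""
    return {
        "source": "doc['price'].value * params.factor",
        "params": {"factor": factor},
    }


factors = (1.1, 1.2, 1.3)
inlined_sources = {script_inlined(f)["source"] for f in factors}
parameterized_sources = {script_parameterized(f)["source"] for f in factors}

# Distinct source strings mean distinct compilations.
print(len(inlined_sources), len(parameterized_sources))
```

With the parameterized form, all three requests share one source string, so the script only needs to be compiled once.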

Final recommendation: Prefer ingest pre‑processing → runtime fields → script_score as the last resort to reduce CPU load and improve query performance.

7. Index Templates and Lifecycle Automation

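Scenario: Time-based log indices pile up, and retention is handled by ad hoc scripts or manual deletes.

Solution: Use Index Lifecycle Management (ILM) together with an index template so that every index matching the pattern gets the same settings and lifecycle policy. Note that the rollover action also needs a write alias (index.lifecycle.rollover_alias) or a data stream, which the example below leaves out.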

Example:

PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy"
    },
    "mappings": { ... }
  }
}

PUT _ilm/policy/logs_policy
{
  "policy": { // the phases must be wrapped in a top-level "policy" object
    "phases": {
      "hot": {"actions": {"rollover": {"max_size": "50GB"}}},
      "delete": {"min_age": "30d", "actions": {"delete": {}}}
    }
  }
}

8. Cautious Use of Force Merge

Scenario: Historical indices contain many small segments, hurting query speed.

Operation advice:

Run after the index is no longer receiving writes.

Avoid executing during peak traffic.

Example:

// merge into a single segment; max_num_segments is a URL parameter, not a body field
POST /logs-2025-01-01/_forcemerge?max_num_segments=1

9. Bulk Write Performance Tuning

Golden rule:

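Keep each bulk request between 5 and 15 MB and adjust according to load.

Send from multiple threads, but throttle the client so the nodes are not overloaded.

Set "refresh_interval": "-1" during the bulk load, then restore it.

Documents indexed while refresh is disabled do not become searchable until refresh is re-enabled or triggered manually, so use this only for loads where search latency on new data does not matter.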

Example:

PUT /logs/_settings
{
  "index": {"refresh_interval": "-1"} // disable real‑time refresh
}

// after bulk write
PUT /logs/_settings
{
  "index": {"refresh_interval": "1s"}
}
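Batching by payload size rather than by document count can be sketched as below. This is an illustrative helper, not part of any client library: chunk_bulk_lines is a hypothetical name, the tiny 200-byte budget is only there to make the demo split, and a real load would use a budget in the 5 to 15 MB range mentioned above.

```python
import json


def chunk_bulk_lines(docs, index, max_bytes):
    """Group docs into NDJSON bulk payloads of at most max_bytes each.

    A single document larger than max_bytes is still emitted, alone in its own payload.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index}})
        source = json.dumps(doc)
        entry = f"{action}\n{source}\n"  # bulk format: action line, then source line
        entry_bytes = len(entry.encode("utf-8"))
        # Flush the current batch before it would exceed the byte budget.
        if batch and batch_bytes + entry_bytes > max_bytes:
            yield "".join(batch)
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += entry_bytes
    if batch:
        yield "".join(batch)


docs = [{"message": f"event {i}"} for i in range(10)]
payloads = list(chunk_bulk_lines(docs, index="logs", max_bytes=200))
print(len(payloads), [len(p.encode("utf-8")) for p in payloads])
```

Each yielded string can be sent as the body of one _bulk request; measuring bytes keeps request sizes stable even when document sizes vary widely.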

10. Profile API for Root‑Cause Diagnosis of Slow Queries

Diagnostic step: Enable profiling in the search request.

GET /products/_search
{
  "profile": true,
  "query": {"wildcard": {"title": "elastic*"}}
}

Analysis output:

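Inspect time_in_nanos and the breakdown object for each query component in the profile section of the response.

Pay attention to the time spent in the wildcard clause; if it dominates, consider indexing the field with an (edge) n-gram analyzer so the lookup becomes a plain term match.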
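Profile output is deeply nested, so a small script that flattens it helps to spot the expensive clause. The sketch below walks the shards, searches, and query tree of a profile section and ranks components by time_in_nanos. The sample dictionary is hand-written to resemble a response, its numbers are made up, and slowest_components is a hypothetical helper name.

```python
def slowest_components(profile, top=3):
    """Flatten the profiled query tree and return the `top` slowest (type, description, nanos)."""
    found = []

    def walk(node):
        found.append((node["type"], node["description"], node["time_in_nanos"]))
        for child in node.get("children", []):
            walk(child)

    for shard in profile["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                walk(root)
    return sorted(found, key=lambda item: item[2], reverse=True)[:top]


# Hand-written sample shaped like the "profile" section of a response; numbers are made up.
sample = {
    "shards": [{
        "searches": [{
            "query": [{
                "type": "BooleanQuery",
                "description": "+title:elastic* #status:active",
                "time_in_nanos": 900,
                "children": [
                    {"type": "MultiTermQueryConstantScoreWrapper",
                     "description": "title:elastic*", "time_in_nanos": 700},
                    {"type": "TermQuery",
                     "description": "status:active", "time_in_nanos": 100},
                ],
            }]
        }]
    }]
}

for entry in slowest_components(sample, top=2):
    print(entry)
```

A parent's time includes its children, so the interesting entry is usually the slowest leaf rather than the top-level query.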

Final Recommendations: Monitoring and Diagnosis Toolchain

GET _nodes/hot_threads – locate hot threads.

GET _cat/indices?v&s=store.size:desc – view index sizes.

GET _cluster/allocation/explain – explain shard allocation issues.

Integrate Elastic Monitoring (Metricbeat + Kibana) and track JVM heap pressure, GC time, segment memory, pending tasks.

By combining the above DSL‑level optimizations, practitioners can significantly improve cluster stability and query performance, and should establish continuous performance baselines tailored to their workloads.
