Analyzing and Optimizing Slow Elasticsearch Queries in a Shared Cluster
In a shared Elasticsearch cluster, the team used slow-log analysis to pinpoint costly queries caused by unnecessary fuzzy matches and by low-cardinality fields mapped as integer. They optimized these queries by converting match clauses to filters, remapping those fields to keyword, and re-indexing, which cut latency from over 100 ms to under 10 ms and eliminated slow-query alerts.
In a shared Elasticsearch cluster serving many business services, the lack of constraints on query syntax often results in heavy queries that can bring the cluster to a halt. Because manually reviewing each query is impractical, the team resorted to post‑processing techniques such as slow‑log analysis to maintain cluster stability.
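As a rough illustration of this kind of post-processing, the sketch below counts slow-log entries per index that exceed a threshold. The sample log lines, the index name orders_v1, and the function name are illustrative assumptions. The regular expression relies on the took_millis[...] field that Elasticsearch writes into search slow-log entries and may need adjusting for a particular version or log layout.

```python
import re
from collections import Counter

# Matches "[index][shard] took[...], took_millis[N]" inside a search slow-log line.
SLOWLOG_RE = re.compile(
    r"\[(?P<index>[^\[\]]+)\]\[(?P<shard>\d+)\]\s+"
    r"took\[[^\]]*\],\s+took_millis\[(?P<millis>\d+)\]"
)


def slow_queries_by_index(lines, threshold_ms=300):
    """Count slow-log entries per index whose took_millis exceeds threshold_ms."""
    counts = Counter()
    for line in lines:
        m = SLOWLOG_RE.search(line)
        if m and int(m.group("millis")) > threshold_ms:
            counts[m.group("index")] += 1
    return counts


# Made-up sample lines in the general shape of a search slow log (assumption).
SAMPLE_LINES = [
    "[2020-01-01T10:00:00,000][WARN ][index.search.slowlog.query] [node-1] "
    "[orders_v1][0] took[412.3ms], took_millis[412], total_hits[200], "
    "search_type[QUERY_THEN_FETCH], total_shards[5], source[{}]",
    "[2020-01-01T10:00:01,000][WARN ][index.search.slowlog.query] [node-1] "
    "[orders_v1][1] took[120ms], took_millis[120], total_hits[10], "
    "search_type[QUERY_THEN_FETCH], total_shards[5], source[{}]",
]

if __name__ == "__main__":
    for index, count in slow_queries_by_index(SAMPLE_LINES).items():
        print(index, count)
```

Grouping by index first makes it easier to see which business service owns the offending queries before looking at individual query bodies.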
Using the slow log, the team identified queries that consistently exceed 300 ms. An example of a problematic query is shown below:
{
  "from": 0,
  "size": 200,
  "timeout": "60s",
  "query": {
    "bool": {
      "must": [
        { "match": { "source": { "query": "5", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } },
        { "terms": { "type": ["21"], "boost": 1 } },
        { "match": { "creator": { "query": "0d754a8af3104e978c95eb955f6331be", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } },
        { "terms": { "status": ["0", "3"], "boost": 1 } },
        { "match": { "isDeleted": { "query": "0", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": { "includes": [], "excludes": [] }
}
The same query expressed in SQL-like syntax is:
SELECT guid FROM xxx WHERE source=5 AND type=21 AND creator='0d754a8af3104e978c95eb955f6331be' AND status IN (0,3) AND isDeleted=0;
Key issues identified:
Unnecessary fuzzy matching: exact-value conditions are written as match queries carrying fuzzy options (e.g., fuzzy_transpositions), which can expand the result set.
Fields such as isDeleted are mapped as integer, causing Elasticsearch to rewrite term queries on them into PointRangeQuery, which is much slower than a simple term lookup.
Multiple term filters are not executed in the order written; Elasticsearch reorders them by how many documents each term matches (selectivity) to improve performance.
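The first issue can be spotted mechanically. Below is a minimal sketch, assuming the query body is available as a Python dict. The helper name match_clauses_in_must and the trimmed SLOW_QUERY (default options omitted) are illustrative and not part of the team's tooling.

```python
def match_clauses_in_must(body):
    """Return the field names queried with a scored `match` clause in bool.must."""
    must = body.get("query", {}).get("bool", {}).get("must", [])
    fields = []
    for clause in must:
        # A match clause looks like {"match": {"<field>": {...}}}.
        for field in clause.get("match", {}):
            fields.append(field)
    return fields


# Trimmed version of the slow query shown earlier (assumption: defaults removed).
SLOW_QUERY = {
    "query": {
        "bool": {
            "must": [
                {"match": {"source": {"query": "5"}}},
                {"terms": {"type": ["21"]}},
                {"match": {"creator": {"query": "0d754a8af3104e978c95eb955f6331be"}}},
                {"terms": {"status": ["0", "3"]}},
                {"match": {"isDeleted": {"query": "0"}}},
            ]
        }
    }
}

if __name__ == "__main__":
    print(match_clauses_in_must(SLOW_QUERY))
```

Each field reported this way is a candidate for an exact term filter, provided the calling service does not rely on relevance scoring.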
Optimization steps:
Replace match queries with filter clauses where scoring is not needed, so that the results can be cached.
Change low-cardinality fields (e.g., isDeleted) from integer to keyword so they use inverted-index term queries instead of range queries.
Re‑index the affected indices after updating the mappings.
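The three steps above can be sketched as request bodies built in Python. This is an illustration under stated assumptions rather than the team's actual change: the helper names are hypothetical, the mapping uses the typeless 7.x-style layout, creator is assumed to be an exact-value (keyword) field, and the set of fields remapped to keyword is a guess based on the example query.

```python
def keyword_mapping(fields):
    """Index-creation body declaring each given field as `keyword` (7.x-style layout)."""
    return {"mappings": {"properties": {f: {"type": "keyword"} for f in fields}}}


def reindex_body(source_index, dest_index):
    """Body for POST _reindex, copying documents into the remapped index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def optimized_query():
    """The example query with every condition moved into non-scoring filter context."""
    return {
        "from": 0,
        "size": 200,
        "timeout": "60s",
        "query": {
            "bool": {
                "filter": [
                    {"term": {"source": "5"}},
                    {"terms": {"type": ["21"]}},
                    # Assumes `creator` holds exact values (keyword mapping).
                    {"term": {"creator": "0d754a8af3104e978c95eb955f6331be"}},
                    {"terms": {"status": ["0", "3"]}},
                    {"term": {"isDeleted": "0"}},
                ]
            }
        },
    }


if __name__ == "__main__":
    import json

    # Assumed list of low-cardinality fields; index names are placeholders.
    print(json.dumps(keyword_mapping(["source", "type", "status", "isDeleted"])))
    print(json.dumps(reindex_body("old_index", "new_index")))
    print(json.dumps(optimized_query(), indent=2))
```

Because filter clauses skip scoring, Elasticsearch is free to cache their results and reuse them across requests that share the same conditions.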
Verification:
After converting the integer fields to keyword and rebuilding the indices, profiling in Kibana showed the former PointRangeQuery (≈100 ms) dropping to a simple term query (<0.5 ms). Overall query latency fell from >100 ms to under 10 ms, and the number of slow queries reported by the cluster dropped to zero.
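The same comparison can be made outside Kibana by sending the search with "profile": true and reading the profile section of the response. The sketch below sums time_in_nanos per Lucene query type. The function name and the SAMPLE_RESPONSE values are made up for illustration and are not measurements from the cluster described here.

```python
from collections import defaultdict


def time_by_query_type(response):
    """Sum time_in_nanos per query type across all shards of a profiled search.

    A parent's time already includes its children, so totals for composite
    types such as BooleanQuery overlap with the totals of their children.
    """
    totals = defaultdict(int)

    def walk(node):
        totals[node["type"]] += node["time_in_nanos"]
        for child in node.get("children", []):
            walk(child)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                walk(node)
    return dict(totals)


# Hand-built response in the shape of the profile API output (values are invented).
SAMPLE_RESPONSE = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "time_in_nanos": 100,
                                "children": [
                                    {"type": "PointRangeQuery", "time_in_nanos": 90},
                                    {"type": "TermQuery", "time_in_nanos": 5},
                                ],
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

if __name__ == "__main__":
    for query_type, nanos in time_by_query_type(SAMPLE_RESPONSE).items():
        print(query_type, nanos)
```

Running this before and after the remapping should show whether PointRangeQuery still appears in the breakdown at all.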
Future work includes enforcing keyword mapping for status‑type fields by default on the search platform, while allowing explicit integer mappings for services that truly need range queries. Additional challenges such as high load during reindexing, uneven node utilization, and unpredictable traffic patterns remain, requiring further operational improvements.
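One way such a default could look is sketched below, purely as an assumption about a possible platform policy: each field declares whether it needs range queries, and only those fields keep an integer mapping. The function name and the price field are hypothetical.

```python
def default_field_mappings(field_needs_range):
    """Map field name -> mapping, using keyword unless range queries are required.

    field_needs_range: dict of field name -> bool (True if the owning service
    explicitly asks for range queries on that field).
    """
    properties = {}
    for name, needs_range in field_needs_range.items():
        properties[name] = {"type": "integer" if needs_range else "keyword"}
    return {"properties": properties}


if __name__ == "__main__":
    # `price` is a hypothetical field that genuinely needs range queries.
    print(default_field_mappings({"status": False, "isDeleted": False, "price": True}))
```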
HelloTech
Official Hello technology account, sharing tech insights and developments.