How to Diagnose and Fix Elasticsearch Slow Queries: From PointRange to Keyword
This article examines why Elasticsearch slow queries occur in a shared cluster, analyzes a problematic query's structure and data‑type choices, and demonstrates how converting integer fields to keyword mappings and adjusting filter order can reduce latency from over 100 ms to under 10 ms while eliminating slow‑query alerts.
Problem: Slow Queries
The shared search platform receives many business queries without strict syntax constraints, leading to occasional massive queries that overload the cluster; CPU usage can reach 100 % as shown in the monitoring screenshot.
Slow Query Analysis
Using the Elasticsearch slowlog, a query that consistently exceeds 300 ms was captured. The raw JSON request looks like:
{
  "from": 0,
  "size": 200,
  "timeout": "60s",
  "query": {
    "bool": {
      "must": [
        { "match": { "source": { "query": "5", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } },
        { "terms": { "type": ["21"], "boost": 1 } },
        { "match": { "creator": { "query": "0d754a8af3104e978c95eb955f6331be", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } },
        { "terms": { "status": ["0","3"], "boost": 1 } },
        { "match": { "isDeleted": { "query": "0", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": { "includes": [], "excludes": [] }
}
The equivalent SQL representation is:
SELECT guid FROM xxx WHERE source=5 AND type=21 AND creator='0d754a8af3104e978c95eb955f6331be' AND status IN (0,3) AND isDeleted=0;
The query contains several inefficiencies: match queries carrying full-text options (fuzzy_transpositions, prefix_length, and so on) where exact-value lookups would do, scoring must clauses where non-scoring filter clauses would suffice, and, most critically, a data-type mismatch for numeric fields.
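As a rough illustration of the first two points, the same conditions could be expressed as exact-match clauses in filter context. This is a minimal sketch, not the platform's actual fix: the helper name build_filter_query is hypothetical, it only builds the request body as a Python dict, and it assumes every field involved (including creator) is mapped so that a term query matches the stored value exactly.

```python
import json


def build_filter_query(source, type_, creator, statuses, is_deleted, size=200):
    """Build a search body whose conditions all run in filter context.

    Hypothetical helper for illustration; assumes the fields are mapped
    so that term/terms queries match the stored values exactly.
    """
    return {
        "from": 0,
        "size": size,
        "timeout": "60s",
        "query": {
            "bool": {
                # Filter clauses are not scored, unlike the original must clauses.
                "filter": [
                    {"term": {"source": source}},
                    {"term": {"type": type_}},
                    {"term": {"creator": creator}},
                    {"terms": {"status": statuses}},
                    {"term": {"isDeleted": is_deleted}},
                ]
            }
        },
    }


body = build_filter_query(
    "5", "21", "0d754a8af3104e978c95eb955f6331be", ["0", "3"], "0"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the _search endpoint in place of the original request; the values are the ones taken from the captured slow query.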
1. Misused Data Types
In Elasticsearch 2.x, numeric fields were indexed as terms in the inverted index, which made range queries expensive because each range had to be expanded into an explicit set of terms. Elasticsearch 5.0 (Lucene 6) moved numeric fields to a block k‑d tree, which is much faster for true ranges. However, fields such as isDeleted are mapped as integer even though they are only ever matched exactly, so the engine executes a PointRangeQuery against the tree instead of a fast term lookup.
Changing such fields to keyword lets the query resolve each value through the inverted index: a single term‑dictionary lookup returns a postings list that is already sorted by doc ID, which dramatically improves performance.
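A keyword mapping for such fields might look like the sketch below. The helper name keyword_mapping and the field list are assumptions made for illustration (the article names only isDeleted explicitly), and the body uses the typeless mapping layout of Elasticsearch 7 and later. The type of an existing field cannot be changed in place, so the mapping has to be applied to a new index that is then rebuilt.

```python
import json


def keyword_mapping(fields):
    """Return an index-creation body declaring each field as keyword.

    Hypothetical helper; uses the typeless mapping layout (ES 7+).
    """
    return {
        "mappings": {
            "properties": {name: {"type": "keyword"} for name in fields}
        }
    }


# Assumed list of enum-like fields that are only ever matched exactly.
STATUS_LIKE_FIELDS = ["source", "type", "status", "isDeleted"]

mapping = keyword_mapping(STATUS_LIKE_FIELDS)
print(json.dumps(mapping, indent=2))
```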
2. Term Query Order
Elasticsearch does not execute term filters in the order they appear in the request. Lucene estimates the cost of each clause (for a term, from the document frequency recorded at indexing time) and lets the most selective clause, the one matching the fewest documents, lead the conjunction. For example, the creator term may return only a few documents, so it is processed first, reducing the workload for subsequent filters.
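The idea can be shown with a toy model. The sketch below is not Lucene's implementation: it represents each clause as a plain list of matching doc IDs (all values are made up for illustration), picks the shortest list as the lead, and checks only the lead's documents against the other clauses.

```python
def order_by_cost(clauses):
    """Sort (name, doc_ids) pairs so the clause with the fewest matches leads."""
    return sorted(clauses, key=lambda clause: len(clause[1]))


def intersect(clauses):
    """Intersect all clauses, iterating only over the cheapest one."""
    ordered = order_by_cost(clauses)
    lead_name, lead_docs = ordered[0]
    others = [set(docs) for _, docs in ordered[1:]]
    hits = [doc for doc in lead_docs if all(doc in other for other in others)]
    return hits, lead_name


# Illustrative doc IDs only: isDeleted matches everything, creator very little.
clauses = [
    ("isDeleted", range(0, 1000)),
    ("creator", [3, 500, 999]),
    ("status", range(0, 1000, 3)),
]
hits, lead = intersect(clauses)
print(lead, hits)
```

In this model the request order is irrelevant: creator leads because it has the fewest matches, and only its three documents are ever tested against the other two clauses.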
3. Why PointRangeQuery Is Slow Here
The integer field is internally represented by a block k‑d tree. While this structure accelerates true range queries, it requires building a bitset of matching doc IDs before the advance operation can run. The bitset construction (performed in PointRangeQuery#createWeight) dominates the execution time, which explains the observed ~100 ms latency.
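The difference in work can be sketched with a simplified model. Both functions below are illustrative stand-ins rather than Lucene code, and the document counts are invented: the first materializes every matching doc ID into a bitset before the conjunction can consult it, the second advances through an already sorted postings list only as far as the lead clause requires.

```python
import bisect


def bitset_conjunction(lead_docs, matching_points, max_doc):
    """Model the points path: build a full bitset, then test the lead docs."""
    bitset = bytearray(max_doc)
    visited = 0
    for doc in matching_points:  # one write per matching document
        bitset[doc] = 1
        visited += 1
    hits = [doc for doc in lead_docs if bitset[doc]]
    return hits, visited


def postings_conjunction(lead_docs, postings):
    """Model the keyword path: advance a sorted postings list on demand."""
    hits, probes, pos = [], 0, 0
    for doc in lead_docs:
        pos = bisect.bisect_left(postings, doc, pos)  # "advance" to doc
        probes += 1
        if pos < len(postings) and postings[pos] == doc:
            hits.append(doc)
    return hits, probes


lead = [3, 500, 999]             # few docs from the selective clause
all_matches = list(range(1000))  # nearly every doc has isDeleted = 0

print(bitset_conjunction(lead, all_matches, 1000))
print(postings_conjunction(lead, all_matches))
```

Both paths return the same hits, but the bitset path touches every matching document while the postings path does one probe per lead document. That gap grows with the number of documents matching the low-selectivity field.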
Verification
After converting the problematic integer fields to keyword and rebuilding the index, profiling shows the former 100 ms PointRangeQuery now completes in about 0.5 ms. Latency charts confirm the average shard processing time dropped from >100 ms to under 10 ms, and the slow‑query count fell to zero.
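One way to take such measurements is the search Profile API, enabled by adding "profile": true to the request body. The helpers below are a minimal sketch with hypothetical names; they assume the response exposes time_in_nanos on each query node, which may not hold on older Elasticsearch versions.

```python
def with_profile(body):
    """Return a copy of a search body with profiling switched on."""
    profiled = dict(body)
    profiled["profile"] = True
    return profiled


def flatten_profile(response):
    """Yield (query type, time in ms) for every query node in a profile response."""

    def walk(node):
        yield node["type"], node["time_in_nanos"] / 1e6
        for child in node.get("children", []):
            yield from walk(child)

    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for query in search["query"]:
                yield from walk(query)
```

Running flatten_profile on the responses before and after the mapping change makes it easy to see which Lucene query type (for example PointRangeQuery versus TermQuery) accounts for the time on each shard.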
Future Work
Going forward, the platform will enforce keyword mapping for status‑type fields by default, only reverting to integer when a genuine range query is required. Additional challenges include balancing load during index rebuilds, handling uneven cluster resource distribution, and mitigating unpredictable query traffic without provisioning dedicated hardware for each business line.
Continued monitoring and incremental optimizations are essential for maintaining low latency and resource efficiency in a shared Elasticsearch environment.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
