
How to Diagnose and Fix Elasticsearch Slow Queries: From PointRange to Keyword

This article examines why Elasticsearch slow queries occur in a shared cluster, analyzes a problematic query's structure and data‑type choices, and demonstrates how converting integer fields to keyword mappings and adjusting filter order can reduce latency from over 100 ms to under 10 ms while eliminating slow‑query alerts.

dbaplus Community

Problem: Slow Queries

The shared search platform accepts queries from many business teams without strict syntax constraints, so an occasional heavy query can overload the cluster; monitoring has shown CPU usage reaching 100%.

Slow Query Analysis

Using the Elasticsearch slowlog, a query that consistently exceeds 300 ms was captured. The raw JSON request looks like:

{
  "from": 0,
  "size": 200,
  "timeout": "60s",
  "query": {
    "bool": {
      "must": [
        { "match": { "source": { "query": "5", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } },
        { "terms": { "type": ["21"], "boost": 1 } },
        { "match": { "creator": { "query": "0d754a8af3104e978c95eb955f6331be", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } },
        { "terms": { "status": ["0","3"], "boost": 1 } },
        { "match": { "isDeleted": { "query": "0", "operator": "OR", "prefix_length": 0, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": false, "boost": 1 } } }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": { "includes": [], "excludes": [] }
}

The equivalent SQL representation is:

SELECT guid FROM xxx WHERE source=5 AND type=21 AND creator='0d754a8af3104e978c95eb955f6331be' AND status IN (0,3) AND isDeleted=0;

The query contains several inefficiencies: exact-value lookups written as match queries that carry fuzzy-matching options, scoring must clauses where non-scoring filter clauses would suffice, and, most critically, a data-type mismatch for numeric fields.
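As a rough illustration of the first two points, the same lookup can be expressed with exact-value term and terms clauses in filter context. This is a minimal sketch, not the platform's actual fix: the helper name build_filter_query is hypothetical, and it assumes creator is indexed as an exact (keyword-style) value rather than analyzed text.

```python
import json


def build_filter_query(source, doc_type, creator, statuses, is_deleted, size=200):
    """Build the slow query's conditions as non-scoring filter clauses.

    Hypothetical helper; field names are taken from the captured request.
    """
    return {
        "from": 0,
        "size": size,
        "timeout": "60s",
        "query": {
            "bool": {
                # Filter clauses do not compute relevance scores.
                "filter": [
                    {"term": {"creator": creator}},
                    {"term": {"source": source}},
                    {"terms": {"type": [doc_type]}},
                    {"terms": {"status": list(statuses)}},
                    {"term": {"isDeleted": is_deleted}},
                ]
            }
        },
    }


body = build_filter_query(
    "5", "21", "0d754a8af3104e978c95eb955f6331be", ["0", "3"], "0"
)
print(json.dumps(body, indent=2))
```

The order of the clauses inside filter is only for readability; it does not dictate execution order.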

1. Misused Data Types

In Elasticsearch 2.x, numeric fields were indexed as terms in the inverted index, which made range queries expensive because each range had to be expanded into an explicit set of terms. Elasticsearch 5.0 and later index numbers in a block k‑d (BKD) tree instead, which speeds up range queries. The isDeleted field, however, only ever holds flag values yet was mapped as integer, so an exact-match condition on it runs as a PointRangeQuery rather than a fast term lookup.

Changing such fields to keyword lets the query resolve the value with a direct term lookup in the inverted index and walk the matching postings list lazily, which dramatically improves performance.
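A mapping change of this kind might look like the sketch below. The field list comes from the slow query; the destination index name xxx_v2 and the helper keyword_mapping are hypothetical, and the typeless mappings layout assumes Elasticsearch 7 or later. Because the type of an existing field cannot be changed in place, the data has to be copied into a new index, for example with the _reindex API.

```python
def keyword_mapping(fields):
    """Return an index-creation body that maps each named field as keyword."""
    return {
        "mappings": {
            "properties": {name: {"type": "keyword"} for name in fields}
        }
    }


# Status-like fields from the slow query that never need range semantics.
STATUS_LIKE_FIELDS = ["source", "type", "status", "isDeleted"]

# Body for creating the new index (hypothetical name: xxx_v2).
new_index_body = keyword_mapping(STATUS_LIKE_FIELDS)

# Body for POST _reindex, copying documents from the old index to the new one.
reindex_body = {
    "source": {"index": "xxx"},
    "dest": {"index": "xxx_v2"},
}

print(new_index_body)
print(reindex_body)
```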

2. Term Query Order

Elasticsearch does not execute term filters in the order they appear in the request. It leads with the most selective clause (the one expected to match the fewest documents), using per-term document frequencies recorded in the index. For example, the creator term may match only a few documents, so it is processed first, and the remaining filters only need to check those few candidates.
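The idea of leading with the cheapest clause can be sketched with plain sorted lists of doc IDs. This is a simplified model, not Lucene's implementation: list length stands in for the estimated cost, a binary search stands in for an iterator's advance call, and the doc-ID lists are made up for illustration.

```python
from bisect import bisect_left


def _contains(sorted_ids, doc):
    """Binary search for doc in a sorted doc-ID list."""
    i = bisect_left(sorted_ids, doc)
    return i < len(sorted_ids) and sorted_ids[i] == doc


def intersect_by_cost(postings):
    """Intersect sorted doc-ID lists, iterating only over the shortest one."""
    if not postings:
        return []
    ordered = sorted(postings, key=len)  # cost = number of matching docs
    lead, others = ordered[0], ordered[1:]
    # Only the lead's few candidates are checked against the larger lists.
    return [doc for doc in lead if all(_contains(p, doc) for p in others)]


creator_docs = [7, 42, 99]                # very selective clause
status_docs = list(range(0, 100, 3))      # moderately selective clause
is_deleted_docs = list(range(0, 100))     # matches almost everything

# Request order does not matter; the shortest list leads.
print(intersect_by_cost([is_deleted_docs, status_docs, creator_docs]))  # [42, 99]
```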

3. Why PointRangeQuery Is Slow Here

The integer field is internally represented by a block k‑d tree. While this structure accelerates true range queries, it has to build a bitset of all matching doc IDs before the advance operation can run. For a low-cardinality value such as isDeleted=0, that means visiting nearly every document. The bitset construction (done by the Weight that PointRangeQuery#createWeight returns, when its scorer is built) dominates the execution time, which explains the observed ~100 ms latency.
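The cost difference can be modeled with a toy comparison. This is only a sketch under simplifying assumptions: a bytearray stands in for the bitset, a sorted list stands in for a postings list, and the document counts are invented. It counts how many entries each path touches when three candidate documents (already narrowed by creator) are checked against a clause that matches every document.

```python
from bisect import bisect_left


def eager_bitset_check(matching_docs, max_doc, candidates):
    """Point-query model: materialise every match, then probe candidates."""
    bits = bytearray(max_doc)      # one flag per document in the segment
    visited = 0
    for doc in matching_docs:      # every matching doc is touched up front
        bits[doc] = 1
        visited += 1
    hits = [d for d in candidates if bits[d]]
    return hits, visited


def lazy_postings_check(matching_docs, candidates):
    """Term-query model: advance only to the lead clause's candidates."""
    hits, probes, pos = [], 0, 0
    for d in candidates:           # candidates must be in ascending order
        pos = bisect_left(matching_docs, d, pos)  # stands in for advance(d)
        probes += 1
        if pos < len(matching_docs) and matching_docs[pos] == d:
            hits.append(d)
    return hits, probes


max_doc = 100_000
not_deleted = list(range(max_doc))  # assume every doc has isDeleted=0
candidates = [7, 42, 99]            # docs left after the creator clause

print(eager_bitset_check(not_deleted, max_doc, candidates))  # ([7, 42, 99], 100000)
print(lazy_postings_check(not_deleted, candidates))          # ([7, 42, 99], 3)
```

Both paths return the same hits; the eager path does work proportional to the number of matches, while the lazy path does work proportional to the number of candidates.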

Verification

After converting the problematic integer fields to keyword and rebuilding the index, profiling shows that the clause that took about 100 ms as a PointRangeQuery now completes in about 0.5 ms. Latency charts confirm the average shard processing time dropped from >100 ms to under 10 ms, and the slow‑query count fell to zero.
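Per-clause timings like these can be read from the search profile, which is enabled by adding "profile": true to the request body. The sketch below sums time_in_nanos by Lucene query type across shards. The helper name time_by_query_type is hypothetical, and the sample response is hand-written with made-up numbers purely to show the shape; it is not a measurement from this cluster.

```python
from collections import defaultdict


def time_by_query_type(profile_response):
    """Sum time_in_nanos per query type over all shards.

    A parent's time already includes its children, so totals for
    different types should not be added together.
    """
    totals = defaultdict(int)

    def walk(node):
        totals[node["type"]] += node["time_in_nanos"]
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for root in search["query"]:
                walk(root)
    return dict(totals)


# Illustrative response fragment with invented timings.
sample = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "time_in_nanos": 1200,
                                "children": [
                                    {"type": "TermQuery", "time_in_nanos": 200},
                                    {"type": "PointRangeQuery", "time_in_nanos": 900},
                                ],
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

print(time_by_query_type(sample))
```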

Future Work

Going forward, the platform will enforce keyword mapping for status‑type fields by default, only reverting to integer when a genuine range query is required. Additional challenges include balancing load during index rebuilds, handling uneven cluster resource distribution, and mitigating unpredictable query traffic without provisioning dedicated hardware for each business line.

Continued monitoring and incremental optimizations are essential for maintaining low latency and resource efficiency in a shared Elasticsearch environment.

