Key Elasticsearch Performance Tweaks: Cutting Query Latency from 50 ms to Under 1 ms

In a micro‑service that uses Elasticsearch to fetch product listings, a series of targeted optimizations—including shard reduction, segment merging, keyword mapping, request‑cache activation, and PIT‑based sorting—slashed query latency from 50‑60 ms to under 1 ms and boosted throughput to about 50 k queries per second.

Mingyi World Elasticsearch

Dec 5, 2024

Background

Within a micro‑service that retrieves product listings via Elasticsearch, each query originally took 50‑60 ms, creating a bottleneck for high‑volume request handling. Reducing this latency was essential for overall system performance and user experience.

Performance Testing Method

We used an internal load‑testing tool called Ares to evaluate both the micro‑service and the Elasticsearch queries. A representative sample of 10,000 content IDs was selected from production, and an Elasticsearch task was created to stress the index. The test query used was:

{
    "query": {
        "bool": {
            "filter": [
                {"term": {"contentId": 10863010}},
                {"terms": {"storefrontId": ["50","35","36","43","48","49"]}}
            ]
        }
    },
    "_source": ["storefrontId","listingId"],
    "sort": [{"storefrontId": "asc"}, {"listingId": "asc"}]
}

The query filters on a specific contentId and a set of storefrontId values. Both clauses run in filter context, so no relevance scores are computed; the matching listings are returned sorted by storefrontId and then by listingId.
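To replay this query across the sampled content IDs, the body can be generated per ID. The following is a minimal sketch and not the Ares tooling itself; the helper name build_listing_query is hypothetical, and it writes each sort key as its own array element so the sort order is explicit.

```python
import json


def build_listing_query(content_id: int, storefront_ids: list[str]) -> dict:
    """Build the filter-only listing query for one content ID (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"contentId": content_id}},
                    {"terms": {"storefrontId": storefront_ids}},
                ]
            }
        },
        # Return only the two fields the service needs.
        "_source": ["storefrontId", "listingId"],
        # One object per sort key: storefrontId first, then listingId.
        "sort": [{"storefrontId": "asc"}, {"listingId": "asc"}],
    }


body = build_listing_query(10863010, ["50", "35", "36", "43", "48", "49"])
print(json.dumps(body, indent=2))
```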

Performance Optimization Strategies

3.1 Reduce Shard Count

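Initially the cluster contained over 100 shards. Every shard is a separate Lucene index with its own memory and file-handle overhead, and a search fans out to each shard of the index, so the oversized shard count wasted resources. We reduced the number of shards to match the number of nodes, which lowered overhead and noticeably improved query speed and cluster stability.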
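The article does not say how the shard count was reduced. Two common routes are reindexing into a new index (any shard count) or the _shrink API, which requires the target shard count to be a factor of the source shard count and has further prerequisites such as making the source index read-only. The sketch below assumes the shrink route; the helper names and the example numbers are illustrative only.

```python
def shrink_target(source_shards: int, node_count: int) -> int:
    """Largest factor of source_shards that does not exceed node_count."""
    for candidate in range(min(source_shards, node_count), 0, -1):
        if source_shards % candidate == 0:
            return candidate
    return 1


def shrink_request_body(target_shards: int, replicas: int = 1) -> dict:
    """Body for POST /<source>/_shrink/<target> (index names are up to the caller)."""
    return {
        "settings": {
            "index.number_of_shards": target_shards,
            "index.number_of_replicas": replicas,
        }
    }


# Example with made-up numbers: 120 primary shards on a 7-node cluster.
target = shrink_target(120, 7)
print(target, shrink_request_body(target))
```

When the node count is not a factor of the source shard count, reindexing into a freshly created index is the route that allows an exact match.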

3.2 Limit Segment Count

Accumulating segments increases search latency because every search must visit each immutable segment in a shard. We tuned the merge policy (the index.merge.policy.* index settings) with the following values to control and gradually reduce the number of segments:

max_merge_at_once_explicit: "4" – caps explicit merges (force merge) at four segments at a time.

max_merge_at_once: "4" – limits automatic merges to four segments at a time.

max_merged_segment: "30gb" – prevents automatic merges from creating segments larger than 30 GB.

floor_segment: "20gb" – treats any segment smaller than 20 GB as if it were 20 GB when selecting merges, so small segments are merged together aggressively.

segments_per_tier: "2" – allows only two segments per tier, keeping the total segment count low at the cost of more merge activity.
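The values above can be collected into a single settings body. This is a sketch that assumes the settings are applied through the index update-settings API (PUT /your_index_name/_settings) and that your Elasticsearch version still accepts each key; the function name is hypothetical.

```python
import json


def merge_policy_settings() -> dict:
    """Settings body mirroring the merge-policy values listed in the article."""
    return {
        "index": {
            "merge": {
                "policy": {
                    # Assumption: newer Elasticsearch versions may ignore or
                    # reject this key; check the docs for your version.
                    "max_merge_at_once_explicit": "4",
                    "max_merge_at_once": "4",
                    "max_merged_segment": "30gb",
                    "floor_segment": "20gb",
                    "segments_per_tier": "2",
                }
            }
        }
    }


print(json.dumps(merge_policy_settings(), indent=2))
```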

3.3 Type‑Conversion Optimization

We changed the field type used for term queries from its original mapping to keyword. Elasticsearch optimizes numeric types for range queries, whereas keyword values are looked up directly in the inverted index, which makes exact-match term and terms queries on identifiers very fast. After re-indexing all documents, load testing showed the search rate rising to roughly 50,000 queries per second while latency dropped below 1 ms.
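A mapping change of this kind cannot be applied to an existing field in place, which is why a reindex was needed. The sketch below shows what the two request bodies might look like. The index names listings_v1 and listings_v2 are hypothetical, and mapping all three fields from the test query as keyword is an assumption, since the article does not say which field was converted.

```python
import json


def keyword_mapping() -> dict:
    """Body for PUT /listings_v2: identifier fields mapped as keyword (assumed set)."""
    return {
        "mappings": {
            "properties": {
                "contentId": {"type": "keyword"},
                "storefrontId": {"type": "keyword"},
                "listingId": {"type": "keyword"},
            }
        }
    }


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex, copying documents into the newly mapped index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


print(json.dumps(keyword_mapping(), indent=2))
print(json.dumps(reindex_body("listings_v1", "listings_v2"), indent=2))
```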

3.4 Enable Request Cache

Enabling request_cache on the index speeds up repeated queries, such as retries or multiple consumptions of the same Kafka event. The cache automatically invalidates on refresh, keeping data near‑real‑time. The setting is applied with:

PUT /your_index_name/_settings
{
  "index": {
    "requests.cache.enable": true
  }
}

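While this improves query speed, it can increase memory usage, so it should be balanced against available resources. Note also that, by default, the request cache stores only the results of requests with size 0 (hit counts, aggregations, and suggestions) rather than the hits themselves, so it is worth confirming through the request-cache statistics that the listing queries are actually served from the cache.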
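The request cache is keyed on the request body, so two logically identical requests whose JSON keys are serialized in a different order may not share a cache entry. A minimal sketch of producing a canonical body on the client side, assuming the request is otherwise eligible for caching (the helper name is hypothetical):

```python
import json


def canonical_body(body: dict) -> str:
    """Serialize a request body with a stable key order and no extra whitespace."""
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


first = {"size": 0, "query": {"term": {"contentId": 1}}}
second = {"query": {"term": {"contentId": 1}}, "size": 0}

# Both dicts describe the same request and serialize to the same string.
print(canonical_body(first) == canonical_body(second))
```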

3.5 Optimize Sorting with PIT

For queries returning more than 10,000 documents we adopted Point-In-Time (PIT) to obtain a consistent snapshot of the index, avoiding interference from ongoing indexing (PIT is typically paired with search_after for this kind of deep pagination). Instead of sorting by storefrontId and listingId, we sorted by the built-in tiebreaker field _shard_doc, which is unique per document within a PIT context and prevents duplicate or missing documents across pages.
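The paging loop can be sketched as follows, assuming the PIT was opened beforehand with POST /your_index_name/_pit?keep_alive=1m and is closed afterwards with DELETE /_pit. The search argument is a hypothetical callable that sends a body to POST /_search (no index in the path when a PIT is used) and returns the parsed response; it is injected so the sketch does not depend on a particular client library.

```python
from typing import Callable, Iterator, Optional


def build_pit_page(pit_id: str, query: dict, page_size: int,
                   search_after: Optional[list] = None,
                   keep_alive: str = "1m") -> dict:
    """Build one search body that pages through a PIT in _shard_doc order."""
    body = {
        "size": page_size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": [{"_shard_doc": "asc"}],
    }
    if search_after is not None:
        # Resume right after the last hit of the previous page.
        body["search_after"] = search_after
    return body


def iterate_hits(search: Callable[[dict], dict], pit_id: str, query: dict,
                 page_size: int = 1000) -> Iterator[dict]:
    """Yield every hit, one page at a time, until an empty page is returned."""
    search_after = None
    while True:
        response = search(build_pit_page(pit_id, query, page_size, search_after))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        search_after = hits[-1]["sort"]
        # The response may carry an updated PIT id; use the most recent one.
        pit_id = response.get("pit_id", pit_id)
```

Because each page starts from the sort values of the previous page's last hit, the loop never uses from/size offsets and is not bound by the 10,000-result window.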

Results

After applying the above optimizations, query latency fell from 50‑60 ms to under 1 ms, and the system sustained about 50 k queries per second. The before‑and‑after cluster performance images illustrate the dramatic improvement.

Conclusion

Targeted Elasticsearch tuning—reducing shard count, managing segment merges, using keyword fields for exact matches, enabling request cache, and leveraging PIT for stable pagination—delivers substantial speed gains and enhances overall system responsiveness.
