
How I Cut Elasticsearch Query Latency from 5 s to 1.2 s and Saved 60% Storage

This article details a real‑world Elasticsearch performance overhaul on a 12‑billion‑document cluster, covering shard rebalancing, index slimming, JVM tuning, query optimization, safe scaling, monitoring alerts, and data cleanup, complete with formulas, code snippets, and measurable results.

dbaplus Community

Background and Problem

A large e‑commerce platform’s Elasticsearch cluster held 12 billion log documents and was suffering query latencies above 5 s, a 40% search timeout rate, over 200 alerts per day, and disk I/O pinned at 99%, threatening service stability.

Three Core Optimizations

1. Shard Rebalancing (Performance ↑ 300%)

Hot and cold data were first separated onto different node tiers, then shard counts were recalculated with the formula:

total_shards = max(nodes * 1.5, total_data/50GB)

Result: query latency dropped from 5 s to 1.2 s.
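A minimal sketch of both steps, assuming a custom node attribute named temperature tags the SSD-backed hot tier; the index name logs-hot-v2 is illustrative, and the shard count of 15 comes from plugging 10 data nodes and roughly 600 GB of data into the formula above (max(10 × 1.5, 600 GB / 50 GB) = 15):

# elasticsearch.yml on the SSD-backed hot-tier nodes
node.attr.temperature: hot

# new index pinned to the hot tier, sized by the formula
PUT /logs-hot-v2
{
  "settings": {
    "number_of_shards": 15,
    "number_of_replicas": 1,
    "index.routing.allocation.require.temperature": "hot"
  }
}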

2. Index Slimming (Storage ↓ 60%)

Unused fields were disabled in the mapping, for example:

// Original mapping
"user_agent": {"type":"text","fielddata":true}
// Optimized mapping
"user_agent": {"enabled":false}

Time‑series index growth and retention were controlled with an ILM policy:

PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": { "actions": { "rollover": { "max_size": "50gb" } } },
      "delete": { "min_age": "365d", "actions": { "delete": {} } }
    }
  }
}
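For the policy to take effect, new indices need to reference it; a sketch using a composable index template (ES 7.8+), where the template name and rollover alias are illustrative:

PUT _index_template/logs-ilm-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}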

3. JVM Tuning (GC Pause ↓ 90%)

Heap size was capped at 31 GB using:

heap = min(31GB, physical_memory / 2)

Full GC frequency fell from three times per hour to once every three days.
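A sketch of how the cap might be applied, assuming a 64 GB data node (so physical_memory / 2 would exceed the 31 GB compressed-oops ceiling) and ES 7.7+ where jvm.options.d overrides are supported; on older versions the same flags go directly in config/jvm.options:

# config/jvm.options.d/heap.options
-Xms31g
-Xmx31g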

Checklist of Pitfalls to Avoid

❌ Single shard > 50 GB → I/O bottleneck
✅ Set index.routing.allocation.total_shards_per_node=3 (see the settings sketch after this list)

❌ HDD + RAID5 → high latency
✅ NVMe SSD + RAID0 with replicas

❌ Full‑scan queries like {"match_all":{},"size":10000}
✅ Use filtered bool queries, limit track_total_hits

❌ Fielddata enabled → heap leaks
✅ Switch to eager_global_ordinals
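A sketch of the per-node shard cap from the first item, applied to an existing index (the index name logs-hot-v2 is illustrative):

PUT /logs-hot-v2/_settings
{
  "index.routing.allocation.total_shards_per_node": 3
}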

Query Optimization (From Full Scan to Lightning Fast)

A problematic leading-wildcard query forced a scan across every shard:

{"query":{"wildcard":{"product_name":"*爆款*"}}}

The rescue query adds a timeout, a time range filter, and disables total hit tracking:

GET /_search
{
  "timeout":"5s",
  "query":{
    "bool":{
      "filter":[/* cached filters */],
      "must":{"range":{"@timestamp":{"gte":"now-1h"}}}
    }
  },
  "track_total_hits":false
}

Cluster Scaling Without Downtime

During peak traffic, nodes were added online, and shard recovery was accelerated with these transient settings:

PUT _cluster/settings
{
  "transient":{
    "cluster.routing.allocation.node_concurrent_recoveries":5,
    "indices.recovery.max_bytes_per_sec":"200mb"
  }
}
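A sketch of the follow-up steps, which the original does not spell out: relocation progress can be watched with the cat recovery API, and the transient limits reset to their defaults by nulling them once rebalancing completes:

GET _cat/recovery?v&active_only=true

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": null,
    "indices.recovery.max_bytes_per_sec": null
  }
}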

Monitoring and Alerting

Alert rules trigger when thread‑pool write queue > 1000 or heap usage > 85%:

IF (thread_pool.write.queue > 1000) OR (jvm.mem.heap_used_percent > 85)
THEN send enterprise‑WeChat + phone alerts
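Both signals map to fields exposed by the nodes stats API; a sketch of the poll a monitoring job might run, with filter_path trimming the response to just those two metrics:

GET _nodes/stats/thread_pool,jvm?filter_path=nodes.*.thread_pool.write.queue,nodes.*.jvm.mem.heap_used_percent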

Data Cleanup Verification

To ensure old test data is removed, a delete_by_query was run targeting records older than two years:

POST /test_index*/_delete_by_query
{
  "query":{
    "range":{ "@timestamp": { "lte":"now-2y" } }
  }
}
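On an index set this large, the delete can run for a long time; a hedged variant launches it as a background task and tolerates version conflicts (the task id in the follow-up call is a placeholder returned by the first request):

POST /test_index*/_delete_by_query?wait_for_completion=false&conflicts=proceed
{
  "query":{
    "range":{ "@timestamp": { "lte":"now-2y" } }
  }
}

GET _tasks/<task_id>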

The operation reduced cluster load by 40% and earned the author a bonus.

Cost Savings

A cost‑comparison table (presented as an image in the original post) shows roughly one million yuan in budget saved through these architectural changes.

Conclusion

Effective Elasticsearch optimization relies on thoughtful architecture—rebalancing shards, slimming indices, tuning the JVM, refining queries, scaling safely, and monitoring proactively—rather than merely adding hardware.
