Databases 18 min read

Tencent’s Secrets to Scaling Elasticsearch for Trillion‑Level Data

Tencent shares how it leverages Elasticsearch at trillion‑scale across logging, search, and time‑series workloads, detailing the challenges of high availability, low cost, and high performance, and describing concrete kernel‑level optimizations, resource‑limiting strategies, storage tiering, rollup, cache, and merge techniques that enable robust, efficient operation.

dbaplus Community
dbaplus Community
dbaplus Community
Tencent’s Secrets to Scaling Elasticsearch for Trillion‑Level Data

1. Elasticsearch Use Cases at Tencent

Elasticsearch (ES) is employed extensively within Tencent for real‑time log analysis, full‑text search, and structured data analytics, covering billions of documents and supporting diverse workloads such as operational logs, business click streams, audit logs, e‑commerce product search, app store search, forum search, metrics monitoring, APM, and IoT sensor data.

Key advantages include a complete Elastic ecosystem for rapid deployment, sub‑10‑second latency from ingestion to query, flexible inverted‑index and column‑store structures, and interactive analysis that remains responsive even at trillion‑scale.

ES application landscape
ES application landscape

2. Challenges Faced

Operating ES at massive scale introduces two major challenge groups:

Search‑oriented workloads : Require four‑nine (99.99%) availability, >100k QPS, 20 ms average latency, and P95 < 100 ms, demanding both high availability and high performance.

Time‑series workloads : Emphasize low storage and compute cost; write throughput can exceed 10 M writes/s, data retention reaches petabytes, and cost per GB becomes a critical factor.

Both categories stress cluster robustness, fault tolerance, and resource efficiency.

Challenge illustration
Challenge illustration

3. Optimization Practices

3.1 High‑Availability Enhancements

We address three dimensions:

System robustness : Improve fault tolerance under abnormal queries and overload, enhance cluster scalability, and balance data across nodes and disks during expansion.

Disaster‑recovery : Implement multi‑AZ deployment, rapid failover, and backup‑restore to cheap storage.

System defects : Fix issues such as master node blockage, distributed deadlocks, and slow rolling restarts (e.g., PR ES-46520).

Specific measures include service‑level rate limiting, metadata control improvements that enable thousand‑node clusters with millions of shards, and optimized shard balancing across disks.

High‑availability diagram
High‑availability diagram

3.2 Cost Optimization

Cost analysis shows a resource ratio of roughly 8:4:1 for disk:memory:CPU in typical log/monitoring workloads. To reduce storage cost, we adopt hot‑cold data separation, hybrid storage, and aggressive lifecycle management. For memory, we introduce an off‑heap LFU cache that stores frequently accessed index data, raising memory utilization by 80 % while keeping GC overhead down 30 %.

We also leverage Rollup (pre‑aggregation) to replace raw high‑granularity data with summarized metrics, cutting storage by up to 90 % and improving query speed. Implementation uses a gradient‑based bucket‑limit algorithm that checks JVM memory every 1,000 buckets and aborts when necessary ( PR ES-46751, PR ES-47806).

Cost optimization architecture
Cost optimization architecture

3.3 Performance Tuning

Write path improvements include primary‑key deduplication using index pruning (+45 % throughput, PR Lucene-8980) and optimized translog refresh to better utilize CPU (+20 % throughput, PR ES-45765, PR ES-47790). We are also exploring vectorized execution for further gains.

Query side optimizations consist of merge‑strategy redesign (time‑aware merge for time‑series data), segment pruning using min/max indices (+30 % query speed), and CBO‑driven cache avoidance to eliminate 10× latency spikes ( PR Lucene-9002). Hardware accelerators such as Intel AEP, Optane, and QAT are being evaluated.

Additionally, we introduced a “cold‑data auto‑merge” that consolidates inactive shards to ~5 GB, reducing file count and improving time‑series query pruning.

Performance tuning diagram
Performance tuning diagram

4. Future Plans and Open‑Source Contributions

In the past six months Tencent has submitted over ten pull‑requests to the Elastic codebase, covering ingestion, query, and cluster management modules. Internally we have formed an open‑source collaboration team to co‑develop the Elastic ecosystem.

Looking ahead, we aim to deepen integration of ES with OLAP engines and online services, explore further cost‑effective storage tiers, and continue enhancing kernel features such as time‑aware merge, automated backup, and resource isolation for multi‑tenant environments.

Future roadmap
Future roadmap
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Elasticsearchhigh availabilityCost Optimizationopen sourceLarge‑Scale SearchTime‑Series Data
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.