Tencent Tianji Ge Distributed Tracing System: Elasticsearch Optimization Practice
Tencent's Tianji Ge distributed tracing platform combines tracing, metrics and logging and handles billions of records per day. It overcame severe cluster jitter and storage latency by applying tiered Elasticsearch index templates, replica reduction, Transport API buffering, pre-created indices and ILM. These changes cut write latency from 20 s to 0.32 s, reduced the shard count by about 70% and saved about 30% of storage.
This article introduces Tencent's distributed tracing system "Tianji Ge" and its Elasticsearch optimization practices. Tianji Ge integrates Tracing, Metric, and Logging capabilities and supports service monitoring at massive scale, with daily peaks of 34 billion Trace records and 14 billion Log records.
The system architecture consists of three layers: a Data Access Layer (supporting HTTP+JSON, HTTP+Proto, and gRPC, with multi-region access that routes each client to the nearest entry point), a Data Processing Layer (Flink-based stream computing for Metric, Log, and Trace data), and a Data Storage Layer (ES for inverted indexes and logs, HBase for call chains and topology data).
Rapid business growth exposed severe problems: cluster jitter, resource skew, storage latency spikes, and frequent checkpoint failures. The ES optimization work covered five areas:
(1) Tiered index templates that set the shard count by capacity range (0-40G: 1 shard, 40-100G: 4 shards, 100-200G: 8 shards, 200-400G: 12 shards).
(2) Fewer replicas and leaner index settings (turning off scoring and DocValues).
(3) Upgrading the client API to the Transport API, with 5MB batch buffering.
(4) Switching from on-demand index creation to pre-created indices, with dedicated master nodes.
(5) ILM (Index Lifecycle Management) with Hot/Warm/Cold/Delete phases for automated index management.
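The capacity tiers and the 5MB buffer described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Tianji Ge's actual code: the names `shards_for_capacity`, `template_body`, and `BulkBuffer` are hypothetical, the treatment of tier boundaries and of sizes above 400G is an assumption (the article does not specify either), and the replica count is left as a parameter because the article gives no number.

```python
import json

# (upper bound in GB, primary shard count), taken from the tiers in the article.
TIERS = [(40, 1), (100, 4), (200, 8), (400, 12)]


def shards_for_capacity(size_gb):
    """Pick a primary shard count from the expected index size in GB."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    for upper_gb, shards in TIERS:
        # Assumption: a size exactly on a boundary falls into the higher tier.
        if size_gb < upper_gb:
            return shards
    # Assumption: sizes of 400 GB and above reuse the largest tier.
    return TIERS[-1][1]


def template_body(pattern, size_gb, replicas):
    """Build a composable index template body for the given size tier."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": shards_for_capacity(size_gb),
                "number_of_replicas": replicas,
            }
        },
    }


class BulkBuffer:
    """Collect serialized documents and flush once a byte budget is reached."""

    def __init__(self, flush, max_bytes=5 * 1024 * 1024):
        self._flush = flush          # callback that receives a list of JSON lines
        self._max_bytes = max_bytes  # 5 MB by default, as in the article
        self._docs = []
        self._size = 0

    def add(self, doc):
        line = json.dumps(doc)
        self._docs.append(line)
        self._size += len(line.encode("utf-8"))
        if self._size >= self._max_bytes:
            self.flush()

    def flush(self):
        # Hand the pending batch to the callback, then start a fresh batch.
        if self._docs:
            self._flush(self._docs)
            self._docs, self._size = [], 0
```

In use, the `flush` callback would wrap whatever bulk-write call the client exposes. Sizing batches by bytes rather than by document count keeps request sizes steady when document sizes vary widely.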
Optimization results: write latency fell from 20,000 ms to 320 ms; the shard count dropped from 72,000 to 20,000 (about a 70% reduction); storage shrank from 67 TB to 48 TB (about 30% savings); and index creation time improved from minutes to seconds.
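As a quick check on the figures above, the short sketch below recomputes the reductions from the before and after values. The helper name `pct_reduction` is hypothetical. The results show that the quoted 70% and 30% are rounded.

```python
def pct_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


shard_cut = round(pct_reduction(72_000, 20_000), 1)   # shards
storage_cut = round(pct_reduction(67, 48), 1)         # storage in TB
latency_speedup = 20_000 / 320                        # write latency in ms

print(shard_cut)        # 72.2
print(storage_cut)      # 28.4
print(latency_speedup)  # 62.5
```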
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.