Quantile Computation in Baidu Advertising System: Architecture and Implementation
Baidu's advertising platform computes high-precision response-time quantiles at massive scale. Each API call is intercepted and summarized locally in a mergeable T-Digest histogram; at fixed intervals the compressed, Base64-encoded summaries are uploaded to a data warehouse, where they are merged on demand. The result is low-latency, cost-effective percentile analysis with sub-0.1% error.
Background
Baidu's advertising business runs on a distributed system that handles billions of interface calls and terabytes of monitoring data daily. Quantile values are highly sensitive to interface performance and are valuable for performance analysis.
What is a quantile?
A quantile is the value below which a given fraction of a data set falls. For example, if a computer boots faster than 80% of machines, its boot time sits at the 20th percentile of the boot-time distribution.
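As a small illustration (the sample values are made up), the nearest-rank definition of a percentile can be computed directly from a sorted sample:

```python
# Illustrative only: nearest-rank percentile over a made-up sample of boot times.
boot_times = [8, 12, 15, 18, 20, 25, 30, 40, 55, 90]  # seconds, sorted

def percentile(sorted_data, p):
    """Smallest value such that at least p% of the data is at or below it."""
    rank = -(-len(sorted_data) * p // 100)  # ceil(n * p / 100) without math.ceil
    return sorted_data[max(int(rank) - 1, 0)]

print(percentile(boot_times, 20))  # 12 -- faster than 80% of the sample
```

The same function applied at p=99 returns 90, the sample's slowest-but-one tail value.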
Why quantiles?
In interface performance analysis, extreme requests concentrate above the 99th percentile. Averaging masks these outliers, leading to misleadingly low response times while a small fraction of users experience poor performance.
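A toy example (with fabricated latencies) shows how the mean hides a slow tail that the 99th percentile exposes:

```python
import math

# Fabricated latencies: 98 fast requests plus two 5-second stragglers.
latencies_ms = [10] * 98 + [5000] * 2

data = sorted(latencies_ms)
mean = sum(data) / len(data)
p99 = data[math.ceil(len(data) * 0.99) - 1]  # nearest-rank 99th percentile

print(mean)  # 109.8 -- looks almost healthy
print(p99)   # 5000  -- what the slowest users actually experience
```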
Common calculation methods
1. Streaming computation: Real-time data is sent to Spark/Flink clusters. Advantage: 100% accurate quantiles. Disadvantage: importing all data at hundred-billion scale is resource-intensive.
2. Offline computation: Data is loaded into a data warehouse and batch-processed. Not suitable for APM scenarios that require low latency.
3. Load-test based computation: Tools like JMeter collect samples during load tests and compute quantiles on the fly. High accuracy and low latency, but limited to a single application and the test phase.
4. Ad-hoc query computation: Monitoring tools (e.g., Prometheus) store raw samples and compute quantiles on demand. This saves compute resources but increases storage cost.
Divide‑and‑conquer architecture
The core challenge is merging quantiles. Because quantiles cannot be simply averaged, the system merges raw data or its summaries (histograms, T‑Digest, GK algorithm, etc.).
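A quick demonstration of why averaging per-instance quantiles fails (sample values are made up):

```python
from statistics import median

# Two instances observe different latency samples (milliseconds).
shard_a = [1, 2, 3, 100]
shard_b = [4, 5, 6, 7]

avg_of_medians = (median(shard_a) + median(shard_b)) / 2
true_median = median(shard_a + shard_b)

print(avg_of_medians)  # 4.0 -- wrong
print(true_median)     # 4.5 -- correct, needs data (or a summary) from both
```

This is why each instance must upload a mergeable summary rather than a finished quantile.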
Step 1 – Interface interception
Intercept each interface call (via manual instrumentation or Java bytecode enhancement) to capture response time.
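In the real system this is done in Java (manually or via bytecode enhancement); the same idea can be sketched in Python with a decorator. The `samples` sink and interface names below are hypothetical; in practice each measurement would feed the local summary rather than a list:

```python
import time
from functools import wraps

samples = {}  # hypothetical sink: interface name -> recorded response times

def timed(interface_name):
    """Record the response time of every call made under this interface name."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                samples.setdefault(interface_name, []).append(elapsed_ms)
        return wrapper
    return decorate

@timed("ad.query")           # hypothetical interface name
def handle_query():
    time.sleep(0.01)         # stand-in for real request handling

handle_query()
print(samples["ad.query"])   # one recorded response time, roughly 10 ms
```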
Step 2 – Edge aggregation
Each instance maintains a data distribution (summary) for each interface. At fixed intervals (e.g., hourly) the summary is uploaded to a data warehouse and the local summary is cleared. The summary must be mergeable, memory‑efficient, and provide controllable accuracy (T‑Digest is used in practice).
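The production system uses T-Digest; as a rough, self-contained stand-in, the sketch below uses log-scale buckets, which share the properties the text requires (mergeable, compact, bounded error). All names and numbers are hypothetical:

```python
import math
from collections import Counter

class LogHistogram:
    """A tiny mergeable summary with log-scale buckets.

    A hypothetical stand-in for T-Digest: mergeable, memory-efficient,
    with error bounded by the bucket width (~10% relative error here).
    """

    BASE = 1.1  # each bucket is ~10% wider than the previous one

    def __init__(self):
        self.counts = Counter()

    def add(self, value_ms):
        self.counts[int(math.log(max(value_ms, 1e-3), self.BASE))] += 1

    def merge(self, other):
        self.counts += other.counts  # mergeability is the key property

    def quantile(self, q):
        target = q * sum(self.counts.values())
        seen = 0
        for bucket in sorted(self.counts):
            seen += self.counts[bucket]
            if seen >= target:
                return self.BASE ** bucket  # bucket's representative value
        return None

# Edge instance: accumulate response times, then "upload" and reset periodically.
summary = LogHistogram()
for rt in [12, 15, 9, 200, 14, 11]:
    summary.add(rt)

uploaded = summary        # in practice: serialize and send to the warehouse
summary = LogHistogram()  # clear the local summary after each upload
print(round(uploaded.quantile(0.5)))  # 12 -- true median is 13, within bucket error
```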
Step 3 – Second‑level aggregation
In the warehouse, summaries from all instances of the same interface are merged on demand to compute the overall quantile for that interface. This enables on‑the‑fly quantile queries without pre‑computing specific percentiles.
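A minimal illustration of the second-level merge, assuming simplified bucket-count summaries with made-up numbers in place of real T-Digests:

```python
from collections import Counter

# Hypothetical per-instance summaries, as uploaded hourly: bucket -> count.
instance_1 = Counter({10: 500, 12: 300, 20: 5})
instance_2 = Counter({10: 400, 11: 250, 25: 2})

merged = instance_1 + instance_2  # summaries merge by simply adding counts

def quantile_bucket(counts, q):
    """Bucket containing the q-quantile of the merged distribution."""
    target = q * sum(counts.values())
    seen = 0
    for bucket in sorted(counts):
        seen += counts[bucket]
        if seen >= target:
            return bucket
    return None

print(quantile_bucket(merged, 0.99))  # 12
```

Because the merge happens at query time, any percentile can be asked for after the fact, with nothing pre-computed.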
Implementation details
Serialization uses little‑endian format for C++‑based warehouses (e.g., Apache Doris).
Compression (GZIP) reduces the size of the binary summary, achieving 30‑40% compression.
Base64 encoding converts binary data to text for easier transmission.
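The three steps above can be sketched end to end. The payload layout (an entry count followed by bucket/count pairs) is a hypothetical example, not the actual wire format; the point is the little-endian packing, GZIP compression, and Base64 encoding:

```python
import base64
import gzip
import struct

# Hypothetical summary payload: (bucket, count) pairs, packed little-endian
# so a C++-based warehouse such as Apache Doris can read it without byte swaps.
buckets = [(10, 500), (11, 250), (12, 300)]

raw = struct.pack("<I", len(buckets))         # little-endian uint32 entry count
for bucket, count in buckets:
    raw += struct.pack("<iq", bucket, count)  # int32 bucket, int64 count

compressed = gzip.compress(raw)                  # shrink the binary blob
encoded = base64.b64encode(compressed).decode()  # text-safe for transport/SQL

# Round-trip check, as the warehouse side would decode it.
decoded = gzip.decompress(base64.b64decode(encoded))
(n,) = struct.unpack_from("<I", decoded, 0)
print(n)  # 3
```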
The warehouse table stores columns such as app (application name), method (interface name), quantile_data (the serialized summary), and log_date (record date).
Technical advantages
The architecture is simple, high‑performance, cost‑effective, easy to maintain, and stable. It can handle hundred‑billion‑scale data with ~1 GB daily storage and quantile accuracy around 0.1%.
Baidu Geek Talk