
Quantile Computation in Baidu Advertising System: Architecture and Implementation

Baidu’s advertising platform computes high‑precision response‑time quantiles at massive scale. Each API call is intercepted and summarized locally with mergeable T‑Digest histograms; the summaries are periodically compressed, Base64‑encoded, and uploaded to a data warehouse, where they are merged on demand. The result is low‑latency, cost‑effective percentile analysis with roughly 0.1% error.

Baidu Geek Talk

Background

Baidu's advertising business runs on a distributed system that handles billions of interface calls and terabytes of monitoring data daily. Quantiles are a sensitive indicator of interface performance and are therefore valuable for performance analysis.

What is a quantile?

A quantile is the value below which a given percentage of observations in a data set fall. For example, if a computer boots faster than 80% of machines, its boot time sits at the 20th percentile of boot times.
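As a minimal sketch, the nearest‑rank rule makes this concrete (the boot times below are made up for illustration):

```python
def percentile(values, p):
    """Nearest-rank percentile: the value below which roughly p% of the data falls."""
    ordered = sorted(values)
    # 1-based rank of the p-th percentile under the nearest-rank method.
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

boot_times = [8, 9, 10, 11, 12, 14, 15, 20, 30, 60]  # seconds (synthetic data)
print(percentile(boot_times, 20))  # → 9
```

A machine booting in 9 seconds here beats 80% of the sample, i.e., it sits at the 20th percentile.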

Why quantiles?

In interface performance analysis, extreme requests concentrate above the 99th percentile. Averaging masks these outliers, leading to misleadingly low response times while a small fraction of users experience poor performance.
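A toy example (synthetic numbers) shows how the mean hides a slow tail:

```python
# 100 requests: 99 fast ones at 10 ms, one pathological one at 2000 ms.
latencies = [10] * 99 + [2000]

mean = sum(latencies) / len(latencies)
print(mean)            # 29.9 -- the average looks healthy
print(max(latencies))  # 2000 -- the tail that only a high quantile reveals
```

One user in a hundred waited two seconds, yet the average barely moved; a p99/p100 view exposes the problem immediately.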

Common calculation methods

1. Streaming computation: Real‑time data is sent to Spark/Flink clusters. Advantage: 100% accurate quantiles. Disadvantage: importing all data at hundred‑billion scale is resource‑intensive.

2. Offline computation: Data is loaded into a data warehouse and batch‑processed. Not suitable for APM scenarios that require low latency.

3. Load‑test‑based computation: Tools like JMeter collect samples during load tests and compute quantiles on the fly. High accuracy and low latency, but limited to a single application and test phase.

4. Ad‑hoc query computation: Monitoring tools (e.g., Prometheus) store samples and compute quantiles on demand. This saves compute resources but increases storage cost.

Divide‑and‑conquer architecture

The core challenge is merging quantiles: because quantiles cannot be simply averaged, the system must merge either raw data or mergeable summaries of it (histograms, T‑Digest, the GK algorithm, etc.).
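Why naive averaging fails is easy to demonstrate with hypothetical per‑node samples: averaging two nodes' medians does not yield the median of the combined data.

```python
a = [1, 2, 3, 100]    # node A's response times
b = [4, 5, 6, 7, 8]   # node B's response times

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

print(median(a + b))                 # → 5     (true median of all data)
print((median(a) + median(b)) / 2)   # → 4.25  (naive average of per-node medians)
```

The gap grows with skewed data, which is exactly the regime of response‑time tails; hence the need for mergeable summaries.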

Step 1 – Interface interception

Intercept each interface call (via manual instrumentation or Java bytecode enhancement) to capture response time.
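In a Java service this is typically done via bytecode enhancement; as a language‑neutral sketch, a decorator captures the same information (the `record` callback and `get_ads` interface are hypothetical):

```python
import functools
import time

def timed(record):
    """Intercept an interface call and report its response time in ms to `record`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Report even when the call raises, so errors still count.
                record(fn.__qualname__, (time.perf_counter() - start) * 1000.0)
        return wrapper
    return decorator

samples = []

@timed(lambda name, ms: samples.append((name, ms)))
def get_ads(user_id):
    return ["ad-1", "ad-2"]   # stand-in for a real interface
```

Calling `get_ads(42)` appends one `(interface_name, response_time_ms)` sample per invocation.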

Step 2 – Edge aggregation

Each instance maintains a data distribution (summary) for each interface. At fixed intervals (e.g., hourly) the summary is uploaded to a data warehouse and the local summary is cleared. The summary must be mergeable, memory‑efficient, and provide controllable accuracy (T‑Digest is used in practice).
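A sketch of this pattern, with a 1 ms bucket histogram standing in for T‑Digest (production uses T‑Digest for bounded memory and controllable accuracy; `upload` is a hypothetical warehouse sink):

```python
from collections import Counter, defaultdict

class Summary:
    """Mergeable data distribution; T-Digest plays this role in production."""
    def __init__(self):
        self.buckets = Counter()   # 1 ms bucket -> sample count
        self.count = 0

    def add(self, ms):
        self.buckets[int(ms)] += 1
        self.count += 1

    def quantile(self, q):
        """Smallest bucket whose cumulative count covers the q-th fraction."""
        target = q * self.count
        seen = 0
        for bucket in sorted(self.buckets):
            seen += self.buckets[bucket]
            if seen >= target:
                return bucket

summaries = defaultdict(Summary)   # one summary per interface on this instance

def flush(upload):
    """Hourly: hand each summary to `upload`, then clear local state."""
    for method, summary in summaries.items():
        upload(method, summary)
    summaries.clear()
```

Memory stays proportional to the number of distinct buckets per interface, not the number of calls, which is what makes edge aggregation viable at this scale.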

Step 3 – Second‑level aggregation

In the warehouse, summaries from all instances of the same interface are merged on demand to compute the overall quantile for that interface. This enables on‑the‑fly quantile queries without pre‑computing specific percentiles.
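Because the summaries are mergeable, combining instances is essentially addition; a sketch with bucket‑count summaries (the counts are invented, and T‑Digest centroid merging plays this role in practice):

```python
from collections import Counter

# Per-instance summaries for one interface, as pulled from the warehouse
# (1 ms bucket -> count; hypothetical numbers).
instance_a = Counter({10: 900, 50: 90, 200: 10})
instance_b = Counter({10: 800, 50: 150, 500: 50})

merged = instance_a + instance_b   # mergeable summaries just add up

def quantile(buckets, q):
    """Smallest bucket whose cumulative count covers the q-th fraction."""
    target = q * sum(buckets.values())
    seen = 0
    for bucket in sorted(buckets):
        seen += buckets[bucket]
        if seen >= target:
            return bucket

print(quantile(merged, 0.99))  # → 500, the overall p99 across both instances
```

Note that any percentile can be queried after the fact; nothing is pre‑computed per percentile.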

Implementation details

Serialization uses little‑endian format for C++‑based warehouses (e.g., Apache Doris).

Compression (GZIP) reduces the size of the binary summary, achieving 30‑40% compression.

Base64 encoding converts binary data to text for easier transmission.
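The three steps compose into a small pipeline; here is a sketch, with a list of doubles standing in for the actual T‑Digest binary layout:

```python
import base64
import gzip
import struct

def encode(values):
    """Serialize doubles little-endian, gzip-compress, then Base64 for transport."""
    raw = struct.pack(f"<{len(values)}d", *values)  # '<' = little-endian
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def decode(text):
    """Inverse pipeline: Base64-decode, decompress, unpack little-endian doubles."""
    raw = gzip.decompress(base64.b64decode(text))
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

The text output can be stored directly in a string column such as `quantile_data` and decoded in the warehouse when a merge is requested.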

The warehouse table stores columns such as app (application name), method (interface name), quantile_data (the serialized summary), and log_date (record date).

Technical advantages

The architecture is simple, high‑performance, cost‑effective, easy to maintain, and stable. It can handle hundred‑billion‑scale data with ~1 GB daily storage and quantile accuracy around 0.1%.

Tags: distributed systems, monitoring, big data, data aggregation, quantile, T-Digest