Big Data 11 min read

Real-Time Quantile Computation Using TDigest: Architecture and Solutions

The article presents a real‑time quantile solution using the TDigest data structure, which clusters data into centroids and stores digests in Redis or Doris, pre‑computes quantiles for all dimension combinations, and provides a reusable API that delivers fast, accurate, low‑memory quantile statistics for diverse business scenarios.

Baidu Geek Talk
Baidu Geek Talk
Baidu Geek Talk
Real-Time Quantile Computation Using TDigest: Architecture and Solutions

Performance analysis and other scenarios have a strong demand for real‑time quantile statistics. Unlike cumulative duration, quantiles cannot be directly aggregated across dimensions, which poses a significant challenge for real‑time computation.

The article describes three main technical challenges: (1) inability to sort the full data set in streaming contexts, (2) high computational latency of sorting or complex algorithms, and (3) the non‑aggregatable nature of quantile results.

To address these issues, a solution based on the TDigest data structure is presented. TDigest clusters data points into centroids, enabling fast approximate percentile calculation via linear interpolation. The compression parameter controls the trade‑off between accuracy and memory consumption.

Key components of the solution include:

1) Reading raw data from upstream sources. 2) Grouping data according to business dimensions and aggregating them into TDigest instances, which are stored in Redis or merged with existing digests. 3) Extracting quantile values from the TDigest structures and returning them to callers.

The architecture leverages high‑performance storage such as Redis and Doris, and the component is packaged as a reusable API for various business scenarios (kernel performance, search performance, PUSH, etc.).

To overcome the “non‑aggregatable” nature of quantiles, the system pre‑computes quantile values for every possible combination of query dimensions. By enumerating all 2^n dimension combinations (using binary representation where a bit set to 1 denotes the keyword “ALL”), the system stores ready‑made quantile results, allowing instant retrieval without additional aggregation.

Illustrative examples show how three dimensions (app_version, manufacturer, os_version) generate eight possible aggregation keys, and how a FlatMap operator expands a single record into multiple records for each combination before grouping and TDigest aggregation.

The overall workflow achieves high real‑time performance, high accuracy, and low space complexity, making it suitable for a wide range of real‑time quantile use cases.

Big Datastream processingRedisdata aggregationDorisreal-time quantileTDigest
Baidu Geek Talk
Written by

Baidu Geek Talk

Follow us to discover more Baidu tech insights.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.