Big Data · 15 min read

Optimizing OLAP Performance with ADM, Cube Pre‑aggregation and Sampling at Ant Group

This article explains how Ant Group tackles the performance challenges of large‑scale OLAP tables: ADM reduces the data volume queries must scan, Cube pre‑aggregation accelerates fixed reporting, and statistical sampling supports exploratory analysis. It details the processes, metrics, and architectural designs behind each approach.

DataFunSummit

In the era of big data, the explosive growth of data volumes makes rapid extraction of valuable information a critical challenge for database systems. Ant Group addresses this in enterprise OLAP scenarios by combining ADM (Application Data Model) with OLAP engines to improve performance.

The article first explains why ADM is needed to solve large‑table performance issues in OLAP, describing the typical data flow from OLTP to offline processing, dimensional modeling, and the creation of ADM layers that reduce query data size and complexity.

It then outlines the end‑to‑end reporting workflow, highlighting the high cost of ADM development (e.g., 1.5 days per report, with ADM consuming the majority of effort) and the need for more efficient solutions.

The first solution presented is Cube pre‑aggregation. By extracting dimensions and measures from report configurations, Cube automatically materializes aggregated data, shrinking query data from billions to millions of rows and accelerating query time from minutes to seconds. The article describes the Cube system’s two‑stage execution (build side and query side), core metrics (slow‑query coverage, hit rate, utilization), and ways to improve coverage and utilization through report‑driven Cube definitions and usage‑based pruning.
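The build/query split described above can be illustrated with a minimal sketch. The schema, column names, and `build_cube`/`query` helpers below are hypothetical, invented for illustration; the idea is only that the build side materializes SUM-style aggregates over the dimensions a report actually uses, so the query side scans the small cube instead of the raw fact table:

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, day, amount).
# Illustrative schema only, not Ant Group's actual tables.
facts = [
    ("HZ", "A", "2024-01-01", 10.0),
    ("HZ", "A", "2024-01-01", 5.0),
    ("HZ", "B", "2024-01-02", 7.0),
    ("SH", "A", "2024-01-01", 3.0),
]

def build_cube(rows, dim_idx, measure_idx):
    """Build side: materialize SUM(measure) grouped by the chosen dimensions."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in dim_idx)
        cube[key] += row[measure_idx]
    return dict(cube)

# The report configuration only references (city, product),
# so those dimensions define the cube; day is rolled up away.
cube = build_cube(facts, dim_idx=(0, 1), measure_idx=3)

def query(cube, city):
    """Query side: answer from the pre-aggregated cube, not the fact table."""
    return sum(v for (c, _p), v in cube.items() if c == city)

print(query(cube, "HZ"))  # 22.0
```

At production scale the same rollup is what shrinks billions of fact rows to millions of cube rows: the cube's size is bounded by the cross‑product of the report's dimension values, not by the number of raw events.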

Next, the article discusses sampling as a technique for exploratory analysis, where flexible metrics and dimensions allow approximate results. It explains the statistical foundations (Bernoulli distribution, central limit theorem), error sources such as COUNTD distortion, and how hash‑bucket sampling with weighted reconstruction can mitigate these errors.
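Why hash‑bucket sampling helps COUNTD can be seen in a small sketch. The function names and bucket counts below are assumptions for illustration: hashing keeps or drops *whole* entities (every row of a user lands in the same bucket), so the sampled distinct count is an unbiased fraction of the true one and can be reconstructed by the inverse sampling weight, whereas row‑level sampling would split a user's rows across the in/out boundary and distort the count:

```python
import hashlib
import random

def bucket(uid, n_buckets=100):
    # Stable hash: all rows of the same uid map to the same bucket.
    h = hashlib.md5(str(uid).encode()).hexdigest()
    return int(h, 16) % n_buckets

def sample_count_distinct(uids, keep_buckets=10, n_buckets=100):
    """Approximate COUNT(DISTINCT uid): keep whole hash buckets,
    then scale the sampled distinct count by the inverse ratio."""
    sampled = {u for u in uids if bucket(u, n_buckets) < keep_buckets}
    return len(sampled) * n_buckets / keep_buckets

# Synthetic workload: 500k events over ~100k possible user ids.
random.seed(0)
uids = [random.randrange(100_000) for _ in range(500_000)]

true_d = len(set(uids))
est = sample_count_distinct(uids)  # reads only ~10% of the ids
print(true_d, est)
```

On this synthetic data the reconstructed estimate lands within a few percent of the true distinct count while touching only a tenth of the buckets, which is the trade exploratory analysis is willing to make.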

The piece also covers variable‑rate sampling for different dimension cardinalities, ensuring that confidence‑level error bounds (e.g., 95% confidence with ≤3% error) are met by adjusting sampling ratios per group.
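The per‑group adjustment can be sketched from the standard CLT bound for a proportion: to hit absolute error ε at confidence z, a group needs roughly n ≥ z²·p(1−p)/ε² sampled rows (worst case p = 0.5). The group names and sizes below are invented for illustration; the point is that a huge group meets the bound at a tiny sampling rate, while a small‑cardinality group may need to be kept in full:

```python
import math

def required_sample_size(z=1.96, eps=0.03, p=0.5):
    """CLT sample size for estimating a proportion:
    n >= z^2 * p(1-p) / eps^2 (z=1.96 ~ 95% confidence)."""
    return math.ceil(z**2 * p * (1 - p) / eps**2)

def sampling_ratio_per_group(group_sizes, **kw):
    """Variable-rate sampling: each group gets the rate that
    yields at least the required rows, capped at keeping everything."""
    n = required_sample_size(**kw)
    return {g: min(1.0, n / size) for g, size in group_sizes.items()}

# Hypothetical dimension groups of very different cardinalities.
sizes = {"tier_head": 2_000_000, "tier_mid": 50_000, "tier_tail": 800}

print(required_sample_size())            # 1068
print(sampling_ratio_per_group(sizes))
```

Here the head group is sampled at roughly 0.05%, the mid group at about 2%, and the 800‑row tail group is not sampled at all, so every group stays inside the 95%/±3% bound.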

Finally, the technical architectures for both Cube and sampling are presented, showing how materialization discovery, build, and identification stages integrate with monitoring and governance components. The article concludes with an introduction to the DeepInsight BI team behind these innovations.

Tags: performance optimization · Big Data · Data Warehouse · OLAP · sampling · Cube · ADM
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
