
Practical Experience of StarRocks Materialized Views at Didi

This article describes how Didi's OLAP systems evolved, why the company adopted StarRocks for high-performance MPP analytics, and how materialized views, global dictionary mapping, and transparent acceleration were engineered to speed up real-time dashboard queries. It also outlines the performance gains, the challenges encountered, and plans for future optimization.

Big Data Technology & Architecture

Background: Didi's OLAP stack progressed from Druid, Kylin, and Presto to ClickHouse. As business complexity grew, it ran into performance, stability, usability, and maintenance challenges.

In 2022, Didi introduced StarRocks, a next-generation MPP database designed to serve all analytics scenarios. It features a vectorized pipeline execution engine, a cost-based optimizer, and intelligent materialized views, and it supports real-time updates and high-concurrency data analysis.

StarRocks offers a simple distributed architecture with two roles, frontend (FE) and backend (BE) nodes, combined with columnar storage and vectorized execution for strong query performance. It supports several table models (detail, aggregate, and primary-key) and data types such as HyperLogLog and BITMAP. It is also easy to manage and provides native lakehouse capabilities.

By May 2023 Didi operated over 30 StarRocks clusters, storing more than 300 TB of data and handling over 4 million daily queries across nearly all business lines.

The data pipeline combines offline ingestion through Spark Load with real-time ingestion through the Flink StarRocks connector. Because BITMAP columns hold integer values, a global dictionary maps high-cardinality strings to auto-increment UINT64 IDs, which allows efficient BITMAP aggregation for distinct-count calculations.
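The dictionary-plus-bitmap idea can be illustrated with a minimal Python sketch. This is not Didi's or StarRocks's implementation: the `GlobalDictionary` class, the helper functions, and the sample events are all hypothetical, and a plain Python integer stands in for a compressed bitmap. It shows why integer IDs make distinct counts mergeable: per-city bitmaps can be OR-ed into a per-day bitmap, whereas per-city distinct counts cannot simply be summed.

```python
from collections import defaultdict


class GlobalDictionary:
    """Toy global dictionary: each distinct string gets a stable integer ID."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1  # auto-increment counter; starting at 1 is an assumption

    def encode(self, value):
        # Assign a fresh ID only the first time a string is seen.
        if value not in self._ids:
            self._ids[value] = self._next_id
            self._next_id += 1
        return self._ids[value]


def bitmap_add(bitmap, value_id):
    """Set bit `value_id`; a Python int stands in for a compressed bitmap."""
    return bitmap | (1 << value_id)


def bitmap_count(bitmap):
    """Number of set bits, i.e. the number of distinct IDs in the bitmap."""
    return bin(bitmap).count("1")


# Hypothetical events: (day, city, user identifier string).
events = [
    ("2023-05-01", "beijing", "user_a"),
    ("2023-05-01", "beijing", "user_b"),
    ("2023-05-01", "beijing", "user_a"),   # repeat visit, same user
    ("2023-05-01", "shanghai", "user_a"),  # same user seen in another city
    ("2023-05-01", "shanghai", "user_c"),
    ("2023-05-02", "beijing", "user_b"),
]

dictionary = GlobalDictionary()

# Pre-aggregate: one bitmap of user IDs per (day, city).
per_city = defaultdict(int)
for day, city, user in events:
    key = (day, city)
    per_city[key] = bitmap_add(per_city[key], dictionary.encode(user))

# Roll up to day level by OR-ing the per-city bitmaps (bitmap union).
per_day = defaultdict(int)
for (day, _city), bitmap in per_city.items():
    per_day[day] |= bitmap

city_counts = {key: bitmap_count(b) for key, b in per_city.items()}
day_counts = {day: bitmap_count(b) for day, b in per_day.items()}

print(city_counts)
print(day_counts)
```

In this sample, each city on 2023-05-01 has two distinct users, yet the day-level union contains three, because `user_a` appears in both cities. Summing the per-city counts would overcount, while the bitmap union does not.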

The materialized view strategy uses synchronous MVs for low-latency aggregation and asynchronous MVs, refreshed periodically, for heavy queries. A transparent acceleration feature automatically rewrites queries to use the appropriate view without altering the SQL semantics.
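As a rough illustration of transparent acceleration, the following Python sketch routes a query to a pre-aggregated view when the view can answer it and falls back to the base rows otherwise. Everything here is an assumption made for illustration: the `MaterializedView` class, `run_query`, the subset-of-columns matching rule, and the sample data are hypothetical and far simpler than a real cost-based optimizer. The sketch only covers `SUM`, which can be re-aggregated from partial sums.

```python
from collections import defaultdict


def aggregate(rows, group_by, metric):
    """Compute SUM(metric) GROUP BY group_by over a list of dict rows."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        totals[key] += row[metric]
    return dict(totals)


class MaterializedView:
    """Toy pre-aggregated view holding SUM(metric) per group_by key."""

    def __init__(self, group_by, metric):
        self.group_by = tuple(group_by)
        self.metric = metric
        self.rows = []

    def refresh(self, base_rows):
        # Recompute the whole view; a periodic async refresh would call this.
        totals = aggregate(base_rows, self.group_by, self.metric)
        self.rows = [
            dict(zip(self.group_by, key), **{self.metric: total})
            for key, total in totals.items()
        ]

    def can_answer(self, group_by, metric):
        # Simplified rule: same metric, and the query groups by a subset
        # of the view's columns, so partial sums can be rolled up further.
        return metric == self.metric and set(group_by) <= set(self.group_by)


def run_query(base_rows, views, group_by, metric):
    """Answer from the first matching view, else scan the base rows."""
    for view in views:
        if view.can_answer(group_by, metric):
            return aggregate(view.rows, group_by, metric), "mv"
    return aggregate(base_rows, group_by, metric), "base"


# Hypothetical order data.
orders = [
    {"day": "d1", "city": "beijing", "product": "express", "amount": 10},
    {"day": "d1", "city": "beijing", "product": "premium", "amount": 5},
    {"day": "d1", "city": "shanghai", "product": "express", "amount": 7},
    {"day": "d2", "city": "beijing", "product": "express", "amount": 3},
]

mv = MaterializedView(("day", "city"), "amount")
mv.refresh(orders)

# The caller writes the same query either way; only the data source changes.
by_day, day_source = run_query(orders, [mv], ("day",), "amount")
by_product, product_source = run_query(orders, [mv], ("product",), "amount")

print(day_source, by_day)
print(product_source, by_product)
```

The query grouped by `day` is served from the view and returns the same totals as a scan of the base rows, while the query grouped by `product` falls back to the base rows because the view does not retain that column. If rows were appended to `orders` without calling `refresh` again, the view would return stale totals, which is the eventual-consistency trade-off of periodic refresh.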

Performance results show query latency reduced by about 80%, resource consumption cut by roughly 95%, and QPS capacity increased by orders of magnitude. The drawbacks are complex view maintenance, refresh overhead, and eventual-consistency limitations.

Future work focuses on speeding up BITMAP computation, reducing the optimizer overhead of selecting among asynchronous MVs, and automatically creating MVs from high-frequency query patterns to further improve performance and usability.


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
