Practical Experience of StarRocks Materialized Views at Didi
This article traces the evolution of Didi's OLAP systems and its adoption of StarRocks for high-performance MPP analytics. It explains how materialized views, global dictionary mapping, and transparent acceleration were engineered to speed up real-time dashboard queries, and it summarizes the performance gains, the remaining challenges, and the planned optimizations.
Background: Didi's OLAP development progressed from Druid, Kylin, and Presto to ClickHouse, encountering performance, stability, usability, and maintenance challenges as business complexity grew.
In 2022 Didi introduced StarRocks, a next-generation MPP database intended to cover all analytical scenarios. It features a vectorized pipeline execution engine, a cost-based optimizer, and intelligent materialized views, which together enable real-time updates and high-concurrency data analysis.
StarRocks has a simple distributed architecture with two roles, frontend (FE) and backend (BE). It combines columnar storage with vectorized execution for strong query performance, supports several table models (detail, aggregate, primary-key) and data types such as HyperLogLog and BITMAP, is easy to manage, and offers native lake-warehouse capabilities.
By May 2023 Didi operated over 30 StarRocks clusters, storing more than 300 TB of data and handling over 4 million daily queries across nearly all business lines.
The data pipeline combines offline loading through SparkLoad with real-time ingestion through the Flink StarRocks connector. A global dictionary maps high-cardinality strings to auto-increment UINT64 IDs, so that distinct-count calculations can be computed efficiently as BITMAP aggregations over integer IDs instead of over the raw strings.
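The dictionary-plus-bitmap idea can be illustrated with a minimal Python sketch. It is not Didi's or StarRocks's implementation: the `GlobalDictionary` class, the helper functions, and the sample events are all hypothetical, and a plain Python integer stands in for a compressed bitmap. It only shows why integer IDs make exact distinct counts mergeable across groups.

```python
from collections import defaultdict


class GlobalDictionary:
    """Hypothetical dictionary: maps each distinct string to a stable auto-increment ID."""

    def __init__(self):
        self._ids = {}

    def encode(self, value: str) -> int:
        # Assign the next ID the first time a value is seen; reuse it afterwards.
        if value not in self._ids:
            self._ids[value] = len(self._ids) + 1
        return self._ids[value]


def bitmap_add(bitmap: int, id_: int) -> int:
    """Set the bit for id_ (a Python int stands in for a real bitmap)."""
    return bitmap | (1 << id_)


def bitmap_union(a: int, b: int) -> int:
    """Union of two bitmaps is a bitwise OR."""
    return a | b


def bitmap_count(bitmap: int) -> int:
    """Number of set bits, i.e. the exact distinct count."""
    return bin(bitmap).count("1")


# Made-up sample events: (day, city, user identifier string).
events = [
    ("2023-05-01", "beijing", "user_a"),
    ("2023-05-01", "beijing", "user_b"),
    ("2023-05-01", "beijing", "user_a"),   # repeat within the same group
    ("2023-05-01", "shanghai", "user_a"),  # same user seen in another city
    ("2023-05-01", "shanghai", "user_c"),
]

dictionary = GlobalDictionary()
per_city = defaultdict(int)  # (day, city) -> bitmap of user IDs
for day, city, user in events:
    per_city[(day, city)] = bitmap_add(per_city[(day, city)], dictionary.encode(user))

# Pre-aggregated bitmaps can be merged at query time without double counting.
day_bitmap = 0
for (day, _city), bitmap in per_city.items():
    if day == "2023-05-01":
        day_bitmap = bitmap_union(day_bitmap, bitmap)

beijing_users = bitmap_count(per_city[("2023-05-01", "beijing")])
day_users = bitmap_count(day_bitmap)
print(beijing_users, day_users)
```

Summing the per-city counts would count `user_a` twice; unioning the bitmaps does not, which is the property that makes pre-aggregated distinct counts reusable at coarser granularity.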
The materialized view (MV) strategy uses synchronous MVs for low-latency aggregation and asynchronous MVs, refreshed periodically, for heavy queries. A transparent acceleration feature automatically rewrites each query to use an appropriate view, without changing the SQL semantics or requiring users to modify their queries.
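As a rough illustration of transparent acceleration, the Python sketch below routes an aggregation query to a pre-aggregated view when the view covers the query's grouping columns, and falls back to the base table otherwise. This is a toy model under stated assumptions, not the StarRocks optimizer: `MaterializedView`, `build_mv`, `pick_mv`, `run_query`, and the sample rows are hypothetical, and only `SUM` roll-ups are handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MaterializedView:
    """Hypothetical pre-aggregated view: SUM(measure) grouped by dims."""
    name: str
    dims: tuple
    measure: str
    rows: dict  # tuple of dim values -> pre-aggregated sum


def aggregate(base_rows, dims, measure):
    """Full scan of the base table: SUM(measure) GROUP BY dims."""
    agg = defaultdict(int)
    for row in base_rows:
        agg[tuple(row[d] for d in dims)] += row[measure]
    return dict(agg)


def build_mv(name, base_rows, dims, measure):
    return MaterializedView(name, tuple(dims), measure, aggregate(base_rows, dims, measure))


def pick_mv(mvs, dims, measure):
    """Choose the smallest view whose dims cover the query's dims."""
    candidates = [
        mv for mv in mvs
        if mv.measure == measure and set(dims) <= set(mv.dims)
    ]
    return min(candidates, key=lambda mv: len(mv.rows), default=None)


def run_query(base_rows, mvs, dims, measure):
    """Answer the query from a view when possible; the caller's query is unchanged."""
    mv = pick_mv(mvs, dims, measure)
    if mv is None:
        return aggregate(base_rows, dims, measure), "base_table"
    positions = [mv.dims.index(d) for d in dims]
    agg = defaultdict(int)
    for key, value in mv.rows.items():
        # Roll the view's finer-grained sums up to the query's grouping.
        agg[tuple(key[i] for i in positions)] += value
    return dict(agg), mv.name


# Made-up base table rows.
orders = [
    {"day": "2023-05-01", "city": "beijing", "driver": "d1", "amount": 10},
    {"day": "2023-05-01", "city": "beijing", "driver": "d2", "amount": 20},
    {"day": "2023-05-01", "city": "shanghai", "driver": "d3", "amount": 5},
    {"day": "2023-05-02", "city": "beijing", "driver": "d1", "amount": 7},
]
mvs = [build_mv("mv_day_city", orders, ["day", "city"], "amount")]

by_day, source_day = run_query(orders, mvs, ["day"], "amount")
by_driver, source_driver = run_query(orders, mvs, ["driver"], "amount")
print(source_day, by_day)
print(source_driver, by_driver)
```

The query grouped by `day` is served from `mv_day_city` and returns the same result as a base-table scan, while the query grouped by `driver` cannot be derived from the view and falls back to the base table. A real optimizer must also handle filters, joins, non-additive aggregates, and view freshness, all of which this sketch ignores.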
The reported results are a reduction in query latency of about 80 %, a cut in resource consumption of roughly 95 %, and an increase in QPS capacity of orders of magnitude. The approach also has drawbacks: view maintenance is complex, refreshes add overhead, and asynchronous views provide only eventual consistency.
Future work focuses on speeding up BITMAP computation, reducing the optimizer overhead of selecting among asynchronous MVs, and automatically creating MVs from high-frequency query patterns to further improve performance and usability.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
