How Didi Scaled Real‑Time Dashboards with StarRocks Materialized Views
This article details Didi's evolution from a multi‑engine OLAP stack to a unified StarRocks solution, explains the design of global dictionaries and materialized views for real‑time dashboard acceleration, and shares performance results, challenges, and future optimization directions.
Background and Motivation
Didi's OLAP platform originally combined Druid, ClickHouse, Kylin and Presto to serve monitoring, log analysis, offline acceleration and real‑time data‑warehouse scenarios. As business volume grew, the heterogeneous stack suffered from maintenance complexity, inconsistent performance, limited scalability and high operational cost.
Even after ClickHouse was introduced in 2020, bringing columnar storage and vectorized execution, the platform still faced several problems: maintaining multiple different engines side by side was difficult, mutable operations were not supported, and stability under high‑concurrency workloads was insufficient.
Why StarRocks?
Around 2022 Didi evaluated StarRocks, a next‑generation MPP database with a vectorized pipeline execution engine, a cost‑based optimizer, and built‑in materialized‑view support. StarRocks provides a simple two‑role architecture (FE/BE), automatic data balancing, columnar storage, and strong support for high‑cardinality distinct counting via BITMAP and HyperLogLog.
Key advantages include:
Simplified distributed architecture without external dependencies.
Superior query performance for aggregation and join workloads.
Rich data‑type support (e.g., BITMAP, HyperLogLog) for deduplication.
Native lake‑house capabilities allowing seamless federation with Hive, Iceberg, Hudi.
Deployment Scale
By May 2023 Didi operated more than 30 StarRocks clusters, storing over 300 TB of data and serving more than 4 million daily queries across 1,500+ tables for virtually all business lines (ride‑hailing, bike‑sharing, logistics, etc.).
Real‑Time Dashboard Acceleration with Materialized Views
The core use case is Didi's ride‑hailing real‑time dashboard, which tracks dozens of KPIs (calls, bubbles, GMV, etc.) with sub‑second latency. The legacy Druid‑based dashboard produced only approximate distinct counts, struggled under high concurrency, and performed unstably during traffic spikes.
StarRocks materialized views were introduced to pre‑aggregate metrics at various time granularities (1 min, 5 min, 30 min) and to replace expensive distinct‑count operations with BITMAP aggregation.
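The idea of pre‑aggregating at several time granularities can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Didi's pipeline: a Python set stands in for a BITMAP column, set union stands in for a bitmap union, and the event data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: (epoch_seconds, city, user_id). IDs are already integers.
EVENTS = [
    (0, "beijing", 1),
    (30, "beijing", 1),
    (70, "beijing", 2),
    (80, "beijing", 1),
    (130, "shanghai", 1),
    (310, "beijing", 3),
]


def bucket_start(ts, granularity_s):
    """Floor a timestamp to the start of its time bucket."""
    return ts - ts % granularity_s


def pre_aggregate(events, granularity_s):
    """Group events by (bucket, city), keeping the set of distinct users per group.

    The set plays the role of a BITMAP column in this sketch.
    """
    groups = defaultdict(set)
    for ts, city, user_id in events:
        groups[(bucket_start(ts, granularity_s), city)].add(user_id)
    return dict(groups)


def roll_up(groups, granularity_s):
    """Merge finer buckets into coarser ones by set union."""
    merged = defaultdict(set)
    for (start, city), users in groups.items():
        merged[(bucket_start(start, granularity_s), city)] |= users
    return dict(merged)


one_min = pre_aggregate(EVENTS, 60)
five_min = roll_up(one_min, 300)
```

Rolling the 1‑minute groups up to 5 minutes gives the same result as aggregating the raw events at 5 minutes directly, which is why the coarser view can be built from the finer one. Summing the per‑minute distinct counts would not work: user 1 appears in two different minutes for the same city and would be counted twice.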
Global Dictionary for High‑Cardinality Columns
To accelerate count‑distinct, Didi built a global dictionary that maps raw string keys to auto‑incrementing UINT64 IDs using StarRocks' partial‑update feature. The workflow:
Store the dictionary in a primary‑key table; the primary key is the original string, the auto‑increment column provides a compact ID.
Expose a dict_mapping(dict_table, pk) function that looks up the ID at query time.
Modify the Flink‑StarRocks connector to write dictionary entries first, then write fact rows with the mapped ID, ensuring idempotent ingestion.
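The workflow above can be modeled in memory as follows. This is a minimal sketch, not the connector's actual code: the class, the `ingest` helper, and the field names `order_id`, `city`, and `user_key` are hypothetical, and a Python dict stands in for each primary‑key table.

```python
class GlobalDictionary:
    """In-memory stand-in for a primary-key dictionary table with an auto-increment ID."""

    def __init__(self):
        self._ids = {}      # original string key -> compact integer ID
        self._next_id = 1

    def upsert(self, key):
        """Insert the key if absent; an existing key keeps its ID, so replays are safe."""
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]

    def lookup(self, key):
        """Plays the role of the dict_mapping lookup: return the ID of a known key."""
        return self._ids[key]


def ingest(batch, dictionary, fact_table):
    """Write dictionary entries first, then fact rows carrying the mapped ID.

    fact_table is keyed by order_id (a hypothetical primary key), so writing
    the same batch again overwrites rows instead of duplicating them.
    """
    for row in batch:
        dictionary.upsert(row["user_key"])
    for row in batch:
        fact_table[row["order_id"]] = {
            "city": row["city"],
            "user_id": dictionary.lookup(row["user_key"]),
        }


dictionary = GlobalDictionary()
facts = {}
batch = [
    {"order_id": "o1", "city": "beijing", "user_key": "user-abc"},
    {"order_id": "o2", "city": "beijing", "user_key": "user-xyz"},
    {"order_id": "o3", "city": "shanghai", "user_key": "user-abc"},
]
ingest(batch, dictionary, facts)
ingest(batch, dictionary, facts)  # replaying the batch leaves the state unchanged
```

Writing the dictionary before the facts means every fact row can resolve its ID, and because an existing key never receives a new ID, a retried batch produces the same rows as the first attempt.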
Design of Synchronous and Asynchronous Views
Data flows from the warehouse to Flink, then to StarRocks where:
ODS layer stores raw fact tables.
DWD layer creates synchronous materialized views that roll up high‑cardinality columns using BITMAP.
ADS layer hosts asynchronous views refreshed on a schedule, so that queries are served with cache‑like performance.
Because count‑distinct is not additive (distinct counts for finer groupings cannot simply be summed to obtain the count for a coarser grouping), covering every combination of n dimensions would require 2ⁿ views. To avoid this exponential explosion, Didi identified “additive” dimensions (e.g., city) and materialized views only for combinations of the non‑additive dimensions, reducing the view count to 2ⁿ⁻ᵐ, where m is the number of additive dimensions.
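The counting argument can be checked with a short Python sketch. It rests on one interpretation that the article does not spell out: a dimension is treated as additive when each ID falls under exactly one of its values, so per‑value distinct counts can be summed. The dimension names and helper functions are hypothetical.

```python
from itertools import combinations


def dimension_subsets(dimensions):
    """Enumerate every subset of the dimensions, including the empty one."""
    dims = list(dimensions)
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]


def views_needed(dimensions, additive):
    """Enumerate subsets of the non-additive dimensions only.

    Assumption: additive dimensions stay as plain columns in every view and
    are summed at query time, so they do not multiply the number of views.
    """
    non_additive = [d for d in dimensions if d not in additive]
    return dimension_subsets(non_additive)


def is_additive(rows, dim, id_field="order_id"):
    """True when every ID occurs under exactly one value of dim."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[id_field], row[dim]) != row[dim]:
            return False
    return True


DIMS = ["city", "business_line", "channel", "product_type"]  # hypothetical
all_views = dimension_subsets(DIMS)              # 2**4 combinations
reduced_views = views_needed(DIMS, {"city"})     # 2**(4 - 1) combinations
```

With n = 4 dimensions and m = 1 additive dimension, the number of combinations drops from 16 to 8, matching 2ⁿ⁻ᵐ.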
Transparent Query Rewrite
StarRocks can automatically rewrite incoming SQL to the most appropriate materialized view without user intervention, preserving query semantics while delivering orders‑of‑magnitude speedups.
Example scenarios:
Case 1: No WHERE clause → hits the 1‑minute view.
Case 2: Filter on city → hits the 5‑minute view.
Case 3: Filter on business line → hits the extended view with additional additive dimension.
Case 4: Multi‑business‑line filter → falls back to synchronous view or raw table.
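The selection behavior in these cases can be pictured with a toy matcher. This is not StarRocks's optimizer logic, only a sketch of the general idea under simple assumptions: each view is described by the columns it retains, a query by the columns it needs, and the smallest covering view wins. All table, view, and column names are hypothetical.

```python
BASE_TABLE = "fact_orders"  # hypothetical base table name

# Hypothetical catalog: view name -> columns the view retains.
VIEWS = {
    "mv_time": {"minute"},
    "mv_time_city": {"minute", "city"},
    "mv_time_city_line": {"minute", "city", "business_line"},
}


def choose_source(needed_columns, views=VIEWS, fallback=BASE_TABLE):
    """Pick the view with the fewest columns that covers the query, else the base table."""
    needed = set(needed_columns)
    candidates = [
        (len(cols), name) for name, cols in views.items() if needed <= cols
    ]
    return min(candidates)[1] if candidates else fallback


no_filter = choose_source({"minute"})
city_filter = choose_source({"minute", "city"})
uncovered = choose_source({"minute", "driver_id"})
```

A query that only needs the time column is served by the smallest view, a city filter moves it to a view that keeps `city`, and a query touching a column no view retains falls back to the base table, mirroring the fallback in Case 4.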
Results and Impact
Single‑query latency reduced by ~80%.
Resource consumption cut by ~95%.
Cluster QPS capacity increased by up to 100×.
Limitations and Future Work
Current drawbacks include complex view management, unnecessary refresh cycles for asynchronous views, and eventual consistency gaps between async views and base tables.
Planned improvements:
Optimize BITMAP bucket strategy and use fastunion to speed up bitmap merges.
Reduce optimizer overhead for view rewrite (currently ~500 ms of a 1 s query).
Automate materialized‑view creation by mining frequent query patterns, lowering manual effort.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.