How StarRocks Unified Data Warehouse Simplified Our Multi-Source Advertising Platform
This article explains how the Tianji advertising platform consolidated heterogeneous MySQL, ClickHouse, and TiDB data sources into a single StarRocks data warehouse, addressing data silos, real‑time performance, and query complexity while improving accuracy, latency, and development efficiency.
Background
The Tianji media buying platform serves UG buying operations and requires rich, accurate in‑app and out‑of‑app data for decision‑making. Data originates from UG data warehouse, media MAPI, and exists in various forms: offline and real‑time, stored in MySQL, ClickHouse (offline), and TiDB (real‑time). The heterogeneity caused challenges for unified analysis, prompting a move to a single StarRocks warehouse.
Pain Points of the Legacy Architecture
Data silos and cross‑source analysis difficulty: Business data in MySQL, metric data in ClickHouse and TiDB, requiring costly in‑memory aggregation for joint queries.
Real‑time vs. query performance trade‑off: TiDB+TiFlash offered real‑time updates but slower queries; ClickHouse offered fast queries but lacked real‑time freshness.
Complex architecture increases development cost: Multiple data sources and syntax differences made integration and maintenance burdensome.
Why StarRocks?
StarRocks provides atomic updates, multi‑table JOIN, massive storage, and fast response, meeting the platform’s need for both accuracy and timeliness better than TiDB.
Unified Data Warehouse Design
Key goals: simplify storage, eliminate redundant data types, and resolve cross‑source query and latency issues.
Business data sync: Real‑time BabelX integration streams MySQL data directly into StarRocks, enabling compute‑storage integration.
Real‑time + offline OLAP data: External real‑time data is written to StarRocks via RCP; offline data from the data lake is synced after batch processing.
Data query applications: All queries now read from StarRocks, unifying syntax and reducing complexity for dashboards and ChatBI.
Real‑time Reporting
Real‑time reports combine media‑side click‑cost data with in‑app revenue to calculate cost ratios. Initial TiDB+TiFlash solutions suffered from high latency (4‑5 s) and complex engine selection. After migrating to StarRocks, query latency dropped to 1‑2 s with unified view support.
Migration Implementation
Migration ensured seamless user experience through data consistency checks, dual‑write, dual‑read (gray release), and final cut‑over phases.
Data Consistency
Real‑time sync via BabelX achieves sub‑second latency; consistency is verified through full, incremental, and scheduled checks. Offline data is validated nightly with scripts.
Migration Process
Dual‑write: Offline data written to both ClickHouse and StarRocks via BabelX and Pilot tools.
Dual‑read (gray release): Parallel reads from both systems with metric comparison to detect discrepancies.
Cut‑over: After validation, all read traffic is switched to StarRocks.
Syntax Differences
Aggregated field handling differs: ClickHouse disallows multiple aggregations on the same field name, while StarRocks requires aggregation for non‑grouped fields.
ORDER BY/LIMIT after UNION ALL applies to different scopes in ClickHouse vs. StarRocks.
StarRocks does not allow temporary calculated fields directly in WHERE clauses; a sub‑query is needed.
Future Plans
Continue performance optimizations with partitioning and materialized views, establish a unified metric management system, and expand AI‑driven analytics for advertisers.
Acknowledgements
Special thanks to the big data team for support on cluster stability and compatibility.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
