Databases 15 min read

How StarRocks Unified Data Warehouse Simplified Our Multi-Source Advertising Platform

This article explains how the Tianji advertising platform consolidated heterogeneous MySQL, ClickHouse, and TiDB data sources into a single StarRocks data warehouse, addressing data silos, real‑time performance, and query complexity while improving accuracy, latency, and development efficiency.

iQIYI Technical Product Team
iQIYI Technical Product Team
iQIYI Technical Product Team
How StarRocks Unified Data Warehouse Simplified Our Multi-Source Advertising Platform

Background

The Tianji media buying platform serves UG buying operations and requires rich, accurate in‑app and out‑of‑app data for decision‑making. Data originates from UG data warehouse, media MAPI, and exists in various forms: offline and real‑time, stored in MySQL, ClickHouse (offline), and TiDB (real‑time). The heterogeneity caused challenges for unified analysis, prompting a move to a single StarRocks warehouse.

Pain Points of the Legacy Architecture

Data silos and cross‑source analysis difficulty: Business data in MySQL, metric data in ClickHouse and TiDB, requiring costly in‑memory aggregation for joint queries.

Real‑time vs. query performance trade‑off: TiDB+TiFlash offered real‑time updates but slower queries; ClickHouse offered fast queries but lacked real‑time freshness.

Complex architecture increases development cost: Multiple data sources and syntax differences made integration and maintenance burdensome.

Why StarRocks?

StarRocks provides atomic updates, multi‑table JOIN, massive storage, and fast response, meeting the platform’s need for both accuracy and timeliness better than TiDB.

Unified Data Warehouse Design

Key goals: simplify storage, eliminate redundant data types, and resolve cross‑source query and latency issues.

Business data sync: Real‑time BabelX integration streams MySQL data directly into StarRocks, enabling compute‑storage integration.

Real‑time + offline OLAP data: External real‑time data is written to StarRocks via RCP; offline data from the data lake is synced after batch processing.

Data query applications: All queries now read from StarRocks, unifying syntax and reducing complexity for dashboards and ChatBI.

Real‑time Reporting

Real‑time reports combine media‑side click‑cost data with in‑app revenue to calculate cost ratios. Initial TiDB+TiFlash solutions suffered from high latency (4‑5 s) and complex engine selection. After migrating to StarRocks, query latency dropped to 1‑2 s with unified view support.

Migration Implementation

Migration ensured seamless user experience through data consistency checks, dual‑write, dual‑read (gray release), and final cut‑over phases.

Data Consistency

Real‑time sync via BabelX achieves sub‑second latency; consistency is verified through full, incremental, and scheduled checks. Offline data is validated nightly with scripts.

Migration Process

Dual‑write: Offline data written to both ClickHouse and StarRocks via BabelX and Pilot tools.

Dual‑read (gray release): Parallel reads from both systems with metric comparison to detect discrepancies.

Cut‑over: After validation, all read traffic is switched to StarRocks.

Syntax Differences

Aggregated field handling differs: ClickHouse disallows multiple aggregations on the same field name, while StarRocks requires aggregation for non‑grouped fields.

ORDER BY/LIMIT after UNION ALL applies to different scopes in ClickHouse vs. StarRocks.

StarRocks does not allow temporary calculated fields directly in WHERE clauses; a sub‑query is needed.

Future Plans

Continue performance optimizations with partitioning and materialized views, establish a unified metric management system, and expand AI‑driven analytics for advertisers.

Acknowledgements

Special thanks to the big data team for support on cluster stability and compatibility.

Architecture comparison
Architecture comparison
Unified data flow
Unified data flow
Performance comparison
Performance comparison
SR workflow
SR workflow
Real-time analyticsStarRocksData WarehouseOLAPDatabase Migrationadvertising platform
iQIYI Technical Product Team
Written by

iQIYI Technical Product Team

The technical product team of iQIYI

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.