Real‑Time Lakehouse Architecture at Ximalaya Live: Leveraging Flink, Paimon, and StarRocks
This article details Ximalaya Live's transition from an offline‑centric data warehouse to a real‑time lakehouse using Flink, Paimon, and StarRocks, covering business background, architectural challenges, technology evaluation, implementation steps, encountered issues, performance gains, and future expansion plans.
Ximalaya Live operates audio live, video live, and multi‑person entertainment halls, generating massive streaming data that requires timely monitoring, ranking, traffic analysis, and profit‑loss alerts.
The original data warehouse followed a traditional ODS‑DWD‑DWS‑ADM layered model, relying on batch Spark jobs (T+1) and Hive, which could not meet the growing demand for minute‑level insights.
To achieve real‑time capabilities, the team evaluated several data lake table formats (Delta Lake, Hudi, Paimon) and OLAP engines (ClickHouse, StarRocks). In the team's evaluation, Delta Lake performed poorly with Flink CDC and Hudi introduced high operational complexity, while Paimon offered better performance, lower development cost, and active community support.
After extensive comparison, the team chose Flink + Paimon + StarRocks as the final stack. Flink CDC streams data from MySQL and log sources into Paimon, which acts as the ODS layer. Table‑level joins are performed over the Paimon tables and the results are written to StarRocks, which provides materialized views and fast OLAP queries for downstream applications.
In the offline path, Paimon also serves as the ODS layer, using primary‑key updates to eliminate long‑running batch delays and improve data freshness.
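To make the primary‑key update idea concrete, here is a minimal sketch in plain Python (not Paimon's API). It assumes a changelog of insert, update, and delete operations and keeps only the latest row per key, which is the behaviour a primary‑key table with last‑write‑wins merging gives you. The function name, the `order_id` key, and the sample rows are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]


def upsert_by_primary_key(
    table: Dict[Any, Row],
    changes: Iterable[Tuple[str, Row]],
    key: str = "order_id",  # hypothetical primary-key column
) -> Dict[Any, Row]:
    """Apply a changelog to a keyed table, keeping the latest row per key."""
    for op, row in changes:
        if op == "delete":
            # A delete removes the row for that key, if it exists.
            table.pop(row[key], None)
        else:
            # "insert" and "update" both overwrite whatever is stored for the key.
            table[row[key]] = row
    return table


# Illustrative changelog, as a CDC source might emit it (made-up data).
changelog = [
    ("insert", {"order_id": 1, "amount": 10}),
    ("insert", {"order_id": 2, "amount": 5}),
    ("update", {"order_id": 1, "amount": 12}),
    ("delete", {"order_id": 2, "amount": 5}),
]

ods = upsert_by_primary_key({}, changelog)
```

Because each change is merged as it arrives, a downstream reader sees the current state of every key without waiting for a full batch reload.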
During the two‑month rollout, several issues surfaced: slow stream‑join performance, missing data due to non‑persistent dimension tables, and occasional data loss in union operations. These were resolved by limiting incremental reads, using Lookup tables for dimensions, and applying community‑provided bug fixes.
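The dimension‑table fix can be illustrated with a small, hedged sketch in plain Python rather than Flink SQL. It assumes the dimension data lives in a persisted, keyed table that is consulted once per incoming event, so an event still gets enriched even if the dimension row was written long before the event arrived. The names `lookup_join`, `anchor_id`, and `anchor_name` are hypothetical, and the data is made up.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def lookup_join(
    events: Iterable[Row],
    dim_table: Dict[Any, Row],
    key: str = "anchor_id",  # hypothetical join key
) -> List[Row]:
    """Enrich each event with a point lookup against a persisted dimension table."""
    enriched = []
    for event in events:
        dim = dim_table.get(event[key])  # lookup happens at processing time
        # Left-join semantics: keep the event even when no dimension row exists.
        enriched.append({**event, "anchor_name": dim["name"] if dim else None})
    return enriched


# Persisted dimension table (illustrative).
anchors = {101: {"name": "Alice"}}

gift_events = [
    {"anchor_id": 101, "gift": 5},
    {"anchor_id": 999, "gift": 1},  # no matching dimension row
]

joined = lookup_join(gift_events, anchors)
```

The design point is that the dimension side is read from durable storage on demand instead of being held in join state that may not persist, so no fact rows are silently dropped for lack of a match in state.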
The real‑time lakehouse delivered four major benefits: minute‑level revenue monitoring, instant ranking generation, real‑time traffic (DAU/eDAU) tracking, and rapid profit‑loss alerts, all of which enhanced operational efficiency and decision quality.
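As a rough illustration of minute‑level revenue monitoring and ranking, the sketch below groups events into one‑minute tumbling windows and picks the top rooms per window. It is plain Python under stated assumptions, not the team's Flink or StarRocks logic: events are `(timestamp_seconds, room_id, amount)` tuples, amounts are integers, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, int]  # (timestamp in seconds, room id, amount)


def revenue_per_minute(events: Iterable[Event]) -> Dict[Tuple[int, str], int]:
    """Sum revenue per (one-minute window start, room)."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, room_id, amount in events:
        window_start = ts - ts % 60  # align to the start of the minute
        totals[(window_start, room_id)] += amount
    return dict(totals)


def top_rooms(
    totals: Dict[Tuple[int, str], int], window_start: int, n: int
) -> List[Tuple[str, int]]:
    """Return the n highest-revenue rooms for one window, ties broken by room id."""
    rows = [(room, amt) for (w, room), amt in totals.items() if w == window_start]
    return sorted(rows, key=lambda r: (-r[1], r[0]))[:n]


# Made-up events spanning two one-minute windows.
sample = [(0, "a", 10), (30, "b", 25), (59, "a", 5), (60, "a", 7), (61, "c", 3)]

totals = revenue_per_minute(sample)
first_minute_top = top_rooms(totals, 0, 1)
second_minute_top = top_rooms(totals, 60, 2)
```

In a production setup this kind of aggregation would typically run continuously in the stream engine or be served from a materialized view, with each closed window feeding dashboards and alerts.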
Looking ahead, the team plans to extend the architecture to the advertising and order‑management domains, continue collaborating closely with the Paimon and StarRocks communities, and support upcoming AI projects that demand high‑frequency data.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
