How StarRocks + Paimon Powered Real‑Time Analytics for Alibaba’s Taobao Flash Sale
During Taobao's Flash Sale, the Ele.me data team faced minute‑level decision demands and billions of marketing events. The team built a real‑time lakehouse with StarRocks and Paimon, using asynchronous materialized views, RoaringBitmap de‑duplication, and resource isolation to achieve sub‑second query latency, lower storage costs, and stable performance under high concurrency.
Introduction
When the "first cup of milk tea in autumn" trended, the Taobao Flash Sale project launched on April 30, generating massive traffic and requiring minute‑level data decisions. To get past the latency bottleneck of offline pipelines, the Ele.me data team built a real‑time lakehouse on StarRocks and Paimon.
Background
Traditional T+1 offline pipelines, which deliver data the day after it is generated, could not provide the minute‑level freshness needed for billions of marketing events. Conventional real‑time pipelines, in turn, carried high development cost and resource consumption, so the team needed a solution that was both fresh and scalable.
Technical Solution
Real‑time Lakehouse Architecture
Data flows from Flink into Paimon lake tables, while dimension tables are loaded via Spark. StarRocks queries the Paimon tables directly, so no additional ETL into StarRocks is needed and dashboards see new data as soon as it lands in the lake.
Materialized View Optimizations
Asynchronous materialized views pre‑compute high‑frequency queries, reducing query time from minutes to seconds.
RoaringBitmap de‑duplication supports multi‑dimensional real‑time metrics with low storage overhead.
Large‑query management combines cluster monitoring, SQL tuning, and resource isolation.
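To make the first item concrete, here is a minimal sketch in plain Python of what an asynchronous materialized view buys: a scheduled refresh pre‑computes an aggregate, and queries read that result instead of scanning raw events. It is a toy model under stated assumptions, not StarRocks code; the event layout, campaign names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events: (minute, campaign, user_id). Illustrative data only.
EVENTS = [
    ("10:00", "milk_tea", 1),
    ("10:00", "milk_tea", 2),
    ("10:01", "milk_tea", 1),
    ("10:01", "coffee", 3),
]


def query_raw(events, campaign):
    """Answer the query by scanning every raw event (the slow path)."""
    return sum(1 for _, c, _ in events if c == campaign)


def refresh_view(events):
    """Pre-aggregate event counts per campaign, as a scheduled refresh would."""
    view = defaultdict(int)
    for _, campaign, _ in events:
        view[campaign] += 1
    return dict(view)


def query_view(view, campaign):
    """Answer the same query from the pre-computed result (the fast path)."""
    return view.get(campaign, 0)


# One refresh, then many cheap lookups against the stored result.
view = refresh_view(EVENTS)
print(query_view(view, "milk_tea"))
```

The trade‑off is freshness: events that arrive after a refresh are invisible to the view until the next refresh runs, which is why the refresh interval matters as much as the view definition.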
Practical Experience
Three‑stage optimization:
Stage 1: a basic materialized view on Paimon tables, refreshed every ten minutes.
Stage 2: a union of real‑time and historical data, maintained through nightly DataWorks jobs.
Stage 3: partitioned materialized views with a TTL and auto‑refresh limits, so that long‑term data stays manageable.
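The last two stages can be sketched in plain Python, assuming daily partitions. This is an illustration of the idea rather than the team's implementation: the function names, the `ttl_number` and `refresh_limit` parameters, the dates, and the metric values are all hypothetical.

```python
from datetime import date, timedelta


def latest(partitions, n):
    """Return the n most recent partitions in ascending order."""
    if n <= 0:
        return []
    return sorted(partitions)[-n:]


def partitions_to_keep(partitions, ttl_number):
    """TTL: only the newest `ttl_number` partitions survive; older ones expire."""
    return latest(partitions, ttl_number)


def partitions_to_refresh(partitions, refresh_limit):
    """Auto-refresh limit: each refresh touches only the newest partitions."""
    return latest(partitions, refresh_limit)


def union_history_and_realtime(history, realtime, today):
    """Serve past days from the nightly batch result and today from real time."""
    merged = {day: value for day, value in history.items() if day < today}
    merged[today] = realtime[today]
    return merged


today = date(2024, 1, 5)  # arbitrary example date
partitions = [today - timedelta(days=i) for i in range(5)]

kept = partitions_to_keep(partitions, ttl_number=3)
refreshed = partitions_to_refresh(partitions, refresh_limit=1)

history = {date(2024, 1, 3): 100, date(2024, 1, 4): 120}
realtime = {today: 75}
merged = union_history_and_realtime(history, realtime, today)
```

Bounding both the retained partitions and the refreshed partitions keeps each refresh cost roughly constant, no matter how much history has accumulated.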
RoaringBitmap Applications
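Bitmap‑based de‑duplication replaces COUNT(DISTINCT) over raw rows: each pre‑aggregated cell stores a bitmap of user IDs, which drastically reduces storage and speeds up unique‑visitor (UV) calculations. Because bitmaps can be merged without double counting, the same cells serve several use cases, including multi‑dimensional pre‑aggregation, time‑based roll‑up, audience drill‑down, churn analysis, and funnel metrics.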
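The sketch below shows why bitmaps roll up where plain counts cannot. It uses a Python integer as an uncompressed bitset, which is a stand‑in for a RoaringBitmap and not the compressed structure itself; the cells, dimension names, and user IDs are hypothetical.

```python
def to_bitmap(user_ids):
    """Encode non-negative integer user IDs as one int, one bit per ID."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap):
    """Count set bits, i.e. the number of distinct users."""
    return bin(bitmap).count("1")


# Hypothetical pre-aggregated cells keyed by (day, city).
cells = {
    ("day1", "hangzhou"): to_bitmap([1, 2, 3, 3]),
    ("day1", "shanghai"): to_bitmap([3, 4]),
    ("day2", "hangzhou"): to_bitmap([2, 5]),
}


def rollup(cells, day=None, city=None):
    """OR together every cell that matches the given filters."""
    result = 0
    for (d, c), bitmap in cells.items():
        if (day is None or d == day) and (city is None or c == city):
            result |= bitmap
    return result


day1 = rollup(cells, day="day1")
day2 = rollup(cells, day="day2")

day1_uv = cardinality(day1)           # roll-up across cities
total_uv = cardinality(rollup(cells))  # roll-up across days and cities
retained_uv = cardinality(day1 & day2)  # seen on both days
churned_uv = cardinality(day1 & ~day2)  # seen on day1 but not on day2
naive_sum = sum(cardinality(b) for b in cells.values())
```

Here `naive_sum` over‑counts users who appear in more than one cell, while the OR‑based roll‑up does not. The intersection and difference operations are the building blocks for the retention, churn, and funnel metrics mentioned above.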
Operations & Governance
Operations and governance cover four areas: monitoring via Sunfire or Grafana, audit logging with AuditLoader, resource groups that isolate queries from materialized‑view refreshes, and best‑practice guidelines for view lifecycle, partitioning, and refresh frequency.
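As a rough illustration of the isolation idea, the toy model below splits CPU between two workload groups in proportion to a weight and caps concurrency per group. It does not reproduce StarRocks' scheduler or its resource‑group settings; the group names, weights, and limits are hypothetical.

```python
def cpu_shares(weights, active):
    """Split CPU among the active groups in proportion to their weights."""
    total = sum(weights[group] for group in active)
    if total == 0:
        return {}
    return {group: weights[group] / total for group in active}


def admit(running, limits, group):
    """Admit a query only if its group is below its concurrency limit."""
    if running.get(group, 0) >= limits[group]:
        return False
    running[group] = running.get(group, 0) + 1
    return True


# Hypothetical settings: dashboards get three times the weight of MV refreshes.
weights = {"dashboard_queries": 3, "mv_refresh": 1}
limits = {"dashboard_queries": 10, "mv_refresh": 2}

contended = cpu_shares(weights, ["dashboard_queries", "mv_refresh"])
idle_dashboards = cpu_shares(weights, ["mv_refresh"])

running = {}
refresh_admissions = [admit(running, limits, "mv_refresh") for _ in range(3)]
dashboard_admitted = admit(running, limits, "dashboard_queries")
```

In this model a burst of refresh jobs can use the whole cluster when dashboards are idle, but under contention it is held to its share and its concurrency cap, so dashboard queries keep being admitted.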
Summary & Future Work
The StarRocks + Paimon stack delivered sub‑second query latency, lower storage cost, and stable performance under high concurrency for the flash‑sale scenario. The team will continue to govern large queries, improve StarRocks‑Paimon integration, and expand lakehouse use cases across Ele.me.