
How StarRocks + Paimon Powered Real‑Time Analytics for Alibaba’s Taobao Flash Sale

Facing minute‑level decision demands and billions of marketing events during Taobao's Flash Sale, the Ele.me data team built a real‑time lakehouse with StarRocks and Paimon. Asynchronous materialized views, RoaringBitmap de‑duplication, and resource isolation helped it achieve sub‑second query latency, lower storage costs, and stable performance under high concurrency.

Alibaba Cloud Big Data AI Platform

Introduction

When the "first cup of milk tea in autumn" trended, the Taobao Flash Sale project launched on April 30, generating massive traffic and requiring minute‑level data decisions. The Ele.me data team built a real‑time lakehouse using StarRocks and Paimon to overcome offline latency bottlenecks.

Background

Traditional T+1 offline pipelines could not meet the minute‑level freshness needed for billions of marketing events. Conventional real‑time pipelines, meanwhile, carried high development cost and resource consumption, so the team needed a more scalable solution.

Technical Solution

Real‑time Lakehouse Architecture

Flink writes streaming data into Paimon lake tables, while dimension tables are loaded via Spark. StarRocks queries the Paimon tables directly, so no separate ETL step is needed to copy data into StarRocks, and dashboards can read new data as soon as it lands in the lake.
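As a rough illustration of the "query Paimon directly" step, the sketch below renders the kind of StarRocks statements involved: registering a Paimon warehouse as an external catalog, then querying a lake table through it. The catalog, database, table, and warehouse path are hypothetical placeholders rather than the team's real names, and the property keys should be checked against the documentation for the StarRocks version in use. Storage credentials are omitted.

```python
def paimon_catalog_ddl(catalog: str, warehouse: str) -> str:
    """Render a CREATE EXTERNAL CATALOG statement for a filesystem-based Paimon warehouse."""
    props = {
        "type": "paimon",
        "paimon.catalog.type": "filesystem",
        "paimon.catalog.warehouse": warehouse,  # hypothetical path, supplied by the caller
    }
    body = ",\n".join(f'    "{key}" = "{value}"' for key, value in props.items())
    return f"CREATE EXTERNAL CATALOG {catalog}\nPROPERTIES (\n{body}\n);"


def lake_query(catalog: str, database: str, table: str) -> str:
    """Render a query that reads the lake table in place via catalog.database.table naming."""
    return f"SELECT count(*) AS events FROM {catalog}.{database}.{table};"


# Hypothetical names, used only for illustration.
catalog_sql = paimon_catalog_ddl("paimon_lake", "oss://example-bucket/warehouse")
query_sql = lake_query("paimon_lake", "marketing", "flash_sale_events")
print(catalog_sql)
print(query_sql)
```

Once the catalog exists, the lake table is addressed by its three-part name like any other table, so no data is copied into StarRocks.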

Materialized View Optimizations

Asynchronous materialized views pre‑compute high‑frequency queries, reducing query time from minutes to seconds.

RoaringBitmap de‑duplication supports multi‑dimensional real‑time metrics with low storage overhead.

Large queries are managed through cluster monitoring, SQL tuning, and resource isolation.

Practical Experience

Three‑stage optimization:

Basic materialized view on Paimon tables refreshed every ten minutes.

Union of real‑time and historical data via DataWorks nightly jobs.

Partitioned materialized views with TTL and auto‑refresh limits to handle long‑term data.
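The third stage can be sketched as a small helper that renders a partitioned asynchronous materialized view definition. This is an illustration under assumptions, not the team's actual DDL: the view name, source table, and columns are hypothetical, `dt` is assumed to be a DATE partition column, and the retention and refresh numbers are placeholders. `partition_ttl_number` and `auto_refresh_partitions_limit` are StarRocks materialized view properties, but their behavior should be verified against the deployed version.

```python
def partitioned_mv_ddl(
    mv_name: str,
    source_table: str,
    refresh_minutes: int = 10,
    ttl_partitions: int = 7,
    refresh_limit: int = 2,
) -> str:
    """Render DDL for an async materialized view partitioned by day.

    ttl_partitions: how many of the most recent partitions to keep.
    refresh_limit: how many of the most recent partitions each automatic refresh may touch.
    """
    if min(refresh_minutes, ttl_partitions, refresh_limit) <= 0:
        raise ValueError("interval, TTL and refresh limit must be positive")
    return (
        f"CREATE MATERIALIZED VIEW {mv_name}\n"
        f"REFRESH ASYNC EVERY (INTERVAL {refresh_minutes} MINUTE)\n"
        "PARTITION BY dt\n"
        "PROPERTIES (\n"
        f'    "partition_ttl_number" = "{ttl_partitions}",\n'
        f'    "auto_refresh_partitions_limit" = "{refresh_limit}"\n'
        ")\n"
        "AS\n"
        "SELECT dt, city_id, count(*) AS order_cnt\n"
        f"FROM {source_table}\n"
        "GROUP BY dt, city_id;"
    )


# Hypothetical view and table names.
mv_sql = partitioned_mv_ddl("mv_orders_by_city", "paimon_lake.marketing.flash_sale_events")
print(mv_sql)
```

Limiting each refresh to the newest partitions keeps the ten-minute refresh cheap even as history accumulates, while the TTL caps how much long-term data the view retains.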

RoaringBitmap Applications

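Bitmap‑based de‑duplication replaces count‑distinct, drastically reducing storage and speeding up UV calculations. Because bitmaps merge with a simple union, a UV pre‑aggregated at fine granularity can be rolled up to coarser dimensions without rescanning raw events. Use cases include multi‑dimensional pre‑aggregation, time‑based roll‑up, audience drill‑down, churn analysis, and funnel metrics.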
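The sketch below shows why bitmaps suit these use cases. It uses a plain Python integer as an uncompressed bitmap, not a real Roaring bitmap, and the events are made-up sample data. Distinct counts cannot be added across cells because the same user may appear in several, whereas bitmaps can be unioned for roll-ups and intersected for retention or churn.

```python
from collections import defaultdict


def cardinality(bitmap: int) -> int:
    """Number of distinct user IDs stored in the bitmap (count of set bits)."""
    return bin(bitmap).count("1")


# Hypothetical sample events: (hour, city, user_id)
events = [
    (10, "hangzhou", 1), (10, "hangzhou", 2), (10, "shanghai", 2),
    (11, "hangzhou", 2), (11, "hangzhou", 3), (11, "shanghai", 4),
]

# Pre-aggregate one bitmap per (hour, city) cell: bit i is set if user i was seen.
cells = defaultdict(int)
for hour, city, user_id in events:
    cells[(hour, city)] |= 1 << user_id


def merged_bitmap(cells, hour=None, city=None) -> int:
    """Union the bitmaps of every cell matching the given filters."""
    merged = 0
    for (cell_hour, cell_city), bitmap in cells.items():
        if (hour is None or cell_hour == hour) and (city is None or cell_city == city):
            merged |= bitmap
    return merged


# Time-based roll-up: Hangzhou UV across both hours is 3 (users 1, 2, 3),
# not the 4 obtained by adding the two hourly UVs of 2.
hangzhou_uv = cardinality(merged_bitmap(cells, city="hangzhou"))
total_uv = cardinality(merged_bitmap(cells))

# Retention and churn between hours via intersection and difference.
hour10 = merged_bitmap(cells, hour=10)
hour11 = merged_bitmap(cells, hour=11)
retained = cardinality(hour10 & hour11)   # active in both hours
churned = cardinality(hour10 & ~hour11)   # active at 10, gone at 11

print(hangzhou_uv, total_uv, retained, churned)  # 3 4 1 1
```

A Roaring bitmap follows the same union and intersection logic but compresses sparse and dense ranges of IDs, which is where the storage savings come from.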

Operations & Governance

The team monitors the cluster with Sunfire or Grafana, collects audit logs with AuditLoader, and uses resource groups to isolate ad hoc queries from materialized view refreshes. It also follows best‑practice guidelines for view lifecycle, partitioning, and refresh frequency.
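A minimal sketch of the isolation idea follows, assuming one resource group for dashboard queries and one for materialized view refreshes. The group names, user, view name, and limits are hypothetical. The property names follow the StarRocks resource group syntax as I understand it (newer releases favor `cpu_weight` over `cpu_core_limit`), so treat the rendered SQL as a starting point to check against the documentation.

```python
def resource_group_ddl(name: str, user: str, cpu_cores: int, mem_pct: int, concurrency: int) -> str:
    """Render a CREATE RESOURCE GROUP statement with a single user-based classifier."""
    if not 0 < mem_pct <= 100:
        raise ValueError("mem_pct must be in (0, 100]")
    return (
        f"CREATE RESOURCE GROUP {name}\n"
        f"TO (user='{user}')\n"
        "WITH (\n"
        f'    "cpu_core_limit" = "{cpu_cores}",\n'
        f'    "mem_limit" = "{mem_pct}%",\n'
        f'    "concurrency_limit" = "{concurrency}"\n'
        ");"
    )


def bind_mv_to_group(mv_name: str, group: str) -> str:
    """Render a statement that makes a view's refresh tasks run in the given group."""
    return f'ALTER MATERIALIZED VIEW {mv_name} SET ("resource_group" = "{group}");'


# Hypothetical groups: dashboards get most of the CPU, refreshes a bounded share.
query_group_sql = resource_group_ddl("rg_dashboard", "bi_user", cpu_cores=16, mem_pct=50, concurrency=20)
refresh_group_sql = resource_group_ddl("rg_mv_refresh", "etl_user", cpu_cores=8, mem_pct=30, concurrency=4)
bind_sql = bind_mv_to_group("mv_orders_by_city", "rg_mv_refresh")
print(query_group_sql, refresh_group_sql, bind_sql, sep="\n\n")
```

Keeping refreshes in their own group means a heavy refresh cannot starve interactive queries, and vice versa.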

Summary & Future Work

The StarRocks + Paimon stack delivered sub‑second latency, lower storage cost, and stable performance under high concurrency for the flash‑sale scenario. The team will continue to govern large queries, improve StarRocks‑Paimon integration, and expand lakehouse use cases across Ele.me.

Real-time analytics, StarRocks, RoaringBitmap, Paimon, Materialized Views, lakehouse
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
