
How StarRocks + Paimon Powered Real‑Time Analytics for Alibaba’s Flash Sale

Faced with billions of marketing events and minute‑level decision requirements during Taobao's flash‑sale campaign, the e‑commerce data team built a real‑time lakehouse using StarRocks and Paimon, leveraged asynchronous materialized views and RoaringBitmap deduplication, and achieved sub‑second query latency, massive cost savings, and stable high‑concurrency performance.


Background

When the "first cup of milk tea in autumn" trended, the flash‑sale project at Taobao generated massive traffic and required minute‑level data decisions. The existing T+1 offline pipeline could not meet the real‑time needs of billions of marketing events.

Real‑time Lakehouse Architecture

The data team built a lakehouse using StarRocks as the MPP query engine and Paimon as the streaming‑batch storage layer. Real‑time streams from Flink write to Paimon, while StarRocks queries the lake tables directly, eliminating ETL and achieving sub‑second query latency.

Lakehouse architecture diagram

Materialized View Optimizations

To handle query spikes, the team created asynchronous materialized views that pre‑compute the results of high‑frequency queries, so that queries which would otherwise scan trillions of rows return in seconds. More than 120 materialized views were maintained at refresh intervals of roughly 15 minutes, and resource groups isolated view refresh from online queries.
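The pre‑computation idea can be illustrated with a small Python sketch. This is not StarRocks code: the event tuples, the channel names, and the `refresh_view` and `query_clicks` helpers are hypothetical, and a `Counter` stands in for the materialized view. It only shows why a dashboard query becomes cheap once the aggregation has been done ahead of time by a periodic refresh.

```python
from collections import Counter

# Hypothetical raw events: (minute, channel). In the article's setup these
# would be rows in a Paimon table, not a Python list.
RAW_EVENTS = [
    ("12:00", "search"),
    ("12:00", "feed"),
    ("12:00", "search"),
    ("12:01", "feed"),
]


def refresh_view(events):
    """Rebuild the pre-aggregated result, as a periodic refresh would."""
    return Counter(events)


def query_clicks(view, minute, channel):
    """Answer a dashboard query from the pre-aggregated result only."""
    return view[(minute, channel)]


view = refresh_view(RAW_EVENTS)
search_clicks = query_clicks(view, "12:00", "search")
```

The raw events are scanned once per refresh; each query afterwards is a single lookup, which is the trade‑off an asynchronous materialized view makes between freshness and query cost.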

Layered materialized view architecture (pre‑compute layer, refresh mechanism, resource isolation)

Real‑time dashboards covering traffic funnel, resource usage, and anomaly detection

Partitioned Materialized Views

Partitioned views using PARTITION BY and properties such as partition_ttl_number='30' limited refresh scope and stored 30 days of partitions, allowing fine‑grained control of refresh frequency and resource consumption.
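A minimal Python sketch of the retention behaviour described above, assuming daily partitions and assuming that `partition_ttl_number` keeps the N most recent partitions. The `retained_partitions` helper and the dates are hypothetical and only model the bookkeeping; they are not part of StarRocks.

```python
from datetime import date, timedelta


def retained_partitions(partition_days, ttl_number):
    """Keep only the ttl_number most recent partitions (assumes ttl_number > 0)."""
    return sorted(partition_days)[-ttl_number:]


# Hypothetical history: 45 consecutive daily partitions.
first_day = date(2024, 9, 1)
all_days = [first_day + timedelta(days=i) for i in range(45)]

# With a TTL of 30, the 15 oldest partitions fall out of the view.
kept = retained_partitions(all_days, 30)
```

Under these assumptions a refresh only ever has to consider the partitions in `kept`, which is what bounds refresh cost as the underlying table grows.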

Materialized view diagram

RoaringBitmap for Real‑time Deduplication

RoaringBitmap was introduced to replace traditional COUNT‑DISTINCT. By storing UID bitmaps in Paimon and using StarRocks bitmap functions, the team achieved high‑performance multi‑dimensional UV calculations, funnel analysis, and user‑group drill‑down.

Bitmap union for time‑based aggregation

Bitmap intersection for cohort analysis

Bitmap difference for churn detection
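The three operations above can be sketched in Python, using plain integers as bitsets in place of a real RoaringBitmap. The UIDs, the hourly buckets, and the `to_bitmap` and `cardinality` helpers are hypothetical. StarRocks exposes comparable operations through bitmap functions such as `bitmap_union`, `bitmap_and`, `bitmap_andnot`, and `bitmap_count`, though the article does not state which ones the team used.

```python
from functools import reduce


def to_bitmap(uids):
    """Pack non-negative integer UIDs into a Python int used as a bitset."""
    bitmap = 0
    for uid in uids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap):
    """Count the set bits, i.e. the number of distinct UIDs."""
    return bin(bitmap).count("1")


# Union: roll hourly visitor bitmaps up into a daily UV count.
hourly = {
    "10:00": to_bitmap([1, 2, 3]),
    "11:00": to_bitmap([2, 3, 4]),
    "12:00": to_bitmap([4, 5]),
}
daily = reduce(lambda a, b: a | b, hourly.values(), 0)
daily_uv = cardinality(daily)

# Intersection: users who both viewed and ordered (one funnel step).
viewed = to_bitmap([1, 2, 3, 4, 5])
ordered = to_bitmap([2, 4, 6])
viewed_and_ordered = viewed & ordered

# Difference: users active yesterday but not today (churn candidates).
yesterday = to_bitmap([1, 2, 3, 4])
today = to_bitmap([2, 3, 5])
churned = yesterday & ~today
```

Because each UID maps to one bit, merging buckets never double‑counts a user, which is why stored bitmaps can be re‑aggregated across any time range or dimension without rescanning the raw events.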

RoaringBitmap illustration

Governance and Operations

Monitoring with Sunfire/Grafana and audit logs (AuditLoader) enabled detection of large queries. Resource groups and compute groups (Multi Warehouse) provided soft and hard isolation, respectively, between view refresh and business queries. Best‑practice guidelines were defined for view lifecycle, partitioning, and refresh frequency.

Conclusion and Future Work

The StarRocks + Paimon solution reduced storage cost, cut end‑to‑end latency, and supported high‑concurrency query scenarios. Ongoing work includes further governance of materialized views, performance tuning of StarRocks‑Paimon joins, and expanding lakehouse use cases to eliminate duplicate development across batch and streaming pipelines.

Written by

StarRocks

