How Paimon + StarRocks Power Real‑Time OLAP for Double‑11 Mega‑Sales
During Double‑11 mega‑sales, Taobao Group faced exploding OLAP query traffic, costly data‑sync pipelines, and slow near‑real‑time analytics. By unifying real‑time and batch data in Paimon, leveraging StarRocks for high‑performance lake queries, and tuning cluster settings, the team saved nearly ten million yuan annually while cutting refresh latency by 80%.
Background and Business Pain Points
In large‑scale promotional events such as Double‑11, the window after a sale launches sees a sudden surge of BI queries, dramatically increasing request volume and stressing the query engine. The existing architecture relied on separate real‑time (TT) and offline (ODPS) storage, with additional sync to Holo for acceleration, leading to fragmented storage, high sync costs, and complex end‑to‑end data pipelines.
Business teams also demanded near‑real‑time data for fast decision making, exposing bottlenecks in the traditional "multiple storage, multiple refresh" model.
Architecture Evolution: Paimon + StarRocks
The team introduced a unified lake storage layer using Paimon, where both real‑time and batch data are ingested into the same tables. StarRocks can query these lake tables directly, eliminating the need for separate sync pipelines and reducing storage duplication.
Key benefits:
Unified storage removes data sync links and cuts multi‑copy costs.
StarRocks reads lake data with high performance, supporting both point‑lookup and heavy‑shuffle OLAP workloads.
Analysts can self‑serve near‑real‑time data via StarRocks without involving data engineers.
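The setup above can be sketched with StarRocks' external catalog support for Paimon. This is a minimal illustration, not the team's actual configuration; the catalog name, warehouse path, database, and table names are all assumptions.

```sql
-- Register Paimon as an external catalog in StarRocks, then query
-- lake tables directly with no sync pipeline (illustrative names).
CREATE EXTERNAL CATALOG paimon_lake
PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "oss://bucket/warehouse"
);

-- Analysts can then read near-real-time lake data as ordinary tables.
SELECT dt, COUNT(*) AS order_cnt
FROM paimon_lake.trade_db.orders
WHERE dt = '2024-11-11'
GROUP BY dt;
```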
Core Strategies
1) Architecture simplification: Store all data in Paimon, letting StarRocks perform high‑performance analysis directly on lake tables.
2) Lowering usage barriers: Paimon provides explicit schemas, so BI tools can consume structured data without extra deserialization.
Additionally, a wide table with primary‑key based partial updates enables analysts to retrieve all states of an entity (e.g., order) with a single row read.
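A wide table of this shape can be sketched with Paimon's partial‑update merge engine, where each upstream stream writes only its own columns for a given primary key and reads return the merged row. The table and column names below are hypothetical, not taken from the article.

```sql
-- Hypothetical wide order table (Flink SQL DDL for Paimon).
-- Each writer fills only its own columns for an order_id; a single
-- row read returns all merged states of the order.
CREATE TABLE orders_wide (
    order_id     BIGINT,
    pay_status   STRING,   -- written by the payment stream
    ship_status  STRING,   -- written by the logistics stream
    refund_flag  INT,      -- written by the refund stream
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'merge-engine' = 'partial-update'
);
```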
Operational Optimizations for StarRocks
To ensure stability during traffic spikes, the following cluster‑level safeguards were applied:
Query cache window of 180 seconds to reuse identical queries.
Global query timeout of 30 seconds; slow queries are aborted.
Isolation of read‑only instances by business importance (default, high‑availability, BI‑dedicated warehouses).
Critical configuration parameters set via SQL:
```sql
set global cbo_cte_reuse_rate = 0;
set global query_timeout = 30;
set global new_planner_optimize_timeout = 10000;
set global pipeline_dop = 8;
set global scan_paimon_partition_num_limit = 100;
```

Additional best practices include enabling broadcast joins for small dimension tables, ensuring compute and storage reside in the same region, and turning on deletion vectors for primary‑key tables to skip deleted rows.
Monitoring and Alerting
Core metrics (CPU, memory, I/O, query latency) are visualized on dashboards. Alerts cover CPU/memory usage >70%, node availability <100%, BE queue length >2000, and query failure or latency percentiles exceeding thresholds.
Audit logs are stored in the internal _starrocks_audit_db_ table, enabling real‑time metadata monitoring and targeted "bad SQL" remediation.
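"Bad SQL" triage on the audit table might look like the sketch below. The article does not show its queries, and the table and column names here are assumptions modeled on the StarRocks audit‑log plugin's schema.

```sql
-- Find the slowest statements of the past hour for remediation
-- (table/column names are assumptions, not from the article).
SELECT user, queryTime, stmt
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE timestamp >= NOW() - INTERVAL 1 HOUR
ORDER BY queryTime DESC
LIMIT 20;
```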
Performance Testing and Findings
Two‑stage stress testing was performed:
Page‑level tests on core promotional pages.
Full‑link tests simulating all pages hitting peak traffic simultaneously.
Key issues discovered and mitigations:
Partition pruning failures caused full‑table scans – enforce proper partition filters.
Excessive small files in Paimon increased read‑block count – apply sorting on offline branches and compact tasks for ODPS‑written tables.
Missing broadcast hints for small dimension tables – add [broadcast] to achieve up to 3× speedup.
Cross‑region data access increased latency – co‑locate StarRocks and Paimon.
Enable deletion vectors on primary‑key tables to skip stale data.
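Two of the mitigations above can be combined in one query shape: an explicit filter on the partition column so pruning applies, and a broadcast hint so the small dimension table is replicated rather than shuffled. The table and column names are illustrative assumptions.

```sql
-- Illustrative query applying two fixes from the findings above:
-- 1) filter on the partition column (dt) so partition pruning works;
-- 2) [broadcast] hint so the small dimension table avoids a shuffle.
SELECT f.item_id, d.category, SUM(f.gmv) AS gmv
FROM paimon_lake.trade_db.fact_orders AS f
JOIN [broadcast] dim_items AS d ON f.item_id = d.item_id
WHERE f.dt = '2024-11-11'          -- partition filter enables pruning
GROUP BY f.item_id, d.category;
```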
Results and Future Roadmap
The new architecture delivered four major outcomes:
Simplified data pipeline and eliminated sync costs.
Significantly lowered the barrier for analysts to access near‑real‑time data.
Reduced refresh workload by ~80%, saving roughly ten million yuan annually.
Provided a low‑cost solution for cross‑day real‑time UV calculation, meeting near‑real‑time decision needs.
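The article does not detail how cross‑day UV is computed; one plausible shape in StarRocks is bitmap‑based deduplication, sketched below with assumed table and column names.

```sql
-- Hypothetical cross-day UV: pre-aggregate a per-day user bitmap,
-- then union the bitmaps across days for an exact distinct count.
SELECT BITMAP_UNION_COUNT(user_bm) AS cross_day_uv
FROM (
    SELECT dt, BITMAP_UNION(TO_BITMAP(user_id)) AS user_bm
    FROM paimon_lake.trade_db.user_visits
    WHERE dt BETWEEN '2024-11-01' AND '2024-11-11'
    GROUP BY dt
) per_day;
```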
Future plans focus on enhancing StarRocks with automatic materialization, richer metadata, better scheduler CPU balancing, and direct reading of Fluss (second‑level real‑time) streams.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.