
Real‑time Data Warehouse Architecture and Best Practices in Alibaba Search Recommendation

This article presents the real‑time data warehouse behind Alibaba's search and recommendation systems: its business background, typical use cases, and key requirements; the evolution from architecture 1.0 to 2.0 built on Flink and Hologres; best‑practice patterns such as row/column storage, stream‑batch integration, and high‑concurrency updates; and future directions, including real‑time joins and persistent dimension storage.

DataFunTalk

Alibaba's e‑commerce search and recommendation systems rely on a real‑time data warehouse that supports dashboards, reports, algorithm training, and A/B testing across multiple platforms such as Taobao and Ele.me.

The data flow starts with user behavior logs captured on mobile devices, passes through offline and real‑time ETL, is stored in an OLAP engine, and then powers analytical and operational applications.

Typical scenarios include real‑time analytics, algorithm serving, and fine‑grained audience operations, each demanding petabyte‑scale storage, tables of billions of rows, peak write throughput of up to 65 million records per second, and flexible multi‑dimensional queries.

Based on these demands, six common real‑time warehouse requirements are identified: group‑by cross‑sections, multi‑dimensional filtering, aggregation, A/B testing, key‑based queries, and stream‑batch unified processing.
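Several of these requirements (multi‑dimensional filtering, group‑by aggregation, A/B bucket slicing) boil down to the same query shape. A minimal Python sketch of that shape over toy behavior‑log rows (column names and values are illustrative, not Alibaba's actual schema):

```python
from collections import defaultdict

# Toy behavior-log rows: (platform, category, ab_bucket, gmv).
rows = [
    ("taobao", "apparel", "A", 120.0),
    ("taobao", "apparel", "B", 80.0),
    ("taobao", "food",    "A", 30.0),
    ("eleme",  "food",    "A", 45.0),
    ("eleme",  "food",    "B", 60.0),
]

def gmv_by_category(rows, platform=None, bucket=None):
    """Multi-dimensional filter plus group-by aggregation over detail
    rows -- the kind of ad-hoc OLAP query the warehouse must answer."""
    totals = defaultdict(float)
    for p, cat, b, gmv in rows:
        if platform is not None and p != platform:
            continue
        if bucket is not None and b != bucket:
            continue
        totals[cat] += gmv
    return dict(totals)

# e.g. GMV per category for A/B-test bucket "A" on Taobao
print(gmv_by_category(rows, platform="taobao", bucket="A"))
# -> {'apparel': 120.0, 'food': 30.0}
```

In production this is a SQL `GROUP BY` with `WHERE` predicates over arbitrary dimension combinations; the point of the OLAP engine is to answer it interactively at billions of rows.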

Architecture 1.0 consisted of three layers—data collection (user logs and dimension tables), Flink‑based stream processing, and a Lightning engine for KV and OLAP queries. Limitations of Lightning included non‑SQL access and shared‑cluster resource contention.

Architecture 2.0 replaced Lightning with Hologres, eliminating the HBase dimension store and unifying both dimension and fact data in Hologres. Flink jobs now read dimension tables directly from Hologres, and all real‑time detail data are written to Hologres, providing high‑concurrency writes and low‑latency SQL queries.
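The enrichment step in Architecture 2.0 is essentially a lookup join: the stream job widens each fact event with attributes fetched from the dimension store (Hologres in the article; a plain dict in this hedged Python sketch, with all names illustrative):

```python
# Dimension table keyed by item_id; in Architecture 2.0 this lives in
# Hologres row-store tables rather than HBase.
dimension_table = {
    101: {"category": "apparel", "brand": "X"},
    102: {"category": "food", "brand": "Y"},
}

def enrich(event, dim):
    """Join one fact event with its dimension row (a lookup join).
    Unknown keys pass through without dimension attributes."""
    attrs = dim.get(event["item_id"], {})
    return {**event, **attrs}

# Real-time detail records, as a Flink job would emit them to Hologres.
events = [{"item_id": 101, "gmv": 50.0}, {"item_id": 102, "gmv": 20.0}]
detail = [enrich(e, dimension_table) for e in events]
```

The design choice the article describes is that both the dimension lookups and the enriched detail writes hit the same Hologres cluster, removing the separate HBase hop from Architecture 1.0.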

Best‑practice highlights include:

Row‑store tables for key‑value lookups, similar to HBase.

Column‑store tables for OLAP workloads, enabling multi‑dimensional filtering and sub‑second queries even at 5 million rows per second.

Stream‑batch integration via Hologres federated queries that combine real‑time and offline (MaxCompute) data for use cases like promotion target completion and year‑over‑year comparisons.

High‑concurrency updates (up to 500,000 updates per second) to support scenarios such as order attribution.
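The row‑store versus column‑store distinction above can be made concrete with a toy layout comparison. This is a pure‑Python illustration of the access patterns, not Hologres internals:

```python
# Row store: key -> complete record. A point lookup touches exactly one
# record -- the HBase-style key-value access pattern.
row_store = {
    "user:1": {"pv": 3, "gmv": 120.0},
    "user:2": {"pv": 5, "gmv": 80.0},
}

# Column store: one array per column. An aggregation scans only the
# column it needs, which is what makes wide OLAP scans cheap.
col_store = {
    "pv":  [3, 5],
    "gmv": [120.0, 80.0],
}

point_lookup = row_store["user:1"]    # serving-style read: one key
total_gmv = sum(col_store["gmv"])     # analytic read: one column
```

Keeping both table types in one engine is what lets the same Hologres cluster serve key‑based queries and multi‑dimensional analytics.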

Future work focuses on real‑time table joins to reduce data duplication and improve dimension freshness, and on persisting frequently used dimension results within Hologres.

The presentation concludes with acknowledgments and an invitation to join the DataFunTalk community.

Tags: big data, Flink, Hologres, OLAP, real-time data warehouse, streaming analytics
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
