Real‑Time Data Warehouse Architecture: Industry Practices and OLAP Engine Selection
This article surveys the recent surge in real‑time data warehouse construction, explains why low‑latency analytics is needed, and compares the architectures and technology stacks of Alibaba, Zhihu, Meituan, and NetEase, highlighting the role of OLAP engines such as the open‑source Druid and Alibaba's ADB.
Scenario Description: Real‑time data warehouses have attracted sudden attention this year; the author has previously written and reposted several articles on the topic.
Why Build Real‑Time Data Warehouses: Traditional offline warehouses deliver data on a T+1 basis, meaning a day's data only becomes available the following day. That delay limits business users who need near‑real‑time insights, so timeliness is the primary driver.
Alibaba (Cainiao) Design: Uses a layered data model (detail → light aggregation → heavy aggregation), Blink as the compute engine, TianGong for data access, and ADB (formerly ADS) as the real‑time OLAP database, leveraging Kafka, Pulsar, and HBase for ingestion and storage.
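The layered model (detail → light aggregation → heavy aggregation) can be illustrated with a minimal sketch in plain Python. This is not Blink or ADB code; the record fields (`ts`, `warehouse`, `parcels`) and the choice of a per-minute grain for the light layer are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical detail-layer records: one row per event.
DETAIL = [
    {"ts": "2019-03-01T10:00:05", "warehouse": "HZ", "parcels": 1},
    {"ts": "2019-03-01T10:00:40", "warehouse": "HZ", "parcels": 2},
    {"ts": "2019-03-01T10:01:10", "warehouse": "HZ", "parcels": 1},
    {"ts": "2019-03-01T10:01:30", "warehouse": "SH", "parcels": 4},
]


def light_aggregate(detail):
    """Roll detail rows up to a (minute, warehouse) grain."""
    out = defaultdict(int)
    for row in detail:
        minute = datetime.fromisoformat(row["ts"]).replace(second=0, microsecond=0)
        out[(minute.isoformat(), row["warehouse"])] += row["parcels"]
    return dict(out)


def heavy_aggregate(light):
    """Roll the light layer up further, dropping the time dimension."""
    out = defaultdict(int)
    for (_minute, warehouse), parcels in light.items():
        out[warehouse] += parcels
    return dict(out)


light = light_aggregate(DETAIL)
heavy = heavy_aggregate(light)
print(light)
print(heavy)
```

The point of the layering is that the heavy layer is computed from the light layer rather than from raw detail, so coarser metrics reuse work already done at the finer grain.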
Zhihu Design: Evolved through three stages: 1.0 (Spark Streaming), 2.0 (Flink streaming with a layered data model), and a planned streaming‑SQL platform. It selects HBase and Redis for real‑time storage and Druid for OLAP queries.
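As a rough illustration of the kind of computation these streaming jobs perform, the sketch below counts events per tumbling window in plain Python. It does not use the Spark Streaming or Flink APIs, and the event shape (epoch-second timestamp plus a key) is an assumption; real engines additionally handle state, late or out-of-order data, and fault tolerance.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key), using event time in epoch seconds."""
    counts = Counter()
    for ts, key in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = ts - ts % window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp, event_type) pairs.
events = [(0, "pv"), (12, "pv"), (59, "click"), (60, "pv"), (61, "pv")]
result = tumbling_window_counts(events, 60)
print(result)
```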
Meituan Design: Four‑layer architecture (ODS, detail, summary, app) with Kafka for ingestion, Cellar (Redis‑like KV store) for high‑frequency dimension data, Elasticsearch for complex queries, and Druid as the primary OLAP engine.
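The role of a KV store for high‑frequency dimension data can be sketched as a lookup join: each detail record is enriched with dimension attributes fetched by key. The `DimensionStore` class below is an in-memory stand-in, not Cellar's actual client API, and the field names (`shop_id`, `city`) are hypothetical.

```python
class DimensionStore:
    """In-memory stand-in for a KV store holding dimension rows (hypothetical)."""

    def __init__(self, data):
        self._data = dict(data)

    def get(self, key):
        # Returns None when the key is missing, like a cache miss.
        return self._data.get(key)


def enrich(orders, dims, default_city="UNKNOWN"):
    """Attach the shop's city to each order via a per-record key lookup."""
    for order in orders:
        dim = dims.get(order["shop_id"])
        city = dim["city"] if dim else default_city
        yield {**order, "city": city}


dims = DimensionStore({"s1": {"city": "Beijing"}, "s2": {"city": "Shanghai"}})
orders = [
    {"order_id": 1, "shop_id": "s1", "amount": 30},
    {"order_id": 2, "shop_id": "s9", "amount": 12},  # no dimension row
]
enriched = list(enrich(orders, dims))
print(enriched)
```

Falling back to a default value on a miss keeps the stream flowing; whether to drop, default, or retry such records is a design choice the source does not specify.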
NetEase Yanxuan Design: Layered pipeline from Kafka ingestion to Flink processing, with results stored in Redis, HBase, MySQL, or Greenplum depending on the storage need.
Summary: Across the industry, real‑time data warehouses rely on streaming frameworks (Spark, Flink), message queues (Kafka dominates), and a mix of storage solutions (HBase, Redis, MySQL), while OLAP engines such as the open‑source Druid, Presto, and ClickHouse, along with Alibaba's ADB, provide the analytical layer.
References: https://yq.aliyun.com/articles/691541, https://dwz.cn/qwcuWD4L, https://tech.meituan.com/2018/10/18/meishi-data-flink.html, http://lxw1234.com/archives/2017/07/867.html, https://www.codercto.com/a/47662.html.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
