
Dual-Engine MOLAP + ROLAP Architecture with Apache Doris for Meituan Takeaway Data Warehouse

Meituan Takeaway’s data warehouse combines Apache Kylin’s MOLAP cubes for stable dimensions with Apache Doris’s MPP‑driven ROLAP engine for changing dimensions, detail queries, and near‑real‑time analytics. The combination delivers millisecond‑level responses, lower storage and compute costs, and simpler operations across diverse analytical workloads.

Meituan Technology Team

Meituan Takeaway’s data warehouse adopts a dual‑engine strategy (MOLAP + ROLAP) to satisfy diverse analytical scenarios. The MOLAP layer uses Apache Kylin, while the ROLAP layer is powered by Apache Doris.

Background

Traditional data warehouses built on Hadoop/Spark provide layered storage, but interactive queries still rely on a DBMS (MySQL) or an OLAP engine (Kylin). Pre‑computed cubes work well for stable dimensions, but when dimensions change rapidly the affected cubes must be recomputed, which is costly.

Challenges

Changing dimensions force a daily full refresh of historical data, which eliminates the benefit of incremental builds.

Back‑filling billions of rows each day costs more than 3 hours of compute time and more than 1 TB of storage.

Pre‑computed data is poorly utilized: about 80% of queries on historical data target only the last month.

Inability to query detail data directly.

Solution: MPP‑driven ROLAP

Moving these workloads to Doris, a massively parallel processing (MPP) engine, replaces pre‑computation with on‑demand computation for changing dimensions. This reduces storage and compute costs while supporting both summary and detail queries.

Comparison of MOLAP and ROLAP

MOLAP requires extensive model preparation and complex configuration, and it cannot serve detail queries. ROLAP simplifies model design, supports view‑based business logic, and handles both summary and detail data with lower operational overhead.
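To make the trade‑off concrete, here is a minimal Python sketch, not Meituan's implementation. The sample rows, the region attribute, and the function names build_cube, cube_total, and query_on_demand are all hypothetical. It shows how a cube built before a dimension change keeps returning stale totals until it is rebuilt, while an on‑demand query over detail rows picks up the change immediately.

```python
from collections import defaultdict

# Hypothetical detail (fact) rows: (date, merchant_id, amount).
orders = [
    ("2019-01-01", "m1", 10),
    ("2019-01-01", "m2", 20),
    ("2019-01-02", "m1", 30),
]


def build_cube(orders, merchant_region):
    """MOLAP-style: pre-aggregate by (date, region) at build time."""
    cube = defaultdict(int)
    for dt, merchant, amount in orders:
        cube[(dt, merchant_region[merchant])] += amount
    return dict(cube)


def cube_total(cube, region):
    """Answer a summary query from the pre-computed cube only."""
    return sum(v for (_, r), v in cube.items() if r == region)


def query_on_demand(orders, merchant_region, region):
    """ROLAP-style: resolve the dimension and aggregate at query time."""
    return sum(a for _, m, a in orders if merchant_region[m] == region)


region_before = {"m1": "north", "m2": "south"}
cube = build_cube(orders, region_before)

# The dimension changes: merchant m1 is reassigned to the south region.
region_after = {"m1": "south", "m2": "south"}

stale_total = cube_total(cube, "south")                        # old cube
fresh_total = query_on_demand(orders, region_after, "south")   # detail rows
```

In this toy data the old cube still attributes m1's history to the north region, so the two totals disagree until the whole cube is rebuilt over all historical rows, which is the daily back‑fill cost listed under Challenges.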

Doris Overview

Doris is an MPP‑based OLAP engine that integrates the Google Mesa data model, the Apache Impala query engine, and the ORC storage format. Its architecture consists of Front‑End (FE) nodes, which handle query parsing, optimization, and metadata management, and Back‑End (BE) nodes, which handle query execution and data storage.

Key features include:

High‑concurrency point queries and ad‑hoc analytics.

Batch and real‑time data ingestion.

Support for both aggregate and detail queries.

MySQL‑compatible protocol and standard SQL.

Rollup table routing, advanced join strategies, online schema changes, and range/hash partitioning.
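Rollup table routing can be pictured with a small Python sketch. The selection rule used here (pick the smallest rollup whose dimension columns cover the query) is a simplifying assumption for illustration, and Doris's actual selection logic may differ. The Rollup class, the table names, and the row counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollup:
    name: str
    dimensions: frozenset  # dimension columns kept by this rollup
    row_count: int         # rows remaining after pre-aggregation


# Hypothetical base table plus two coarser rollups.
ROLLUPS = [
    Rollup("base", frozenset({"dt", "city", "merchant_id"}), 1_000_000),
    Rollup("by_dt_city", frozenset({"dt", "city"}), 10_000),
    Rollup("by_dt", frozenset({"dt"}), 365),
]


def route(query_dimensions, rollups=ROLLUPS):
    """Return the smallest rollup that contains every queried dimension."""
    needed = set(query_dimensions)
    candidates = [r for r in rollups if needed <= r.dimensions]
    if not candidates:
        raise LookupError(f"no rollup covers {sorted(needed)}")
    return min(candidates, key=lambda r: r.row_count)
```

A query that groups only by dt is served from the smallest rollup, while a query that touches merchant_id falls back to the base table, which is how one table can serve both summary and detail queries.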

Performance Highlights

In a 20 BE + 3 FE cluster, Doris achieves:

Millisecond‑level response for dozens of analytical products.

Second‑level performance for million‑row joins using colocate join.

Second‑level daily aggregation and drill‑down on merchant‑level detail.

2‑3 seconds for 7‑day trend analysis (subject to cluster size).

High reliability and scalability after a year of production use.

Near‑Real‑Time Use Cases

For marketing activities requiring sub‑hour data freshness, Doris powers a Lambda‑style micro‑batch pipeline (Kafka → Doris) delivering 10‑15 minute latency while aligning event‑time and production‑time metrics.
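The micro‑batch step can be sketched in Python as grouping events into fixed event‑time windows before each load. This is only an illustration under assumptions: the 10‑minute window, the event fields, and the function names are hypothetical, the Kafka consumer and the Doris load are left out, and the source does not detail how event time and production time are aligned.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # assumed batch interval
EPOCH = datetime(1970, 1, 1)


def window_start(event_time, window=WINDOW):
    """Floor a naive datetime to the start of its window."""
    size = int(window.total_seconds())
    elapsed = int((event_time - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=elapsed - elapsed % size)


def micro_batches(events):
    """Group events into batches keyed by event-time window start."""
    batches = defaultdict(list)
    for event in events:
        batches[window_start(event["event_time"])].append(event)
    return dict(sorted(batches.items()))


# Hypothetical events as they might arrive from a message queue.
events = [
    {"event_time": datetime(2019, 1, 1, 12, 3, 0), "amount": 5},
    {"event_time": datetime(2019, 1, 1, 12, 9, 59), "amount": 7},
    {"event_time": datetime(2019, 1, 1, 12, 10, 0), "amount": 11},
]

batches = micro_batches(events)
totals = {start: sum(e["amount"] for e in rows) for start, rows in batches.items()}
```

Each key in batches is one load unit; keying by event time rather than arrival time keeps a late‑arriving event in the window where it actually occurred.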

SQL Optimizations in Doris

1. Join Predicate Push‑Down : The predicate t1.id = 1 and t1.id = t2.id allows inference of t2.id = 1, reducing scan volume.

select * from t1 join t2 on t1.id = t2.id where t1.id = 1

2. Multi‑Instance Concurrency : Generating multiple execution instances per operator on each node yields 3‑5× speedup.
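The inference in optimization 1 (from t1.id = 1 and t1.id = t2.id, derive t2.id = 1) can be sketched as constant propagation over groups of columns known to be equal. This is a minimal Python illustration, not Doris's optimizer code; the function name infer_constants and the sample rows are hypothetical.

```python
def infer_constants(equalities, constants):
    """Propagate constant predicates across column equalities.

    equalities: list of (column_a, column_b) pairs known to be equal.
    constants:  dict mapping a column to the constant it must equal.
    Returns a dict with a constant for every column in an equality
    group that contains at least one constant predicate.
    """
    parent = {}

    def find(col):
        # Union-find lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for left, right in equalities:
        parent[find(left)] = find(right)

    class_value = {}
    for col, value in constants.items():
        root = find(col)
        if class_value.setdefault(root, value) != value:
            raise ValueError("contradictory constant predicates")

    return {col: class_value[find(col)]
            for col in list(parent) if find(col) in class_value}


inferred = infer_constants([("t1.id", "t2.id")], {"t1.id": 1})

# With t2.id = 1 known up front, t2 can be filtered before the join.
t2 = [{"id": i, "v": i * i} for i in range(1000)]
pruned_t2 = [row for row in t2 if row["id"] == inferred["t2.id"]]
```

In this toy example the join reads one row of t2 instead of a thousand, which is the reduction in scan volume the optimization is after.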

3. Colocate Join : Data is pre‑sharded by join key, eliminating network shuffle.

4. Bitmap‑Based Precise Distinct : Bitmap aggregation drastically cuts I/O, CPU, and memory for large‑scale distinct counts.
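The idea behind bitmap‑based distinct counting can be sketched in Python: each partition pre‑aggregates its IDs into a bitmap, partial bitmaps are merged with a bitwise OR, and the distinct count is the number of set bits. This sketch assumes non‑negative integer IDs and uses a plain Python int as a stand‑in for the compressed bitmap a real engine would use. The function names and the sample data are hypothetical.

```python
def to_bitmap(ids):
    """Set one bit per non-negative integer ID."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def bitmap_union(bitmaps):
    """Merge partial bitmaps; duplicate IDs collapse onto the same bit."""
    merged = 0
    for bitmap in bitmaps:
        merged |= bitmap
    return merged


def bitmap_count(bitmap):
    """Exact distinct count: the number of set bits."""
    return bin(bitmap).count("1")


# Hypothetical user IDs seen on three days.
daily_ids = {
    "2019-01-01": [1, 2, 3],
    "2019-01-02": [2, 3, 4],
    "2019-01-03": [4, 5],
}

daily_bitmaps = {dt: to_bitmap(ids) for dt, ids in daily_ids.items()}
distinct_3d = bitmap_count(bitmap_union(daily_bitmaps.values()))
naive_sum = sum(bitmap_count(b) for b in daily_bitmaps.values())
```

Per‑day distinct counts cannot simply be added because users repeat across days, so naive_sum overcounts. Merging the small per‑day bitmaps still gives the exact answer without rescanning detail rows.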

The following three‑table query uses the [shuffle] hint to force shuffle joins:

SELECT COUNT(*)
FROM A t1
INNER JOIN [shuffle] B t5 ON ((t1.dt = t5.dt) AND (t1.id = t5.id))
INNER JOIN [shuffle] C t6 ON ((t1.dt = t6.dt) AND (t1.id = t6.id))
WHERE t1.dt IN (xxx days);
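Colocate join (optimization 3) avoids the network shuffle that the [shuffle] hints in the query above request. A minimal Python sketch of the idea follows, assuming integer join keys and a simple modulo hash. The bucket count, the table contents, and the function names are hypothetical. Both tables are bucketed by the join key with the same function, so matching rows always share a bucket index and each bucket pair can be joined locally.

```python
NUM_BUCKETS = 3  # both tables must use the same bucket count


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Deterministic bucket for an integer join key."""
    return key % num_buckets


def distribute(rows, key, num_buckets=NUM_BUCKETS):
    """Pre-shard rows by join key, as done once at load time."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return buckets


def colocate_join(left_buckets, right_buckets, key):
    """Hash-join each bucket pair independently; no row changes bucket."""
    result = []
    for left, right in zip(left_buckets, right_buckets):
        index = {}
        for row in right:
            index.setdefault(row[key], []).append(row)
        for row in left:
            for match in index.get(row[key], []):
                result.append({**row, **match})
    return result


table_a = [{"id": i, "a": i * 10} for i in range(6)]
table_b = [{"id": i, "b": i * 100} for i in (1, 3, 5, 7)]

joined = colocate_join(distribute(table_a, "id"), distribute(table_b, "id"), "id")
joined.sort(key=lambda row: row["id"])
```

In a real cluster each bucket pair would sit on the same node, so the per‑bucket loop corresponds to work that runs in parallel with no data movement between nodes.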

Conclusion

The dual‑engine approach demonstrates that an MPP‑driven ROLAP mode, exemplified by Doris, effectively handles summary + detail workloads, changing dimensions, and near‑real‑time processing, while MOLAP (Kylin) remains valuable for stable, pre‑computed scenarios. The success has spurred broader adoption across Meituan teams and suggests a future where Doris may replace Kylin, Druid, or Elasticsearch for many analytical use cases.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
