
How Meituan Built a Scalable Real‑Time Data Warehouse: Architecture & Lessons

Meituan Waimai’s data intelligence team outlines a universal real‑time data‑warehouse methodology that combines a production platform with an interactive analytics engine, detailing scenarios, technology choices, architectural designs, platformization, SLA management, and a practical Lambda‑style case study.

Meituan Technology Team

Meituan Waimai’s data intelligence group shares a comprehensive, reusable approach to building a real‑time data warehouse that balances low latency, SQL standardization, rapid change response, and unified data handling.

01 Real‑time Scenarios

Real‑time data is applied across multiple dimensions in Meituan Waimai:

Operation layer: real‑time business changes, marketing effectiveness, daily revenue, and time‑slice trend analysis.

Production layer: system reliability, stability, and health monitoring of real‑time services.

Consumer side (C‑end users): search and recommendation ranking that requires instant feature generation from user behavior.

Risk control: real‑time fraud detection, risk identification, and abnormal transaction monitoring.

02 Real‑time Technology and Architecture

1. Real‑time Compute Technology Selection

Open‑source stream processing options include Storm, Spark Streaming, and Flink. Meituan initially adopted Storm for its stability and scalability, but as Flink matured it became the preferred engine due to better performance and design advantages. The migration from Storm to Flink is ongoing, with legacy jobs still running on Storm.

2. Real‑time Architecture

① Lambda Architecture – a classic dual‑pipeline design that adds a real‑time processing chain to an existing batch system. It keeps the batch and stream paths separate, but maintaining two pipelines for the same logic roughly doubles the development effort, the operational work, and the resource footprint.

② Kappa Architecture – a simplified design that unifies batch and stream processing into a single pipeline. Although conceptually clean, real‑world Kappa deployments are rare and often limited to narrow use cases.

03 Business Pain Points

Early development followed a case‑by‑case approach, embedding business logic directly into data pipelines. As the number of real‑time jobs grew, duplicated data ingestion, repeated cleaning/expansion steps, and uncontrolled resource consumption became major bottlenecks, making unified management essential.

04 Data Characteristics and Application Scenarios

Meituan distinguishes two primary data categories:

Log‑type data: massive, semi‑structured, deeply nested logs (user, DB, server). They are immutable once generated and are used for monitoring and real‑time user‑behavior analytics with short windows (5–10 minutes).

Business‑type data: transactional Binlog streams that are highly structured but require extensive table joins, so many source tables have to be merged into one record (an n‑to‑1 integration challenge).
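For the log‑type data described above, the short analysis windows can be illustrated with a tumbling‑window count. This is a minimal sketch in plain Python rather than Flink or Storm code; the event fields `ts` and `type` and the sample events are assumptions made for illustration, not Meituan's schema.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # 5 minutes, the lower bound of the window range in the text


def window_start(event_ts, size=WINDOW_SECONDS):
    """Return the start (epoch seconds) of the tumbling window containing event_ts."""
    return event_ts - (event_ts % size)


def count_by_window(events, size=WINDOW_SECONDS):
    """Count events per (window_start, event_type).

    `events` is an iterable of dicts with the hypothetical keys `ts` and `type`.
    """
    counts = defaultdict(int)
    for event in events:
        counts[(window_start(event["ts"], size), event["type"])] += 1
    return dict(counts)


# Made-up sample events: three fall in the first window, one in the second.
sample_events = [
    {"ts": 0, "type": "click"},
    {"ts": 120, "type": "click"},
    {"ts": 299, "type": "view"},
    {"ts": 300, "type": "click"},
]
window_counts = count_by_window(sample_events)
```

Because log records never change after they are written, each event can be assigned to exactly one window and counted once, with no need to revisit earlier results.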

Key challenges for business‑type streams include multi‑state lifecycle handling, complex data integration, and the need to process batches of records while performing stream‑level transformations.
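The multi‑state lifecycle problem can be made concrete with a small sketch: one order emits several change records over time, and a consumer has to fold a batch of them into the latest row per key, even when records arrive late. The record fields (`op`, `key`, `version`, `row`) are hypothetical and do not reflect the actual Binlog format.

```python
def apply_binlog(records):
    """Fold a batch of change records into the latest row per primary key.

    Each record is a dict with hypothetical fields:
      op      - "insert", "update", or "delete"
      key     - primary key of the source row
      version - increasing position used to order changes for one key
      row     - row image after the change (ignored for deletes)
    """
    latest = {}  # key -> (version, row or None for a deleted row)
    for rec in records:
        key, version = rec["key"], rec["version"]
        if key in latest and latest[key][0] >= version:
            continue  # stale or duplicate change, already superseded
        row = None if rec["op"] == "delete" else rec["row"]
        latest[key] = (version, row)
    # Deleted keys are kept as tombstones while folding, then dropped here.
    return {key: row for key, (_, row) in latest.items() if row is not None}


# Made-up batch: order 1 moves through three states (one arrives late),
# order 2 is created and then deleted.
batch = [
    {"op": "insert", "key": 1, "version": 1, "row": {"status": "created"}},
    {"op": "update", "key": 1, "version": 3, "row": {"status": "delivered"}},
    {"op": "update", "key": 1, "version": 2, "row": {"status": "paid"}},
    {"op": "insert", "key": 2, "version": 1, "row": {"status": "created"}},
    {"op": "delete", "key": 2, "version": 2, "row": None},
]
current_state = apply_binlog(batch)
```

Comparing versions instead of trusting arrival order is what keeps the late "paid" update from overwriting the newer "delivered" state.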

05 Real‑time Data Warehouse Architecture Design

1. Stream‑Batch Hybrid Exploration

To address diverse requirements, Meituan adopts a hybrid model where a unified data ingestion layer feeds both real‑time feature pipelines and batch‑oriented OLAP processing.

Log‑type streams feed real‑time dashboards and feature generation, while Binlog‑type streams are loaded into a real‑time OLAP engine and processed there with batch‑style queries.

2. Real‑time Data Warehouse Layered Design

The architecture follows a three‑layer hierarchy:

Data Source Layer: unified ingestion of log‑type and business‑type data.

Real‑time Detail Layer: standardized cleaning, filtering, and enrichment to produce ready‑to‑use detail tables for downstream consumption.

Aggregation Layer: lightweight Flink/Storm operators compute metrics, forming a shared metric pool for consistent reporting.
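The division of labor between the detail and aggregation layers can be sketched in a few lines of Python. This is an illustration of the layering idea only; the field names (`order_id`, `shop_id`, `amount`), the shop‑to‑city lookup, and the sample rows are assumptions, and a production job would run on Flink or Storm rather than in‑process.

```python
from collections import defaultdict


def to_detail(raw_events, city_by_shop):
    """Detail layer: drop malformed records, normalize types, enrich with a dimension."""
    for event in raw_events:
        if event.get("order_id") is None or event.get("amount") is None:
            continue  # cleaning and filtering: skip incomplete records
        shop_id = event.get("shop_id")
        yield {
            "order_id": event["order_id"],
            "shop_id": shop_id,
            "city": city_by_shop.get(shop_id, "unknown"),  # enrichment
            "amount": float(event["amount"]),
        }


def aggregate(detail_rows):
    """Aggregation layer: compute shared metrics (order count, revenue) per city."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in detail_rows:
        bucket = totals[row["city"]]
        bucket["orders"] += 1
        bucket["revenue"] += row["amount"]
    return dict(totals)


# Made-up source-layer records, including one malformed row.
raw = [
    {"order_id": 1, "shop_id": "s1", "amount": "10.5"},
    {"order_id": None, "shop_id": "s1", "amount": "3"},
    {"order_id": 2, "shop_id": "s2", "amount": 4},
]
metrics = aggregate(to_detail(raw, {"s1": "Beijing"}))
```

Every downstream metric reads the same cleaned detail rows, so cleaning and enrichment happen once instead of once per job.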

06 Real‑time Platform Construction

1. Real‑time Base Layer Functions

Component‑based abstractions (cleaning, filtering, enrichment, transformation, encryption, etc.) are exposed via simple declarative interfaces. Custom logic can be added with Java or Python scripts, enabling flexible data conversion.
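One way to picture the component‑based interface is a registry of small functions that a declarative step list wires together. The sketch below is hypothetical: the component names (`strip`, `require`, `mask`), the configuration shape, and the use of SHA‑256 hashing as a stand‑in for the encryption component are all assumptions, not the platform's actual API.

```python
import hashlib


def strip_fields(record, fields):
    """Cleaning: trim surrounding whitespace from the given string fields."""
    out = dict(record)
    for name in fields:
        if isinstance(out.get(name), str):
            out[name] = out[name].strip()
    return out


def require_fields(record, fields):
    """Filtering: return None (drop the record) if any required field is empty."""
    if all(record.get(name) not in (None, "") for name in fields):
        return record
    return None


def mask_field(record, field):
    """Stand-in for encryption: replace a sensitive value with its SHA-256 digest."""
    out = dict(record)
    if field in out:
        out[field] = hashlib.sha256(str(out[field]).encode("utf-8")).hexdigest()
    return out


# Hypothetical registry mapping declarative names to implementations.
COMPONENTS = {"strip": strip_fields, "require": require_fields, "mask": mask_field}


def run_pipeline(steps, records):
    """Apply the configured steps in order; a step returning None drops the record."""
    results = []
    for record in records:
        for step in steps:
            record = COMPONENTS[step["use"]](record, **step["params"])
            if record is None:
                break
        else:
            results.append(record)
    return results


# A declarative step list, as a user might write it instead of code.
steps = [
    {"use": "strip", "params": {"fields": ["phone"]}},
    {"use": "require", "params": {"fields": ["order_id"]}},
    {"use": "mask", "params": {"field": "phone"}},
]
cleaned = run_pipeline(steps, [
    {"order_id": 1, "phone": " 123 "},
    {"order_id": None, "phone": "456"},
])
```

Custom logic then amounts to registering one more function under a new name, which is roughly the role the text assigns to user‑supplied Java or Python scripts.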

2. Real‑time Feature Production

Features are expressed in SQL, which the underlying engine (Flink/Storm) executes, shielding users from engine‑specific details. Metrics are managed as atomic or derived indicators with configurable windows and dimensions.
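The relationship between atomic and derived indicators can be sketched as a small metric model that renders SQL. The class names, the metric and table names, and the generated statement are illustrative assumptions; the SQL uses a Flink‑style `TUMBLE` group window and is not necessarily what Meituan's platform emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicMetric:
    """A base measure with no window, dimension, or filter attached."""
    name: str
    expression: str  # e.g. "COUNT(order_id)"


@dataclass(frozen=True)
class DerivedMetric:
    """An atomic metric plus dimensions, a window, and an optional filter."""
    name: str
    base: AtomicMetric
    dimensions: tuple
    window_minutes: int
    condition: str = ""


def render_sql(metric, table, time_column="event_time"):
    """Render a derived metric as a windowed GROUP BY statement."""
    window = f"TUMBLE({time_column}, INTERVAL '{metric.window_minutes}' MINUTE)"
    select_cols = list(metric.dimensions) + [f"{metric.base.expression} AS {metric.name}"]
    group_cols = list(metric.dimensions) + [window]
    where = f" WHERE {metric.condition}" if metric.condition else ""
    return (
        f"SELECT {', '.join(select_cols)} FROM {table}{where} "
        f"GROUP BY {', '.join(group_cols)}"
    )


# Hypothetical metrics: paid orders per city over 5-minute windows.
order_cnt = AtomicMetric("order_cnt", "COUNT(order_id)")
paid_cnt_5m = DerivedMetric("paid_order_cnt_5m", order_cnt, ("city",), 5, "status = 'paid'")
sql = render_sql(paid_cnt_5m, "order_detail")
```

Keeping the window and dimensions as configuration means the same atomic metric can be reused for several derived indicators without rewriting the job.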

3. SLA Construction

Both end‑to‑end and job‑level SLAs are tracked via lightweight instrumentation points that report metrics to a centralized SLA monitoring platform, enabling visibility into latency and job efficiency.
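A lightweight instrumentation point can be as simple as reporting the gap between event time and processing time for each job. The tracker below is a minimal in‑memory sketch; the class, the job names, and the latency targets are hypothetical, and a real deployment would ship these measurements to the central SLA platform instead of holding them in a dictionary.

```python
from collections import defaultdict


class SlaTracker:
    """Collects per-job latencies and checks them against latency targets."""

    def __init__(self):
        self._latencies = defaultdict(list)

    def report(self, job, event_ts, processed_ts):
        """Instrumentation point: record how long one event took to process."""
        self._latencies[job].append(processed_ts - event_ts)

    def percentile(self, job, pct):
        """Nearest-rank percentile (pct is an integer from 1 to 100)."""
        values = sorted(self._latencies[job])
        if not values:
            raise ValueError(f"no samples for job {job!r}")
        rank = max(1, -(-pct * len(values) // 100))  # integer ceiling division
        return values[rank - 1]

    def breaches(self, targets, pct=99):
        """Return the jobs whose pct-th percentile latency exceeds their target."""
        return sorted(
            job
            for job, limit in targets.items()
            if self._latencies[job] and self.percentile(job, pct) > limit
        )


# Made-up measurements in seconds for two hypothetical jobs.
tracker = SlaTracker()
for processed_ts in (101, 102, 103, 110):
    tracker.report("detail_layer", 100, processed_ts)
tracker.report("agg_layer", 100, 101)
late_jobs = tracker.breaches({"detail_layer": 5, "agg_layer": 5})
```

Tracking a high percentile rather than the average keeps occasional slow events visible, which is usually what a latency SLA is meant to catch.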

4. Real‑time OLAP Solution

To avoid costly stream‑to‑state mapping, Meituan adopts Apache Doris as a high‑performance OLAP engine that supports fast incremental ingestion, unique and aggregate models, and both physical and logical views for downstream analytics.
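The difference between the unique and aggregate models is easiest to see by simulating what happens when rows with the same key are loaded twice. The Python below only mimics the merge semantics (replace on key for the unique model, per‑column aggregation for the aggregate model); it is not Doris code, and the table and column names are made up.

```python
def load_unique(table, rows, key_cols):
    """Unique-model semantics: a later row replaces an earlier row with the same key."""
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        table[key] = dict(row)
    return table


# Aggregation functions applied to value columns when keys collide.
AGG_FUNCS = {
    "SUM": lambda old, new: old + new,
    "MAX": max,
    "MIN": min,
    "REPLACE": lambda old, new: new,
}


def load_aggregate(table, rows, key_cols, value_aggs):
    """Aggregate-model semantics: value columns are merged with their declared function."""
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        if key not in table:
            table[key] = dict(row)
            continue
        merged = table[key]
        for col, agg in value_aggs.items():
            merged[col] = AGG_FUNCS[agg](merged[col], row[col])
    return table


# Order status: only the newest state per order should survive.
orders = load_unique({}, [
    {"order_id": 1, "status": "created"},
    {"order_id": 1, "status": "paid"},
], ["order_id"])

# Shop statistics: counts add up and the maximum amount is retained.
shop_stats = load_aggregate({}, [
    {"shop_id": "s1", "orders": 1, "max_amount": 10},
    {"shop_id": "s1", "orders": 1, "max_amount": 25},
], ["shop_id"], {"orders": "SUM", "max_amount": "MAX"})
```

With these semantics handled by the storage engine, the streaming job can simply append change records and leave deduplication or pre‑aggregation to the table model.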

07 Real‑time Application Case

A typical use case involves merchants offering discounts based on a user’s historical order count. The solution stores historical aggregates in a partitioned Doris table (offline partition) and real‑time metrics in a today‑partition, enabling a simple combined query that satisfies both historical and current‑day requirements.
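The combined query in this case can be sketched as a sum over the offline partitions plus the today‑partition. The data layout, the user IDs, the dates, and the `min_orders` threshold below are made up for illustration; in the described solution the same logic would be a single query against the partitioned Doris table.

```python
def total_order_count(user_id, offline_partitions, today_partition):
    """Historical order count from offline partitions plus today's real-time count."""
    historical = sum(counts.get(user_id, 0) for counts in offline_partitions.values())
    return historical + today_partition.get(user_id, 0)


def eligible_for_discount(user_id, offline_partitions, today_partition, min_orders):
    """Hypothetical merchant rule: discount applies once a user reaches min_orders."""
    return total_order_count(user_id, offline_partitions, today_partition) >= min_orders


# Offline partitions are written by the batch pipeline, keyed by date.
offline = {
    "2024-01-01": {"u1": 3, "u2": 1},
    "2024-01-02": {"u1": 2},
}
# The today-partition is kept current by the real-time pipeline.
today = {"u1": 1, "u3": 1}

u1_total = total_order_count("u1", offline, today)
```

This is the Lambda‑style split in miniature: the batch path owns the closed partitions, the streaming path owns the open one, and the read path merges the two.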

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
