
Real‑Time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

The talk by Tang Chuxi of Meituan explains typical real‑time data scenarios, the challenges faced when building a streaming data warehouse, and the design, development, operation, and performance‑optimisation solutions implemented on a Flink‑based platform to support massive, low‑latency business applications.


Speaker Tang Chuxi from Meituan Data Platform Center introduces Meituan's typical real‑time data applications, including KPI monitoring, real‑time feature feeds, event‑driven processing, and financial data reconciliation.

The platform dates back to 2014, initially built on Storm and Spark Streaming; it migrated to Flink in 2017 and fully adopted Flink SQL as the primary programming interface in 2019, evolving toward a data-centric development model.

Key challenges identified include high development and operation costs, difficulty reproducing local debugging cases in production, inconsistent data protocols across services, and lack of unified data‑warehouse construction standards leading to redundancy.

To address these, Meituan built a one‑stop real‑time data‑warehouse solution that provides standard ETL job templates, a web‑based IDE, extended SQL capabilities, and data‑quality tools, reducing the barrier for developers and ensuring reliable data delivery.

The architecture consists of a foundational services layer (storage, compute, scheduling, logging), middleware services (job templates, UDF hosting, metadata, monitoring, data‑quality management), and composable micro‑services that can be mixed and matched per business need.

Flink was chosen as the core compute engine because it provides sub‑second latency, exactly‑once semantics, and superior throughput compared with Storm, as demonstrated by internal benchmarks.

Development is unified across batch and streaming by exposing all data sources as tables (Hive tables, Kafka topics, Redis) and using SQL as the primary language, with extensions for windowing, interval joins, and other streaming-specific constructs.
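To make the windowing extension concrete, here is a minimal sketch of tumbling-window semantics in plain Python: fixed, non-overlapping windows over event time with a per-key count, the same grouping a `TUMBLE` window expresses in streaming SQL. The function and data are illustrative, not Meituan's actual implementation.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Assign (event_time, key) records to fixed, non-overlapping
    windows and count occurrences per key within each window."""
    windows = defaultdict(lambda: defaultdict(int))
    for event_time, key in events:
        # A tumbling window of size N starts at the nearest multiple of N.
        window_start = (event_time // window_size) * window_size
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}

events = [(1, "a"), (3, "b"), (4, "a"), (12, "a"), (14, "b")]
print(tumbling_window_counts(events, 10))
# {0: {'a': 2, 'b': 1}, 10: {'a': 1, 'b': 1}}
```

Treating every source as a table means the same query shape works whether `events` arrives from a Hive partition or a Kafka topic.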

A dedicated UDF hosting service centralises user‑defined functions, enabling code reuse, security checks, and easier maintenance.
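The idea behind centralised hosting can be sketched as a small registry: functions are registered once, validated at registration time, and looked up by any job. The class and method names are hypothetical; the real service additionally handles packaging, versioning, and security scanning.

```python
class UdfRegistry:
    """Minimal sketch of a centralised UDF registry (hypothetical API)."""

    def __init__(self):
        self._functions = {}

    def register(self, name, fn):
        # Basic validation stands in for the hosting service's checks.
        if not callable(fn):
            raise TypeError(f"UDF {name!r} must be callable")
        if name in self._functions:
            raise ValueError(f"UDF {name!r} already registered")
        self._functions[name] = fn

    def lookup(self, name):
        return self._functions[name]

registry = UdfRegistry()
registry.register("mask_phone", lambda s: s[:3] + "****" + s[-4:])
print(registry.lookup("mask_phone")("13812345678"))  # 138****5678
```

Because every job resolves UDFs through the same registry, a fix to one function propagates everywhere without redeploying each job's bundled copy.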

Release pipelines enforce automated testing (TestCase) before job deployment, and a MiniCluster‑based asynchronous scheduler runs validation jobs, records results in a database, and reports metrics.
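A sketch of that pre-release gate, under assumed names: each test case feeds a fixed input through the job's transform and checks the output, and the scheduler runs cases asynchronously and collects pass/fail records, as the MiniCluster-based validator would persist to its results database.

```python
from concurrent.futures import ThreadPoolExecutor

def run_validation(test_cases, job):
    """Run (name, input, expected) cases against a job transform in
    parallel and return (name, passed) records. Hypothetical sketch of
    the release-gate scheduler, not the actual service."""
    def run_one(case):
        name, given, expected = case
        try:
            return (name, job(given) == expected)
        except Exception:
            # A crashing job counts as a failed validation run.
            return (name, False)

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_one, test_cases))

job = lambda rows: sorted(set(rows))  # toy stand-in for an ETL transform
cases = [
    ("dedup", [3, 1, 3, 2], [1, 2, 3]),
    ("empty", [], []),
]
print(run_validation(cases, job))  # [('dedup', True), ('empty', True)]
```

Deployment proceeds only when every record reports `True`, so a regression is caught before the job touches production topics.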

Performance optimisations at the operator level include multi‑level caching for high‑throughput joins, merging rapid state updates to reduce downstream messages, and three‑phase optimisation of join operators (pre‑process, compute, and emission stages).
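The update-merging idea can be shown in a few lines: within one flush interval, rapid updates to the same key are collapsed so that only the latest value travels downstream. This is a conceptual sketch of the technique, not Meituan's operator code.

```python
def merge_updates(updates):
    """Collapse a burst of (key, value) updates so each key emits only
    its latest value, cutting the number of downstream messages."""
    latest = {}
    for key, value in updates:
        latest[key] = value  # later updates overwrite earlier ones
    return list(latest.items())

burst = [("order_1", "created"), ("order_2", "created"),
         ("order_1", "paid"), ("order_1", "shipped")]
print(merge_updates(burst))
# [('order_1', 'shipped'), ('order_2', 'created')]
```

Four raw updates become two emitted messages here; on hot keys with thousands of updates per interval, the reduction is far larger, which is what relieves pressure on the join and sink operators downstream.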

Future plans focus on improving runtime scalability for ultra‑large jobs, enhancing availability for consumer‑facing services, and pursuing true stream‑batch convergence with incremental data‑warehouse production to achieve maximal throughput with minimal resources.

Overall, the platform demonstrates how a large‑scale e‑commerce company can build a robust, low‑latency real‑time data‑warehouse ecosystem that balances developer productivity, data quality, and operational efficiency.

Tags: big data, Flink, stream processing, data warehouse, real-time data, Meituan
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
