
Real‑time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

This article presents Meituan's real‑time data warehouse platform, describing typical streaming use cases, the evolution of its architecture from Storm and Spark Streaming to Flink, the challenges of development, operations, and data quality, and the engineering solutions—including unified SQL, a web IDE, UDF hosting, pipeline testing, and operator‑level performance optimizations—implemented to support large‑scale, low‑latency analytics.


Tom Chu‑Xi, a member of Meituan's Data Platform Center, introduces the real‑time data warehouse platform, focusing on its typical application scenarios, architectural design, construction challenges, and future development directions.

The platform supports a variety of real‑time use cases such as metric monitoring dashboards, low‑latency feature extraction for search, advertising CTR prediction and rider dispatch, event‑driven workflows like risk control and coupon distribution, and financial data reconciliation.

Meituan's real‑time data platform was launched in 2014 on Storm and Spark Streaming, adopted Flink in 2017, and since 2019 has exposed Flink SQL as the primary programming interface, shifting from a job‑centric to a data‑centric development model while pursuing incremental production, unified batch‑stream semantics, and a common data‑modeling approach.

Key pain points identified include high development and O&M costs due to frequent framework upgrades, difficulty reproducing online issues in local development, inconsistent data contracts across services, and the absence of unified data‑warehouse construction standards, which leads to redundancy and resource waste.

The platform’s architecture is layered: a foundational services layer (storage, compute, scheduling, logging), middleware services, and a top‑level micro‑service collection (job‑template service, UDF hosting, metadata service, metric collection, data‑quality management). Flink was chosen over Storm because it provides sub‑second latency and exactly‑once semantics, and benchmark tests showed superior throughput.

To improve developer productivity, the platform offers standardized ETL templates, a web‑based IDE, and an extended SQL interface that abstracts underlying storage (Hive tables, Kafka topics, Redis) into unified tables. An adapter module handles custom message formats, while a UDF hosting service centralizes UDF compilation, security checks, and sharing. A release pipeline with automated test cases ensures data quality before deployment, and latency markers are used to monitor end‑to‑end delay.
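The latency‑marker idea can be illustrated with a minimal Python sketch: a timestamped probe is injected at the source, and any downstream operator compares the marker's emission time with its own clock to estimate end‑to‑end delay. The class and function names here are illustrative, not part of Meituan's actual platform or the Flink API.

```python
import time


class LatencyMarker:
    """A timestamped probe injected at the pipeline source (illustrative)."""

    def __init__(self, emitted_at=None):
        # Record wall-clock time at emission; allow injection for testing.
        self.emitted_at = time.time() if emitted_at is None else emitted_at


def measure_delay(marker, now=None):
    """At a downstream operator, estimate end-to-end delay in seconds."""
    if now is None:
        now = time.time()
    return now - marker.emitted_at


# Usage: a marker flows through the pipeline alongside data records;
# each operator that sees it reports measure_delay(marker) as a metric.
marker = LatencyMarker()
delay = measure_delay(marker)
```

In a real deployment the marker would be serialized into the stream and the measured delays aggregated per operator, which is how a platform can localize where latency accumulates.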

Performance optimizations at the operator level include a multi‑level cache to reduce external KV store I/O, merging frequent state updates to cut downstream message volume, and a three‑stage join optimization (pre‑process duplicate events, batch state accesses during computation, and deduplicate emitted records).
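The multi‑level cache optimization can be sketched as a small local LRU cache sitting in front of a remote KV store: repeated reads of hot keys are served locally, and only misses reach the external store. This is a generic sketch of the technique, not Meituan's implementation; the `remote` object here is a plain dict standing in for a store such as Redis.

```python
from collections import OrderedDict


class TwoLevelKV:
    """Local LRU cache in front of a remote KV store (illustrative sketch)."""

    def __init__(self, remote, capacity=1024):
        self.remote = remote            # stand-in for an external KV store
        self.cache = OrderedDict()      # insertion-ordered dict as LRU
        self.capacity = capacity
        self.remote_reads = 0           # counts I/O saved by the cache

    def get(self, key):
        if key in self.cache:
            # Hit: refresh recency and serve locally, no external I/O.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Miss: read from the remote store and populate the local cache.
        self.remote_reads += 1
        value = self.remote.get(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry.
            self.cache.popitem(last=False)
        return value
```

The same locality argument motivates the other two optimizations: merging bursts of updates to one key before emitting them downstream, and deduplicating events before and after a join so state accesses can be batched.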

Resulting capabilities include a Web IDE with debugging console, logical model management with lineage and metadata views, and an operations center that monitors job health, metrics, and logs. Future plans focus on incremental production for true stream‑batch integration, improving runtime scalability for massive jobs, and enhancing high‑availability for consumer‑facing services while striving for maximal performance with minimal resources.

The presentation concludes with thanks and a call for audience engagement.

Tags: big data, Flink, stream processing, data warehouse, platform architecture, real‑time data
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
