How Meituan Built a Scalable Real‑Time Data Warehouse with Flink
This article explains how Meituan tackled growing real‑time data demands by redesigning its streaming platform, adopting a layered real‑time data warehouse architecture, selecting storage and compute technologies such as Cellar, Elasticsearch, Druid and Flink, and sharing practical tips on dimension expansion, joins, and aggregation to achieve higher throughput and lower latency.
Introduction
Real‑time data services have become a core requirement for large‑scale enterprises. This article describes the design and implementation of a real‑time data warehouse built on Apache Flink, focusing on architecture, storage choices, compute engine selection, and practical engineering lessons.
Early Real‑Time Platform Architecture
Initially a “one‑pipeline‑to‑the‑end” model was used: Storm jobs consumed real‑time queues, extracted metrics and pushed them directly to applications. As business grew, three problems emerged:
Rapid increase of metrics caused severe code coupling (“chimney” development).
Diverse requirements (detail data, OLAP) could not be satisfied by a single development model.
Absence of a comprehensive monitoring system delayed issue detection.
Layered Real‑Time Data Warehouse Design
A four‑layer architecture was adopted to address the above challenges:
ODS layer : Ingests binlog, traffic logs and business real‑time queues.
Detail layer : Integrates domain facts, builds real‑time dimension tables from offline full loads and incremental streams.
Summary layer : Uses wide‑table models to enrich detail data with dimensions and to aggregate common metrics.
App layer : Provides RPC‑based services tailored to specific downstream needs.
This separation enables each layer to handle filtering, cleaning, standardisation and aggregation independently, improving code reuse and production efficiency.
Storage Engine Selection
Unlike offline warehouses that store all layers in Hive or relational databases, the real‑time warehouse mixes message queues, high‑speed KV stores and specialised systems:
Intermediate tables are stored using a hybrid of message queues and KV stores for fast consumption.
High‑throughput dimension data (>100 k TPS) is stored in Cellar (Meituan’s internal KV store based on Tair).
Summary data also uses Cellar for low‑latency lookups.
Application‑layer storage decisions:
High‑frequency simple queries (≈1 k QPS) → Cellar.
Complex queries or detailed listings → Elasticsearch.
Low‑frequency OLAP → Druid (real‑time indexing).
Versioned historical data → MySQL for easy iteration.
Compute Engine Selection
Storm was initially used, but its low‑level API required extensive boiler‑plate for joins, aggregations and state handling, leading to high development cost and sub‑optimal performance. After evaluating Storm, Spark Streaming and Flink, Flink was chosen because:
Rich high‑level API (Table API, native SQL) simplifies structured data processing.
Fault‑tolerance and state management meet production requirements.
Latency comparable to Storm while throughput is roughly ten times higher in internal tests.
TableSchema integration eases metadata management.
Practical Experience with Flink
1. Dimension Expansion
Dimension data is fetched via an asynchronous service built on Cellar, achieving sub‑1 ms latency. Caching and key‑based partitioning reduce external calls for high‑volume streams (e.g., traffic logs at 100 k events/s).
2. Data Joins
Flink Table joins operate on windowed streams; at least one equality condition is required for grouping. Window size directly impacts memory usage and checkpoint size, so windows are kept small, RocksDB is used as the state backend, and incremental checkpoints are enabled for long‑duration joins.
3. Aggregations
Standard aggregations (SUM, MIN, MAX, AVG) are natively supported. For distinct counts, custom UDAFs were implemented using MapView, BloomFilter and HyperLogLog to balance accuracy and memory consumption. When large keys cause serialization overhead in RocksDBStateBackend, switching to FsStateBackend mitigates the bottleneck. Ranking‑type analyses that require full‑window data are replaced with Top‑N logic to limit memory usage.
Results
Replacing the legacy pipeline with the layered real‑time warehouse unified data sources, ensured metric and dimension consistency, and dramatically reduced code size (average Java job ~300 lines → dozens of SQL statements). Development time shrank, allowing a single developer to deliver multiple real‑time metrics per day.
Resource tuning per layer further improved performance: the ODS layer runs with 1 GB memory per CPU, while the summary layer receives additional memory for heavy aggregations. Overall latency remained stable despite a longer processing chain, and overall resource consumption decreased.
Future Outlook
The goal is to achieve offline‑warehouse level accuracy and consistency, while enhancing data reliability, monitoring, lineage detection and cross‑checks. Ongoing work focuses on reducing the learning curve for real‑time development and enabling more teams to build their own streaming solutions.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
