Meituan's OLAP Requirements and Apache Kylin Deployment: Architecture, Challenges, and Comparative Analysis
This article describes Meituan's large-scale OLAP workloads and the challenges they pose: data scale, complex schemas, and precise distinct counting. It explains how Apache Kylin was integrated using wide tables and bitmap deduplication, compares Kylin's performance and features with Presto, Druid, and other engines, and outlines future improvements.
1. Meituan's Data Scenario Characteristics
Meituan runs thousands of interactive OLAP queries on Hadoop datasets ranging from hundreds of millions to billions of rows. Analysts and city business development teams need high scalability, stability, precise results, and low latency.
The main challenges are extremely large fact tables (10⁸-10⁹ rows) joined to high-cardinality dimension tables, non-star (snowflake or "constellation") schemas, frequently changing dimensions, and the need for data back-tracking (re-processing historical data) and precise distinct counts.
A typical model has 5-20 dimensions, often hierarchical (e.g., organization levels) and usually including a date dimension. There are usually fewer than 50 metrics, many of which are expression-based and cannot be aggregated directly by Kylin without preprocessing.
Queries must be both highly stable and sub-second, at a volume of tens of thousands of queries per day.
2. Solution with Apache Kylin
The primary solution is to flatten non‑standard schemas into wide tables, handling dimension changes and high‑cardinality dimensions before loading into Kylin.
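The flattening step can be illustrated with a small sketch. In practice this would more likely be a Hive join feeding the cube build; the Python below only shows the idea, and the table names, column names, and the `flatten` helper are hypothetical, not taken from Meituan's schema.

```python
# Hypothetical snowflake schema: fact -> city -> region.
cities = {
    1: {"city_name": "Beijing", "region_id": 10},
    2: {"city_name": "Shanghai", "region_id": 20},
}
regions = {10: {"region_name": "North"}, 20: {"region_name": "East"}}
orders = [
    {"order_id": 100, "city_id": 1, "amount": 30},
    {"order_id": 101, "city_id": 2, "amount": 45},
]


def flatten(fact_rows, cities, regions):
    """Denormalize each fact row by following the city -> region lookups."""
    wide = []
    for row in fact_rows:
        city = cities[row["city_id"]]
        region = regions[city["region_id"]]
        wide.append({
            **row,
            "city_name": city["city_name"],
            "region_id": city["region_id"],
            "region_name": region["region_name"],
        })
    return wide


# Every row now carries all dimension attributes, so the cube sees one table.
wide_table = flatten(orders, cities, regions)
```

Because the dimension values are copied into each fact row at flattening time, a later change to a dimension only affects the wide table when it is rebuilt, which is one reason dimension changes have to be handled before loading.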
Expression metrics are pre‑computed into separate columns, often using Hive views or materialized tables.
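As a rough illustration of why pre-computation helps, suppose a metric is defined as SUM(price * quantity) and the cube can only apply SUM to a plain column. The sketch below materializes the expression as its own column first; the column names and helper functions are assumptions made for this example.

```python
rows = [
    {"city": "Beijing", "price": 10, "quantity": 3},
    {"city": "Beijing", "price": 5, "quantity": 2},
    {"city": "Shanghai", "price": 8, "quantity": 1},
]


def add_revenue_column(rows):
    """Materialize the expression price * quantity as a plain column."""
    return [{**r, "revenue": r["price"] * r["quantity"]} for r in rows]


def sum_by(rows, dim, measure):
    """Plain SUM of one column grouped by one dimension."""
    totals = {}
    for r in rows:
        totals[r[dim]] = totals.get(r[dim], 0) + r[measure]
    return totals


prepared = add_revenue_column(rows)
revenue_by_city = sum_by(prepared, "city", "revenue")
```

After the extra column exists, the aggregation is an ordinary SUM over a single column, which is the kind of measure a pre-aggregating engine can store per cube cell.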
Precise distinct counts are computed with bitmaps. The bitmap measure originally supported only int columns; support for all data types was added in Kylin 1.5.3.
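A minimal sketch of why bitmaps give exact, mergeable distinct counts follows. It uses a Python integer as an uncompressed bitset, whereas a production engine would typically use a compressed bitmap structure. The dictionary encoding for non-integer values is an assumption about how other types can be mapped onto integers, not a description of Kylin's internals.

```python
def to_bitmap(ids):
    """Set bit i for every non-negative integer id i."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def cardinality(bitmap):
    """Number of set bits, i.e. the exact distinct count."""
    return bin(bitmap).count("1")


def encode(values, dictionary):
    """Map arbitrary hashable values to dense integer ids (hypothetical)."""
    return [dictionary.setdefault(v, len(dictionary)) for v in values]


# One bitmap per pre-aggregated cell, e.g. users seen on each day.
day1 = to_bitmap([1, 2, 3])
day2 = to_bitmap([3, 4])

# Rolling up two cells is a bitwise OR, so user 3 is counted only once.
merged = day1 | day2

# Non-integer ids are first mapped to integers through a shared dictionary.
dictionary = {}
string_ids = encode(["u-a", "u-b", "u-a"], dictionary)
string_bitmap = to_bitmap(string_ids)
```

Adding the per-day counts would give 5, while the merged bitmap yields the correct 4. This is the property that lets a pre-aggregated cube answer distinct-count queries at any roll-up level without rescanning raw rows.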
Kylin servers are deployed in a split architecture: lightweight Kylin Server instances act as clients, while the heavy lifting (cube building, HBase reads) runs on dedicated machines. Production (kylin01) and pre‑release (kylin02) servers separate stable serving from development and review.
This separation improves stability, allows thorough cube reviews before promotion, and simplifies permission control.
3. Comparative Analysis of Mainstream OLAP Systems
Benchmarks using the SSB dataset (scale from 10M to 1B rows) compare Presto, Kylin 1.3, Kylin 1.5, and Druid across five typical query patterns.
At the 10M scale, Druid and Kylin 1.5 outperform Presto and Kylin 1.3; at the 100M‑1B scale, Kylin 1.5 shows linear scaling thanks to parallel HBase scans, while Kylin 1.3 degrades due to serial scans.
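The difference between serial and parallel scans can be sketched in a few lines. This is only an illustration of the pattern: `scan_shard` is a stand-in for reading one storage region, the shard contents are made up, and nothing here reflects Kylin's actual HBase access code.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_shard(shard):
    """Stand-in for scanning one storage region; returns a partial sum."""
    return sum(shard)


def serial_scan(shards):
    """Scan regions one after another; latency grows with the region count."""
    return sum(scan_shard(s) for s in shards)


def parallel_scan(shards, workers=4):
    """Scan regions concurrently and combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan_shard, shards))


shards = [[1, 2, 3], [4, 5], [6]]
serial_total = serial_scan(shards)
parallel_total = parallel_scan(shards)
```

Both functions return the same total. In a real system the benefit comes from overlapping the I/O wait of many region scans, so wall-clock time depends on the slowest region rather than on the sum of all of them.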
Beyond performance, the evaluation considers functionality completeness, ease of use, data preparation cost, and query flexibility. Presto excels in flexibility and low data‑prep cost but lags in performance; Druid offers fast aggregation but lacks precise distinct counting; Kylin balances stable performance, precise results, and reasonable operational overhead.
4. Kylin's Advantages
• Very stable performance (99.99% availability, 95% of queries under 1 s).
• The only engine evaluated that provides exact distinct counts for Meituan's use case.
• Low deployment and operational cost thanks to existing Hadoop ecosystem (Hive, HBase) and a web UI for cube management.
• Active open‑source community led largely by Chinese contributors, with strong support from the Kyligence commercial team.
5. Future Work
• Extend bitmap distinct counting to all data types (already available in Kylin 1.5.3).
• Improve resource isolation for multi‑tenant clusters and optimize large result‑set pagination.
• Continue enhancing build efficiency, queue management, and streaming capabilities introduced after Kylin 1.5.
6. Q&A Highlights
• Building a cube from ~200 M rows with ~15 dimensions completes in under two hours and produces a few hundred gigabytes of cube data.
• Kerberos tickets must be refreshed periodically, and the refresh has to happen without service downtime.
• The SQL interface matters to business analysts, who prefer standard SQL over custom APIs.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
