Exploring OLAP Engine with Apache Kylin: Architecture, Theory, and Applications in Qunar's Big Data Platform

This article presents Qunar's experience transitioning from MySQL‑based OLAP to Apache Kylin, detailing the performance challenges, required features, Kylin's architecture and theory, cube construction process, storage mechanisms, real‑world applications, and the pitfalls and optimization practices discovered along the way.

Qunar Tech Salon

Aug 16, 2016

Qunar's data team hit severe performance bottlenecks with a Saiku + Mondrian engine backed by MySQL: once tables reached tens of millions of rows, query response times grew to minutes. This prompted the search for a more scalable OLAP solution.

The new requirements emphasized visual drag‑and‑drop dimension composition, custom dimension queries, multi‑dimensional aggregation, sub‑second response, and support for petabyte‑scale data volumes.

Apache Kylin, an open‑source distributed analytics engine built on Hadoop, was introduced to meet these needs. Kylin provides ANSI‑SQL query interfaces, multi‑dimensional analysis, and sub‑second interactive queries over TB‑to‑PB datasets by pre‑computing cubes and storing results in HBase.

Kylin's architecture leverages Hive for source data extraction, MapReduce for cube building, and HBase for cube storage, exposing REST, JDBC, and ODBC APIs for downstream tools like Tableau and Excel.
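As a rough illustration of the REST interface, the sketch below builds a request for Kylin's SQL query endpoint (`POST /kylin/api/query`) using only the Python standard library. The host name, project, table, and column names are hypothetical placeholders, and `ADMIN`/`KYLIN` are assumed to be the default credentials of a fresh install. The request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, sql, project, limit=100):
    """Build (but do not send) a POST request for Kylin's SQL query endpoint."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, credentials, project, and table; substitute real values.
request = build_query_request(
    "http://kylin.example.com:7070",
    "ADMIN",
    "KYLIN",
    "SELECT dep_city, SUM(amount) FROM flight_orders GROUP BY dep_city",
    project="flight_demo",
)

# To run the query against a live server, uncomment:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

BI tools such as Tableau or Excel would normally go through the JDBC or ODBC drivers instead; the REST route is convenient for scripts and custom dashboards.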

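The core theoretical principle of Kylin is "space for time": it pre‑computes the aggregations for each combination of dimensions (a cuboid) in a cube and stores them, so that expensive joins and aggregations at query time become fast look‑ups. Cubes are built with the Layer Cubing (layer‑by‑layer) algorithm: starting from the base cuboid that contains all N dimensions, each subsequent layer has one fewer dimension and is computed from the layer above it, with each layer running as a separate MapReduce job.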
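The layer-by-layer idea can be sketched in a few lines of single-process Python. This is a toy model, not Kylin's implementation: the dimension names, the sample rows, and the single `SUM(amount)` measure are all made up for illustration, and each pass of the outer loop merely stands in for what would be one MapReduce job.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("dep_city", "arr_city", "airline")  # hypothetical dimensions

# Hypothetical fact rows: one dict per ticket order.
ROWS = [
    {"dep_city": "PEK", "arr_city": "NRT", "airline": "CA", "amount": 300},
    {"dep_city": "PEK", "arr_city": "NRT", "airline": "NH", "amount": 350},
    {"dep_city": "PEK", "arr_city": "LHR", "airline": "CA", "amount": 900},
    {"dep_city": "SHA", "arr_city": "NRT", "airline": "NH", "amount": 280},
]


def build_base_cuboid(rows, dims):
    """Aggregate raw rows by every dimension (the N-dimensional cuboid)."""
    cuboid = defaultdict(int)
    for row in rows:
        cuboid[tuple(row[d] for d in dims)] += row["amount"]
    return dict(cuboid)


def roll_up(parent_dims, parent, child_dims):
    """Derive a child cuboid by summing a parent over the dropped dimension."""
    keep = [parent_dims.index(d) for d in child_dims]
    child = defaultdict(int)
    for key, total in parent.items():
        child[tuple(key[i] for i in keep)] += total
    return dict(child)


def layer_cubing(rows, dims):
    """Return {dims_tuple: cuboid}, built one layer at a time from N dims to 0."""
    cube = {dims: build_base_cuboid(rows, dims)}
    for size in range(len(dims) - 1, -1, -1):  # one pass per layer
        for child_dims in combinations(dims, size):
            # Any cuboid in the layer above that contains child_dims can be the parent.
            parent_dims = next(
                p for p in cube if len(p) == size + 1 and set(child_dims) <= set(p)
            )
            cube[child_dims] = roll_up(parent_dims, cube[parent_dims], child_dims)
    return cube


cube = layer_cubing(ROWS, DIMENSIONS)
print(len(cube))            # 8 cuboids for 3 dimensions (2**3)
print(cube[("dep_city",)])  # {('PEK',): 1550, ('SHA',): 280}
print(cube[()])             # {(): 1830}
```

Only the base cuboid reads the raw rows; every other cuboid is derived from an already aggregated parent, which is why the work shrinks as the layers get smaller.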

During query execution, Kylin rewrites SQL statements to target the appropriate cuboid, retrieving pre‑computed measures from HBase, which enables high concurrency and millisecond‑level latency.
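To make the routing step concrete, here is a minimal sketch of answering an aggregate query from pre-computed cuboids. Plain dictionaries stand in for the HBase tables, the cuboid contents and column names are invented, and the exact-match selection rule is a simplification chosen for this example rather than a description of Kylin's actual planner.

```python
# Hypothetical pre-computed cuboids, keyed by their (sorted) dimension names.
# In Kylin these would live in HBase; plain dicts stand in for them here.
CUBOIDS = {
    ("airline", "dep_city"): {
        ("CA", "PEK"): 1200,
        ("NH", "PEK"): 350,
        ("NH", "SHA"): 280,
    },
    ("dep_city",): {("PEK",): 1550, ("SHA",): 280},
    (): {(): 1830},
}


def pick_cuboid(group_by, filters):
    """Choose the cuboid covering exactly the columns the query touches."""
    needed = tuple(sorted(set(group_by) | set(filters)))
    if needed not in CUBOIDS:
        raise KeyError(f"no pre-computed cuboid for dimensions {needed}")
    return needed


def answer(group_by, filters=None):
    """Answer SUM(amount) ... WHERE filters GROUP BY group_by from a cuboid."""
    filters = filters or {}
    dims = pick_cuboid(group_by, filters)
    result = {}
    for key, total in CUBOIDS[dims].items():
        row = dict(zip(dims, key))
        if all(row[col] == value for col, value in filters.items()):
            out_key = tuple(row[col] for col in group_by)
            result[out_key] = result.get(out_key, 0) + total
    return result


# SELECT dep_city, SUM(amount) ... WHERE airline = 'NH' GROUP BY dep_city
print(answer(["dep_city"], {"airline": "NH"}))  # {('PEK',): 350, ('SHA',): 280}
# SELECT SUM(amount) ...
print(answer([]))                               # {(): 1830}
```

The query never touches raw fact rows: it scans a handful of pre-aggregated entries, which is what keeps latency low and lets many queries run concurrently.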

In Qunar's airline ticket data scenario, Kylin powers a solution with six international ticket data cubes, 34 custom dimension tables, and a development cycle of 1–2 days per cube, delivering query latencies under 200 ms for tens of millions of rows.

The deployment also uncovered several pitfalls, such as dimension cardinality explosion, storage overhead, and cluster resource contention. These led to optimization practices including cube design guidelines, parameter tuning, and automated incremental data loading.
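A back-of-the-envelope calculation shows why dimension count and cardinality matter so much for storage. The figures below are illustrative assumptions, not Qunar's numbers: a full cube over N dimensions has 2^N cuboids, each dimension marked as mandatory (present in every cuboid) halves that count, and a cuboid can hold at most as many rows as there are distinct dimension-value combinations or source rows, whichever is smaller.

```python
from math import prod


def cuboid_count(num_dims, num_mandatory=0):
    """Cuboids in a full cube; mandatory dimensions appear in every cuboid."""
    return 2 ** (num_dims - num_mandatory)


def max_rows(cardinalities, source_rows):
    """Upper bound on rows in one cuboid: distinct combinations, capped by input size."""
    return min(prod(cardinalities), source_rows)


for n in (5, 10, 20):
    print(n, cuboid_count(n))  # 5 32, then 10 1024, then 20 1048576

# Marking 2 of 20 dimensions as mandatory cuts the cuboid count by a factor of 4.
print(cuboid_count(20, num_mandatory=2))          # 262144

# Hypothetical cardinalities: 200 cities x 200 cities x 50 airlines (x 365 days).
print(max_rows([200, 200, 50], 30_000_000))       # 2000000
print(max_rows([200, 200, 50, 365], 30_000_000))  # 30000000
```

Adding one high-cardinality dimension in the last line pushes the cuboid up to the size of the source data itself, at which point pre-aggregation saves nothing for that cuboid while still costing storage and build time.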
