
Exploring OLAP Engine with Apache Kylin: Architecture, Theory, and Practical Applications in Flight Ticket Big Data

This article presents a comprehensive overview of the Qdata session on OLAP engine exploration. It covers the limitations of the earlier MySQL-based solution, the requirements for large-scale analytics, the architecture and theoretical foundations of Apache Kylin, its cube construction process, storage in HBase, and query rewriting, followed by real-world applications to flight-ticket data and the challenges encountered along with the optimizations applied.

Ctrip Technology

Qdata is an internal learning platform for Qunar engineers, aiming to promote technical exchange and skill improvement across development teams.

The 2016 technical carnival featured three tracks: Qmobile (mobile development), Qdata (data), and Qarch (architecture). This article focuses on the Qdata track and summarizes its keynote.

Previously, the team used Saiku and Mondrian on top of MySQL for OLAP. As flight-ticket data grew to tens of millions of rows, queries began to take minutes, which made the setup unsuitable for fast analytics.

Key requirements emerged: visual drag‑and‑drop dimension composition, custom query dimensions, multi‑dimensional aggregation, sub‑second response time, and support for petabyte‑scale data.

Apache Kylin Architecture and Core

Kylin is a next‑generation open‑source distributed analytics engine built on Hadoop, providing ANSI‑SQL and OLAP capabilities for massive datasets (TB to PB). It reads source data from Hive, uses MapReduce to build pre‑computed cubes, stores results in HBase, and exposes REST, JDBC, and ODBC interfaces.
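To make the REST interface more concrete, here is a minimal sketch that builds (but does not send) a SQL query request for Kylin's `POST /kylin/api/query` endpoint using only the Python standard library. The host `localhost:7070`, the project `learn_kylin`, the table `kylin_sales`, and the `ADMIN`/`KYLIN` credentials are assumptions for illustration; check them against your own deployment.

```python
import base64
import json
import urllib.request


def build_kylin_query(sql, project, host="http://localhost:7070",
                      user="ADMIN", password="KYLIN", limit=50000):
    """Build a POST request for Kylin's SQL query REST endpoint.

    Host, credentials, and limit defaults are illustrative assumptions.
    """
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{host}/kylin/api/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical project and table names, used only for illustration.
req = build_kylin_query(
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    project="learn_kylin",
)
# Sending it requires a running Kylin instance:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending keeps the sketch testable without a cluster; the commented `urlopen` call is what would actually submit the SQL.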

Kylin became an Apache top‑level project in November 2015.

Because Kylin supports standard SQL and provides JDBC and ODBC drivers, BI tools such as Tableau and Excel can query it directly.

Kylin Theory

The core principle is "space for time": Kylin pre-computes aggregates for combinations of dimensions (cuboids), stores them together as a cube, and answers queries from these stored results instead of scanning the raw data.

A cube consists of cuboids, one per dimension combination, so a cube with n dimensions has up to 2^n cuboids. At query time, Kylin selects the cuboid that matches the dimensions referenced in the SQL GROUP BY clause (and in filter conditions).
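A minimal sketch of the cuboid idea, assuming a toy cube with three hypothetical dimensions: each cuboid is written as a bitmask with one bit per dimension, so n dimensions yield 2^n cuboids, and a GROUP BY maps to the cuboid whose bits match the grouped columns. Kylin identifies cuboids with similar bitmask IDs, but the dimension names and bit order here are illustrative, and this sketch considers only GROUP BY columns, not filters.

```python
from itertools import combinations

# Hypothetical dimensions of a flight-ticket cube; the first name maps to the highest bit.
DIMENSIONS = ["dep_city", "arr_city", "flight_date"]


def cuboid_id(dims):
    """Encode a set of dimension names as a bitmask cuboid ID."""
    n = len(DIMENSIONS)
    cid = 0
    for i, name in enumerate(DIMENSIONS):
        if name in dims:
            cid |= 1 << (n - 1 - i)
    return cid


def all_cuboids():
    """Enumerate every dimension combination, from the base cuboid down to the apex."""
    result = []
    for k in range(len(DIMENSIONS), -1, -1):
        for combo in combinations(DIMENSIONS, k):
            result.append((cuboid_id(combo), combo))
    return result


def cuboid_for_group_by(group_by_columns):
    """Pick the cuboid whose dimensions exactly match the GROUP BY columns."""
    unknown = set(group_by_columns) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"not a cube dimension: {sorted(unknown)}")
    return cuboid_id(group_by_columns)


for cid, combo in all_cuboids():
    print(f"{cid:03b}", combo)
```

With three dimensions the loop prints eight cuboids, from `111` (the base cuboid) to `000` (the apex, a single grand total). Each added dimension doubles that count, which is why cube design matters as dimensions grow.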

Cube Construction (Layer Cubing)

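Kylin builds cubes with the Layer Cubing algorithm. It first computes the base cuboid, which contains all dimensions, from the source data; it then computes the cuboids with one fewer dimension from the layer above, and continues layer by layer down to the apex cuboid. Each layer runs as a separate MapReduce job.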
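The layer-by-layer order can be sketched in memory, assuming a tiny fact table with three hypothetical dimensions and a single SUM measure. Each pass of the outer loop stands in for one MapReduce job: every cuboid in the layer is rolled up from a parent cuboid one layer above (roughly, map re-keys rows onto fewer dimensions and reduce sums them). Kylin distributes this work across Hadoop; the plain dictionaries here only illustrate the ordering.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows: (dep_city, arr_city, flight_date, ticket_price).
DIMENSIONS = ("dep_city", "arr_city", "flight_date")
FACTS = [
    ("PEK", "SHA", "2016-11-01", 800),
    ("PEK", "SHA", "2016-11-02", 750),
    ("PEK", "CAN", "2016-11-01", 1200),
    ("SHA", "CAN", "2016-11-01", 900),
]


def aggregate(rows, keep):
    """Roll rows of a parent cuboid up to the key positions listed in `keep`."""
    out = defaultdict(int)
    for key, measure in rows.items():
        out[tuple(key[i] for i in keep)] += measure
    return dict(out)


def layer_cubing(facts):
    """Return {tuple of dimension indexes: {dimension values: SUM(price)}}."""
    n = len(DIMENSIONS)
    # Top layer: the base cuboid, built directly from the fact rows.
    base = defaultdict(int)
    for *dims, price in facts:
        base[tuple(dims)] += price
    cube = {tuple(range(n)): dict(base)}
    # Each pass of this loop corresponds to one layer (one MapReduce job in Kylin).
    for size in range(n - 1, -1, -1):
        for child in combinations(range(n), size):
            # Any parent with exactly one extra dimension works; take the first found.
            parent = next(p for p in cube if len(p) == size + 1 and set(child) <= set(p))
            keep = [parent.index(i) for i in child]
            cube[child] = aggregate(cube[parent], keep)
    return cube


cube = layer_cubing(FACTS)
print(cube[(0,)])  # total price per departure city
print(cube[()])    # apex cuboid: grand total
```

Because each layer reads only the (already smaller) layer above it rather than the raw facts, every job after the first processes pre-aggregated data.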

The results of the MapReduce jobs are stored in HBase. Each row key combines the cuboid ID with the dictionary-encoded dimension values, and the measures are stored in column families, also encoded to reduce storage cost.
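A simplified sketch of the row-key idea, assuming sorted dictionary encoding for each dimension and a fixed layout of an 8-byte cuboid ID followed by one 2-byte dictionary ID per dimension. Kylin's actual row-key format contains additional fields and picks encoding widths per column, so the layout and helper names here are hypothetical.

```python
import struct

# Hypothetical dimensions; each value is replaced by a small integer ID (dictionary encoding).
DIMENSIONS = ("dep_city", "arr_city")


def build_dictionary(values):
    """Map each distinct value to a dense integer ID, in sorted (order-preserving) order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode_rowkey(cuboid_id, dim_values, dictionaries):
    """Pack cuboid ID (8 bytes) + one 2-byte dictionary ID per dimension, big-endian."""
    key = struct.pack(">Q", cuboid_id)
    for name, value in zip(DIMENSIONS, dim_values):
        key += struct.pack(">H", dictionaries[name][value])
    return key


def decode_rowkey(key, dictionaries):
    """Reverse encode_rowkey for the layout above."""
    (cuboid_id,) = struct.unpack_from(">Q", key, 0)
    ids = struct.unpack_from(">" + "H" * len(DIMENSIONS), key, 8)
    reverse = {name: {i: v for v, i in d.items()} for name, d in dictionaries.items()}
    return cuboid_id, tuple(reverse[name][i] for name, i in zip(DIMENSIONS, ids))


dicts = {
    "dep_city": build_dictionary(["PEK", "SHA", "CAN"]),
    "arr_city": build_dictionary(["SHA", "CAN", "CTU"]),
}
key = encode_rowkey(0b11, ("PEK", "CTU"), dicts)
print(key.hex())
```

Big-endian packing makes byte order follow numeric order. Since HBase sorts row keys bytewise, all rows of one cuboid are contiguous and can be read with a single range scan, and short integer IDs take far less space than repeated city strings.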

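During query execution, Kylin rewrites the SQL query plan so that joins and aggregations such as SUM and COUNT DISTINCT are answered from pre-computed cuboids instead of being computed over raw data, which keeps responses fast even under high concurrency.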
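To illustrate the rewrite, here is a toy planner, assuming a cube stored as a dict from dimension-name tuples to pre-aggregated SUM values (all names hypothetical). For a query such as `SELECT dep_city, SUM(price) ... WHERE flight_date = '2016-11-01' GROUP BY dep_city`, the chosen cuboid must contain both the grouped and the filtered dimensions; the planner reads that cuboid, applies the filter, and rolls up the remaining column. Kylin's real planner is considerably more involved.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows: (dep_city, arr_city, flight_date, ticket_price).
DIMENSIONS = ("dep_city", "arr_city", "flight_date")
FACTS = [
    ("PEK", "SHA", "2016-11-01", 800),
    ("PEK", "SHA", "2016-11-02", 750),
    ("PEK", "CAN", "2016-11-01", 1200),
    ("SHA", "CAN", "2016-11-01", 900),
]


def precompute(facts):
    """Build every cuboid: {dimension-name tuple: {dimension-value tuple: SUM(price)}}."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, k):
            idx = [DIMENSIONS.index(d) for d in dims]
            agg = defaultdict(int)
            for row in facts:
                agg[tuple(row[i] for i in idx)] += row[-1]
            cube[dims] = dict(agg)
    return cube


def answer(cube, group_by, where):
    """Answer SELECT group_by, SUM(price) WHERE col = value AND ... using a cuboid."""
    needed = set(group_by) | set(where)
    # Pick the cuboid containing exactly the grouped and filtered dimensions.
    dims = tuple(d for d in DIMENSIONS if d in needed)
    result = defaultdict(int)
    for values, total in cube[dims].items():
        row = dict(zip(dims, values))
        if all(row[col] == val for col, val in where.items()):
            # Roll up any filter-only dimension that is not in GROUP BY.
            result[tuple(row[c] for c in group_by)] += total
    return dict(result)


CUBE = precompute(FACTS)
print(answer(CUBE, ["dep_city"], {"flight_date": "2016-11-01"}))
```

The query touches three pre-aggregated rows of one cuboid rather than every fact row, which is the effect the rewrite is after at much larger scale.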

Kylin Engine Applications

In Qunar's flight-ticket big data scenario, the team built multiple data cubes (for example, six cubes for international tickets and 34 custom dimension tables) and achieved daily incremental updates, query latency under 200 ms over millions of rows, and short development cycles of one to two days per cube.
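The daily incremental updates can be pictured with a toy model, assuming each day's build produces its own segment of pre-aggregated rows and a query sums matching rows across segments. Kylin does organize incremental builds into time-ranged segments, but the class and method names below are hypothetical, and details such as segment merging are omitted.

```python
from collections import defaultdict


class SegmentedCube:
    """Hypothetical toy cube holding a single cuboid (dep_city) split into per-day segments."""

    def __init__(self):
        self.segments = {}  # build date (ISO string) -> {dep_city: SUM(price)}

    def build_segment(self, day, facts):
        """Aggregate only one day's new fact rows into a new segment."""
        if day in self.segments:
            raise ValueError(f"segment for {day} already built")
        agg = defaultdict(int)
        for dep_city, price in facts:
            agg[dep_city] += price
        self.segments[day] = dict(agg)

    def query(self, dep_city, start=None, end=None):
        """Sum one city's price across segments whose day lies in [start, end]."""
        total = 0
        for day, seg in self.segments.items():
            # ISO date strings compare correctly as plain strings.
            if (start is None or day >= start) and (end is None or day <= end):
                total += seg.get(dep_city, 0)
        return total


cube = SegmentedCube()
cube.build_segment("2016-11-01", [("PEK", 800), ("PEK", 1200), ("SHA", 900)])
cube.build_segment("2016-11-02", [("PEK", 750)])
print(cube.query("PEK"))
```

Each build aggregates only the new day's data, so a daily refresh costs roughly one day of input rather than the full history, while queries still see the combined totals.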

Challenges and Optimizations

The presentation also covered common pitfalls and best‑practice optimizations, including cube design, system parameter tuning, and performance tuning techniques.
