
Apache Kylin: From Extreme OLAP Engine to an Analytical Data Warehouse for Big Data

The article chronicles Apache Kylin's evolution from an Apache incubator OLAP engine to a comprehensive analytical data warehouse, highlighting its five‑year growth, extensive enterprise adoption, core data‑warehouse features, and the community’s rebranding to better reflect its big‑data capabilities.

Big Data Technology Architecture

Apache Kylin was open-sourced in October 2014 and entered the Apache Software Foundation incubator shortly afterward. A year later it graduated to become a top-level Apache project. Its original slogan was "Extreme OLAP Engine for Big Data".

Over the past five years Kylin has become an indispensable component of the big‑data ecosystem, helping thousands of enterprises perform efficient analytics.

Key use cases include the following. eBay migrated workloads from expensive proprietary data warehouses such as Teradata to Kylin, which now serves millions of queries a day with sub-second latency. Meituan, Ctrip, JD, Didi, Xiaomi, Huawei, and others have built Data-as-a-Service platforms on Kylin. Microsoft SSAS users have moved to Kylin to handle larger data volumes. China UnionPay and a leading insurance group replaced IBM Cognos with Hadoop + Kylin, where a single Kylin cube can replace hundreds of Cognos cubes. China Construction Bank and Agricultural Bank of China have deployed Kylin + Hadoop as their next-generation analytics platforms.

These examples show that the community now treats Kylin not merely as a single‑purpose engine but as a replacement for traditional analytical data warehouses.

A widely accepted definition of a data warehouse, from Bill Inmon, is: "A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process." In other words, a data warehouse organizes data around business subjects, integrates it from multiple sources, records how it changes over time, and leaves it unchanged once loaded, so that it can support key management decisions.

Kylin maps onto each of these characteristics. It is subject-oriented: users create one or more OLAP cubes per analytical subject. It is integrated: it works seamlessly with Hadoop, Hive, Spark, Kafka, and other big-data systems. It is time-variant: Kylin loads data by time partition, builds cubes from it, and stores each build as a segment. It is non-volatile: Kylin takes snapshots of dimension tables so that they stay stable during analysis, and all hierarchical aggregations are strictly consistent. On top of this, Kylin offers SQL, JDBC/ODBC, and HTTP APIs, making it easy to connect BI tools such as Tableau.
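To make the cube and consistency points concrete, here is a minimal in-memory sketch of cube precomputation, assuming a toy fact table with three dimensions and one SUM measure. Every subset of dimensions (a cuboid) is aggregated ahead of time, so a GROUP BY query becomes a lookup, and any coarser cuboid can be recomputed by rolling up a finer one, which is what "strictly consistent hierarchical aggregations" means in practice. The table, column names, and the `build_cube` and `rollup` functions are hypothetical; this is not Kylin's build engine, which runs as distributed jobs and can prune cuboids (for example with aggregation groups) rather than materializing all 2^n combinations.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact table: three dimensions and one additive measure.
DIMENSIONS = ("year", "region", "category")
ROWS = [
    {"year": 2019, "region": "East", "category": "Books", "sales": 120},
    {"year": 2019, "region": "West", "category": "Books", "sales": 80},
    {"year": 2020, "region": "East", "category": "Games", "sales": 200},
    {"year": 2020, "region": "West", "category": "Books", "sales": 50},
]


def build_cube(rows, dims, measure):
    """Precompute SUM(measure) for every cuboid, i.e. every subset of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        # combinations() keeps dims in their declared order, so keys are stable.
        for cuboid in combinations(dims, k):
            agg = defaultdict(int)
            for row in rows:
                agg[tuple(row[d] for d in cuboid)] += row[measure]
            cube[cuboid] = dict(agg)
    return cube


def rollup(child, child_dims, parent_dims):
    """Aggregate a finer cuboid up to a coarser one (parent_dims is a subset of child_dims)."""
    positions = [child_dims.index(d) for d in parent_dims]
    out = defaultdict(int)
    for key, value in child.items():
        out[tuple(key[i] for i in positions)] += value
    return dict(out)


cube = build_cube(ROWS, DIMENSIONS, "sales")

# "SELECT year, SUM(sales) ... GROUP BY year" is answered by a lookup.
print(cube[("year",)])  # {(2019,): 200, (2020,): 250}

# Consistency: rolling up the base cuboid reproduces every precomputed cuboid.
assert all(rollup(cube[DIMENSIONS], DIMENSIONS, c) == agg for c, agg in cube.items())
```

With n dimensions a full cube has 2^n cuboids (8 here), which is why practical cube design focuses on deciding which combinations are worth precomputing.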

Today Kylin offers much more than query acceleration: a friendly web UI, a wizard-style designer, automated job generation and data loading, high-performance query and storage engines, comprehensive APIs, and robust user-permission and security controls. Combined with Hadoop's distributed storage and compute framework, Kylin forms a complete analytical data-warehouse solution that scales from gigabytes to petabytes or even exabytes.
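Because Kylin exposes its query and build functions over HTTP, both querying a cube and loading a new time-partitioned segment can be scripted. The sketch below prepares, without sending, two requests modeled on the REST API documented for Kylin 2.x/3.x: `POST /kylin/api/query` and `PUT /kylin/api/cubes/{cubeName}/build`. The host and port (`localhost:7070`), the default `ADMIN`/`KYLIN` credentials, the `learn_kylin` sample project, the `kylin_sales_cube` name, and the UTC interpretation of segment boundaries are assumptions for illustration. Endpoint paths and fields can differ between Kylin versions, so check them against the documentation for your release.

```python
import base64
import json
import urllib.request
from datetime import date, datetime, timezone

# Assumed deployment details; adjust for your environment and Kylin version.
BASE_URL = "http://localhost:7070/kylin/api"


def _headers(user, password):
    """HTTP Basic auth plus a JSON content type."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}", "Content-Type": "application/json;charset=UTF-8"}


def to_epoch_millis(day):
    """Midnight UTC of `day` as milliseconds since the Unix epoch (UTC is an assumption)."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp()) * 1000


def query_request(sql, project, user="ADMIN", password="KYLIN", limit=50000):
    """Prepare a POST /query request carrying a SQL statement."""
    body = {"sql": sql, "project": project, "offset": 0, "limit": limit, "acceptPartial": False}
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=json.dumps(body).encode("utf-8"),
        headers=_headers(user, password),
        method="POST",
    )


def build_request(cube, start_day, end_day, user="ADMIN", password="KYLIN"):
    """Prepare a PUT /cubes/{cube}/build request for the (assumed half-open) segment [start_day, end_day)."""
    body = {
        "startTime": to_epoch_millis(start_day),
        "endTime": to_epoch_millis(end_day),
        "buildType": "BUILD",
    }
    return urllib.request.Request(
        f"{BASE_URL}/cubes/{cube}/build",
        data=json.dumps(body).encode("utf-8"),
        headers=_headers(user, password),
        method="PUT",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON reply (needs a running Kylin)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    q = query_request("SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt", "learn_kylin")
    # One segment per day, e.g. to load a single day's partition.
    b = build_request("kylin_sales_cube", date(2020, 1, 1), date(2020, 1, 2))
    print(q.get_method(), q.full_url)
    print(b.get_method(), b.full_url)
```

Passing either prepared request to `send` would submit it to a live server: the query returns Kylin's JSON result set, and the build request submits a job for one new segment, the kind of step a scheduler might run after each new Hive partition lands.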

In March 2020 the Kylin community updated its slogan to "Analytical Data Warehouse for Big Data" to more accurately describe its capabilities and improve discoverability.

The article concludes with gratitude to contributors and an optimistic outlook for the next five years of innovation.
