Apache Kylin: From Extreme OLAP Engine to an Analytical Data Warehouse for Big Data
The article chronicles Apache Kylin's evolution from an Apache incubator OLAP engine to a comprehensive analytical data warehouse, highlighting its five‑year growth, extensive enterprise adoption, core data‑warehouse features, and the community’s rebranding to better reflect its big‑data capabilities.
Apache Kylin was open‑sourced in October 2014, entered the Apache Software Foundation incubator, and a year later graduated to become a top‑level Apache project with the original slogan "Extreme OLAP Engine for Big Data".
Over the past five years Kylin has become an indispensable component of the big‑data ecosystem, helping thousands of enterprises perform efficient analytics.
Representative use cases illustrate this. eBay migrated workloads from expensive proprietary data warehouses such as Teradata to Kylin, which now serves millions of queries daily with sub‑second latency. Meituan, Ctrip, JD, Didi, Xiaomi, Huawei, and others have built Data‑as‑a‑Service platforms on Kylin. Microsoft SSAS users have moved to Kylin to handle larger data volumes. China UnionPay and a leading insurance group replaced IBM Cognos with Hadoop + Kylin, where a single Kylin cube can replace hundreds of Cognos cubes. China Construction Bank and Agricultural Bank of China have deployed Kylin + Hadoop for their next‑generation analytics platforms.
These examples show that the community now treats Kylin not merely as a single‑purpose engine but as a replacement for traditional analytical data warehouses.
A widely accepted definition of a data warehouse is: "A data warehouse is a subject‑oriented, integrated, time‑variant, and non‑volatile collection of data in support of management's decision‑making process."
Kylin matches each of these characteristics. Subject‑oriented: users create one or more OLAP cubes per analytical subject. Integrated: Kylin connects to Hadoop, Hive, Spark, Kafka, and other big‑data systems. Time‑variant: Kylin loads data by time partition, builds cubes, and stores them as segments. Non‑volatile: Kylin generates snapshots of dimension tables that remain stable during analysis, and all hierarchical aggregations are strictly consistent. For access, Kylin offers SQL, JDBC/ODBC, and HTTP APIs that connect easily to BI tools such as Tableau.
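As a rough illustration of the HTTP access path mentioned above, the following Python sketch builds a request for Kylin's REST query endpoint (a POST to /kylin/api/query with HTTP Basic authentication). The host, project name, table, and credentials are hypothetical placeholders, not values from this article, and the request is only constructed, not sent. Treat it as a sketch under those assumptions and check the request fields against the documentation for your Kylin version.

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for Kylin's REST query endpoint."""
    payload = {
        "sql": sql,            # the SQL statement to run against a cube
        "project": project,    # the Kylin project that owns the cube
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP Basic authentication: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, project, table, and credentials, for illustration only.
request = build_kylin_query(
    base_url="http://localhost:7070",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    user="ADMIN",
    password="KYLIN",
)

# To run against a live server (not executed here):
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Separating request construction from sending keeps the sketch usable without a running server. BI tools would normally use the JDBC or ODBC drivers instead of raw HTTP calls.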
Today Kylin provides a rich set of capabilities beyond acceleration, including a friendly web UI, wizard‑style designer, automated task generation and data loading, high‑performance query and storage engines, comprehensive APIs, and robust user‑permission and security controls. Combined with Hadoop's distributed storage and compute framework, Kylin forms a complete analytical data‑warehouse solution that scales from gigabytes to petabytes or even exabytes.
In March 2020 the Kylin community updated its slogan to "Analytical Data Warehouse for Big Data" to more accurately describe its capabilities and improve discoverability.
The article concludes with gratitude to contributors and an optimistic outlook for the next five years of innovation.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies