Introduction to Apache Kylin: A Fast Big Data OLAP Engine
Apache Kylin is an open‑source OLAP engine built on Hadoop that answers multi‑dimensional SQL queries on massive datasets in under a second. Its main features are cube pre‑computation, real‑time analytics, and BI tool integration; the latest release, v2.6.4, adds numerous fixes and improvements.
Apache Kylin v2.6.4 has just been released, bringing many bug fixes and improvements. Since version 1.5.3 three years ago, the project has evolved rapidly and become an indispensable OLAP engine in the Hadoop ecosystem.
Apache Kylin, originally open‑sourced by eBay, provides a SQL query interface and multi‑dimensional analysis on top of Hadoop/Spark, delivering sub‑second query latency for massive datasets through pre‑computation and cube building.
The pre‑computed cube data is stored in HBase. Source data for cube building can be read from Hive, Kafka, or JDBC data sources (the JDBC option is available since v2.3.0).
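To make the pre‑computation idea concrete, here is a minimal sketch in plain Python. It is not Kylin's implementation: the rows, the dimension names, and the helpers `build_cube` and `query` are hypothetical and exist only for illustration. The sketch aggregates a measure once for every combination of dimensions (each combination is a "cuboid"), so that a later GROUP BY becomes a dictionary lookup instead of a scan over the raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows, used only for illustration.
DIMENSIONS = ("year", "region", "product")

ROWS = [
    {"year": 2018, "region": "EU", "product": "A", "sales": 10},
    {"year": 2018, "region": "US", "product": "A", "sales": 20},
    {"year": 2019, "region": "EU", "product": "B", "sales": 5},
    {"year": 2019, "region": "US", "product": "A", "sales": 7},
    {"year": 2019, "region": "EU", "product": "A", "sales": 3},
]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                # The key is the tuple of dimension values for this cuboid.
                cuboid[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, dimensions, group_by):
    """Answer a GROUP BY over `group_by` by looking up the matching cuboid."""
    # Normalize to the canonical dimension order used when building the cube.
    dims = tuple(d for d in dimensions if d in group_by)
    return cube[dims]


cube = build_cube(ROWS, DIMENSIONS, "sales")
print(len(cube))                            # 8 cuboids for 3 dimensions
print(query(cube, DIMENSIONS, {"region"}))  # {('EU',): 18, ('US',): 27}
print(query(cube, DIMENSIONS, set()))       # {(): 45}
```

With n dimensions the full cube has 2^n cuboids, which is why the work is done ahead of time and the results are persisted rather than computed per query.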
Key features and characteristics include:
Ultra‑fast OLAP engine that reduces query latency on hundred‑billion‑row datasets.
ANSI‑SQL query support with a comprehensive SQL interface.
Interactive query capability with sub‑second response times.
Multi‑dimensional cubes: users define a data model and Kylin pre‑builds cubes over it, for datasets exceeding a hundred billion rows.
Real‑time OLAP: data can be processed as it arrives, enabling multi‑dimensional analysis with latency on the order of seconds.
Seamless integration with BI tools such as Tableau and PowerBI.
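As an illustration of the SQL interface listed above, the sketch below prepares a query for Kylin's REST query endpoint using only the Python standard library. Several things here are assumptions rather than facts from this article: the server address `http://localhost:7070`, the `ADMIN`/`KYLIN` credentials, the `learn_kylin` project and `kylin_sales` table from the sample data set, and the exact request fields, which should be checked against the REST API documentation for the version in use. The helper names `build_query_request` and `run_query` are hypothetical.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, project, sql, limit=100):
    """Build (but do not send) a POST request for the /kylin/api/query endpoint."""
    # Field names follow the REST API docs as the author of this sketch
    # understands them; treat them as an assumption and verify for your version.
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed address, credentials, project, and table; adjust to your deployment.
request = build_query_request(
    base_url="http://localhost:7070",
    user="ADMIN",
    password="KYLIN",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) AS total FROM kylin_sales GROUP BY part_dt",
)
print(request.get_method(), request.full_url)
```

Calling `run_query(request)` against a running instance would return the result set as JSON. Building the request separately from sending it keeps the example inspectable without a server.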
For further learning, the official documentation (including installation, cube building tutorials, and tool integration) is recommended:
http://kylin.apache.org/docs/
A Chinese version of the site is also available: http://kylin.apache.org/cn/docs/
The source code is hosted on GitHub: https://github.com/apache/kylin
Mailing lists for developers and users are [email protected] and [email protected]; subscriptions can be made by emailing [email protected] or [email protected].