How Apache Kylin Supercharges Big Data Analytics with Pre‑Computed Cubes

Apache Kylin is an open‑source, distributed OLAP engine built on Hadoop. It uses pre‑computed cubes to deliver sub‑second, high‑concurrency SQL queries on massive datasets and integrates with popular BI tools. This article covers its modular architecture, the 1.5.x enhancements, and its deployment options.

21CTO community introduction: This article is based on the work of Li Dong, a Kyligence engineer and Apache Kylin committer.

What is Apache Kylin

In the era of big data, many enterprises use Hadoop for data management, but traditional BI tools struggle with scalability and interactive queries. Apache Kylin, originally contributed by eBay, is an open‑source distributed analytics engine that provides a SQL interface on Hadoop and OLAP capabilities for TB‑ to PB‑scale data, delivering sub‑second query latency and high concurrency.

Kylin graduated to an Apache top‑level project in 2015, and members of its core team later founded the commercial company Kyligence.

Kylin's basic principles and architecture

The core idea of Kylin is pre‑computation: metrics needed for multidimensional analysis are calculated in advance and stored as cubes. Queries are rewritten to read these pre‑computed results, enabling fast response and high concurrency.
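
The pre‑computation idea can be illustrated with a minimal Python sketch. This is not Kylin's implementation: the rows, the column names, and the functions `precompute_sum_by_region_year` and `query_sum` are all hypothetical. It only shows the trade described above, where aggregation work is paid once at build time so that a query becomes a lookup.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, year, sales_amount).
ROWS = [
    ("east", 2015, 100.0),
    ("east", 2016, 150.0),
    ("west", 2015, 80.0),
    ("west", 2016, 120.0),
    ("east", 2015, 50.0),
]


def precompute_sum_by_region_year(rows):
    """Build phase: scan the raw rows once and store SUM(amount) per (region, year)."""
    totals = defaultdict(float)
    for region, year, amount in rows:
        totals[(region, year)] += amount
    return dict(totals)


def query_sum(precomputed, region, year):
    """Query phase: answer SUM(amount) for one (region, year) with a lookup, not a scan."""
    return precomputed.get((region, year), 0.0)


PRECOMPUTED = precompute_sum_by_region_year(ROWS)
print(query_sum(PRECOMPUTED, "east", 2015))
```

The query cost here depends on the size of the pre‑aggregated result rather than on the number of raw rows, which is the property the paragraph above relies on.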

Kylin reads source data from Hive, builds cubes using MapReduce, stores results in HBase, and exposes REST, JDBC, and ODBC query interfaces. It supports standard ANSI SQL, allowing seamless integration with tools like Tableau and Excel.
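
As a rough illustration of the REST interface, the sketch below builds (but does not send) an HTTP query request using only the Python standard library. The `/kylin/api/query` path, the JSON fields, and basic authentication follow the Kylin REST documentation as best recalled here and should be checked against the version you run. The host, port, credentials, project name, and SQL text are assumptions, and `build_query_request` is a hypothetical helper.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, project, sql, user, password, limit=100):
    """Build a POST request for Kylin's query endpoint (path and fields are assumptions)."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP basic authentication: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed values for illustration only; replace with your own deployment's settings.
REQUEST = build_query_request(
    base_url="http://localhost:7070",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    user="ADMIN",
    password="KYLIN",
)
```

Passing `REQUEST` to `urllib.request.urlopen` would send it to a running server; the sketch stops short of that so it can be read without a live cluster.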

The cube consists of many cuboids, each representing a different combination of dimensions. During a query, Kylin selects the appropriate cuboid and returns the pre‑aggregated measures.
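
The relationship between a cube and its cuboids can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it materializes every one of the 2^n dimension combinations and selects the cuboid that exactly matches the query's dimensions, whereas a production engine may prune cuboids and choose among several candidates. The data and the names `build_cube` and `select_cuboid` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "year", "product")

# Hypothetical fact rows with one measure, "amount".
ROWS = [
    {"region": "east", "year": 2015, "product": "a", "amount": 100.0},
    {"region": "east", "year": 2016, "product": "b", "amount": 150.0},
    {"region": "west", "year": 2015, "product": "a", "amount": 80.0},
    {"region": "west", "year": 2016, "product": "a", "amount": 120.0},
]


def build_cube(rows, dimensions, measure):
    """Materialize one cuboid (a SUM table) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def select_cuboid(cube, query_dims):
    """Return the cuboid whose dimensions match the query, in canonical order."""
    wanted = tuple(d for d in DIMENSIONS if d in query_dims)
    return wanted, cube[wanted]


CUBE = build_cube(ROWS, DIMENSIONS, "amount")
dims, cuboid = select_cuboid(CUBE, {"year", "region"})
print(dims, cuboid[("east", 2015)])
```

With three dimensions the sketch holds eight cuboids, including the empty combination that stores the grand total. The exponential growth in cuboid count is why dimension design matters for build time and storage.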

Kylin’s architecture includes a metadata manager, job engine, storage engine (HBase), REST server, and query engine built on Apache Calcite. It also provides ODBC/JDBC drivers and a web UI for cube management.

Kylin's latest features

Version 1.5.x introduces a pluggable architecture that decouples data source, cube engine, and storage engine, allowing integration with systems such as Kafka, Spark, and Cassandra. It also adds a fast cubing algorithm, shard‑based HBase storage, and performance improvements that can double average query speed.
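
One way to picture this decoupling is as three independent interfaces that a build job composes. The sketch below is purely illustrative: `DataSource`, `CubeEngine`, `StorageEngine`, and the in‑memory implementations are hypothetical names and do not correspond to Kylin's actual classes.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class DataSource(ABC):
    """Hypothetical plug-in point: where raw rows come from."""

    @abstractmethod
    def read_rows(self):
        ...


class CubeEngine(ABC):
    """Hypothetical plug-in point: how rows are aggregated."""

    @abstractmethod
    def build(self, rows):
        ...


class StorageEngine(ABC):
    """Hypothetical plug-in point: where aggregated results are kept."""

    @abstractmethod
    def save(self, cube):
        ...

    @abstractmethod
    def load(self):
        ...


class InMemorySource(DataSource):
    def __init__(self, rows):
        self._rows = list(rows)

    def read_rows(self):
        return iter(self._rows)


class SumByKeyEngine(CubeEngine):
    def build(self, rows):
        totals = defaultdict(float)
        for key, value in rows:
            totals[key] += value
        return dict(totals)


class DictStorage(StorageEngine):
    def __init__(self):
        self._cube = {}

    def save(self, cube):
        self._cube = dict(cube)

    def load(self):
        return dict(self._cube)


def run_build(source, engine, storage):
    """Compose the three parts; each can be swapped without touching the others."""
    storage.save(engine.build(source.read_rows()))
    return storage.load()


RESULT = run_build(
    InMemorySource([("east", 1.0), ("west", 2.0), ("east", 3.0)]),
    SumByKeyEngine(),
    DictStorage(),
)
```

Because `run_build` depends only on the abstract interfaces, replacing the source, the engine, or the storage means supplying another implementation of the same interface.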

The upcoming 1.5.2 release (currently under Apache voting) includes 36 bug fixes, 33 improvements, and 6 new features such as enhanced HyperLogLog calculations, faster cube build steps, UI hints, and support for Hive view lookups. It also adds support for MapR and CDH Hadoop distributions.

Kylin provides a Diagnosis tool that packages project metadata, logs, and HBase configuration into a zip file to help users and developers troubleshoot cube build or query issues.

Q&A

Typical questions cover MDX support (not currently available), cube build time for terabyte‑scale data, the contribution workflow, integration with Elasticsearch, front‑end drag‑and‑drop tools, differences between the community and commercial editions, concurrency capabilities, and plug‑in support for Spark ML and Spark SQL. The answers also cite benchmark results in which 90% of queries completed within 5 seconds on a 280 billion‑row cube.
