How Apache Kylin Enables Sub‑Second OLAP on Massive Data Sets

Apache Kylin uses pre‑computed OLAP cubes on Hadoop/Spark/Flink to deliver sub‑second query responses on massive datasets. This article covers its architecture, integration with a BI platform, user security, cube building, monitoring, and HBase‑based storage, and shows how these address common big‑data analytical challenges.

IT Architects Alliance

May 19, 2022

Research Background

With the rapid growth of mobile Internet, IoT, big data, and AI, data has become the most valuable asset and the foundation for business decisions. Enterprises face data silos, inconsistent data, scattered data assets, slow report queries, and rising costs as data volumes explode, making fast, valuable insight extraction a critical challenge.

Pre‑Computation Concept

Most big‑data queries ask for aggregated statistics; the raw records are rarely needed. If the system pre‑aggregates results as data is ingested, it can answer queries from those pre‑computed values instead of scanning raw data. This trades some query flexibility for large performance gains, bringing response times on massive datasets down to about a second or less.
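The idea can be illustrated with a minimal sketch. It is not Kylin code: the records, the dimension names, and the functions `build_aggregates` and `query_sum` are hypothetical, chosen only to show how sums pre-computed for every combination of dimensions let a grouped query be answered by a lookup rather than a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw fact records: (region, product, amount).
RAW_SALES = [
    ("east", "laptop", 1200),
    ("east", "phone", 800),
    ("west", "laptop", 1500),
    ("west", "phone", 700),
    ("east", "laptop", 1300),
]
DIMENSIONS = ("region", "product")


def build_aggregates(rows):
    """Pre-compute SUM(amount) for every combination of dimensions."""
    aggregates = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            table = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                table[key] += row[-1]
            names = tuple(DIMENSIONS[i] for i in dims)
            aggregates[names] = dict(table)
    return aggregates


def query_sum(aggregates, group_by, key):
    """Answer a grouped SUM from the pre-computed tables, without scanning rows."""
    return aggregates[tuple(group_by)][tuple(key)]


AGGREGATES = build_aggregates(RAW_SALES)
print(query_sum(AGGREGATES, ["region"], ["east"]))                      # 3300
print(query_sum(AGGREGATES, ["region", "product"], ["west", "phone"]))  # 700
print(query_sum(AGGREGATES, [], []))                                    # 5500
```

The cost moves to build time: two dimensions already produce four aggregate tables, and the number doubles with each added dimension, which is the flexibility-for-speed trade described above.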

Apache Kylin Overview

Apache Kylin is an open‑source, distributed analytical data warehouse that provides SQL query interfaces and multi‑dimensional OLAP capabilities on top of Hadoop, Spark, or Flink. Through extensive pre‑computation, Kylin breaks the linear relationship between query time and data size, enabling sub‑second queries on billion‑row tables.

BI Platform Integration Goals

The BI platform integrates Kylin to provide unified user and permission management, a consistent UI, and extended features that adapt Kylin to the platform’s needs. It combines SparkSQL, FlinkSQL, Presto, and other engines via intelligent routing, delivering a one‑stop big‑data OLAP solution.
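The article does not describe the routing rules, so the following is only a sketch of what such a router might look like. The `QueryProfile` type, the threshold, and the rule order are all assumptions made for illustration; a real platform would likely use richer statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical summary of a parsed query; not a Kylin or BI-platform type."""
    covered_by_ready_cube: bool  # a READY cube has all requested dimensions/measures
    is_streaming: bool           # the query reads from an unbounded stream
    estimated_scan_gb: float     # rough size of the data that must be scanned


def route(profile: QueryProfile, interactive_limit_gb: float = 50.0) -> str:
    """Pick an engine name using illustrative, assumed rules."""
    if profile.covered_by_ready_cube:
        return "kylin"      # pre-computed answer is available
    if profile.is_streaming:
        return "flinksql"   # assumed choice for streaming sources
    if profile.estimated_scan_gb <= interactive_limit_gb:
        return "presto"     # assumed choice for smaller ad hoc scans
    return "sparksql"       # assumed fallback for large batch scans


print(route(QueryProfile(True, False, 900.0)))   # kylin
print(route(QueryProfile(False, False, 900.0)))  # sparksql
```

Checking cube coverage first reflects the point of the integration: a query that a cube can answer should never reach a scan-based engine.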

System Architecture

The architecture consists of four main components:

Cube Build Engine: Supports MapReduce, Spark, and Flink for building data cubes.

REST Server: Exposes REST, JDBC, and ODBC interfaces for query submission.

Query Engine: Parses SQL, generates execution plans, forwards queries to HBase, and returns results.

Storage Engine: Uses the distributed column‑oriented database HBase as the underlying store.
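As an illustration of the REST interface, the sketch below builds a query request without sending it. It assumes the `/kylin/api/query` endpoint with HTTP Basic authentication as documented for Kylin's REST API; the host, port, credentials, project, and table are sample values and should be replaced with those of an actual deployment. The helper `build_query_request` is a name introduced here, not part of Kylin.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, project, sql, limit=100):
    """Build (but do not send) a POST request for Kylin's query endpoint."""
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Sample values only; adjust to the target environment.
request = build_query_request(
    "http://localhost:7070", "ADMIN", "KYLIN", "learn_kylin",
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
)
print(request.get_method(), request.full_url)

# Sending the request requires a running Kylin instance:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```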

User and Permission Management

Kylin’s web module is built on the Spring framework and secured with Spring Security. It supports three authentication modes: a testing mode, LDAP, and SAML. The mode is selected with the kylin.security.profile setting (testing, ldap, or saml), which lets enterprises plug Kylin into their existing identity systems.

Data Model and Cube Construction

BI data subjects are modeled from source metadata, allowing drag‑and‑drop visual modeling. Each cube links to a data model and supports incremental builds by specifying a partition column, avoiding re‑processing of historical data. Dimension tables smaller than 300 MB can be cached as in‑memory snapshots to improve efficiency.
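A minimal sketch of how an incremental build range could be derived, assuming segments are tracked as half-open time ranges in epoch milliseconds. The helper names are hypothetical. The body shape (`startTime`, `endTime`, `buildType`) follows Kylin's documented cube build REST call, but the exact endpoint and fields should be checked against the deployed version.

```python
from datetime import datetime, timezone


def to_epoch_ms(day: str) -> int:
    """Convert a 'YYYY-MM-DD' date (UTC midnight) to epoch milliseconds."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def next_build_range(built_segments, target_end_ms):
    """Return the (start, end) range not yet covered by existing segments.

    built_segments is a list of (start_ms, end_ms) half-open ranges.
    Returns None when the cube is already built up to target_end_ms.
    With no segments, the range starts at 0 (an assumption of this sketch).
    """
    last_end = max((end for _, end in built_segments), default=0)
    if target_end_ms <= last_end:
        return None
    return last_end, target_end_ms


def build_request_body(start_ms, end_ms):
    """JSON body for an incremental build request (shape assumed from Kylin docs)."""
    return {"startTime": start_ms, "endTime": end_ms, "buildType": "BUILD"}


segments = [(to_epoch_ms("2022-05-01"), to_epoch_ms("2022-05-18"))]
new_range = next_build_range(segments, to_epoch_ms("2022-05-19"))
print(build_request_body(*new_range))
```

Only the one-day range after the last segment is submitted, so the historical partitions already in the cube are left untouched.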

Cube Configuration and Feature Enhancements

Unified page layout and Chinese language support.

Centralized security and permission control.

Enhanced cube management and query interfaces.

Default build engine switched to Flink for faster processing.

Cube Monitoring

Kylin provides job logs, alerts, progress bars, and step‑by‑step status for each build. Operators can view overall cube counts, storage usage, cube states such as DISABLED and READY, and job states such as ERROR. Control actions include Resume and Discard for jobs, and Build, Refresh, and Merge for cubes.

Query Execution

Once a cube reaches the READY state, users can query it with standard SQL SELECT statements. Queries must match the cube’s defined dimensions and measures; otherwise, Kylin cannot use the pre‑computed data.
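The matching rule can be sketched as a simple coverage check. This is a simplification for illustration, not Kylin's actual query planner; `CubeDefinition` and `can_answer` are hypothetical names, and the cube contents are sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeDefinition:
    """Hypothetical, simplified cube descriptor used only for this sketch."""
    name: str
    dimensions: frozenset
    measures: frozenset
    status: str = "READY"


def can_answer(cube, query_dimensions, query_measures):
    """True if the cube is READY and covers every requested dimension and measure."""
    return (
        cube.status == "READY"
        and set(query_dimensions) <= cube.dimensions
        and set(query_measures) <= cube.measures
    )


sales_cube = CubeDefinition(
    name="sales_cube",
    dimensions=frozenset({"part_dt", "region", "category"}),
    measures=frozenset({"SUM(price)", "COUNT(*)"}),
)

# Covered: both the grouping column and the measure are defined in the cube.
print(can_answer(sales_cube, ["part_dt"], ["SUM(price)"]))    # True
# Not covered: seller_id is not a dimension of this cube.
print(can_answer(sales_cube, ["seller_id"], ["SUM(price)"]))  # False
```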

Storage Engine

Kylin’s plugin architecture is designed to make the data source, build engine, and storage layer replaceable. In version 1, Kylin was tightly coupled to Hadoop MapReduce as the build engine, Hive as the data source, and HBase as the storage layer. HBase remains the storage engine described here, and its horizontal scalability supports petabyte‑scale datasets.
