Design and Implementation of a Big Data OLAP Platform Based on Apache Kylin

This article explains the background, challenges, and architectural design of a big‑data OLAP platform that integrates Apache Kylin with a BI system, detailing pre‑computation strategies, cube construction, user authentication, storage engines, and query mechanisms to achieve sub‑second analytics on massive datasets.

Big Data Technology & Architecture

Jan 24, 2021

With the rapid development of mobile internet, IoT, big data, and AI, data has become the most valuable asset behind these technologies, yet enterprises face severe challenges such as data silos, consistency issues, and slow reporting that hinder digital transformation.

Hadoop solved storage and batch processing, but fast analytical queries remained a challenge. This led to "SQL on Hadoop" engines such as Hive, Impala, Presto, Phoenix, Drill, SparkSQL, and FlinkSQL, which use techniques such as Massively Parallel Processing (MPP) and columnar storage to cut query time from hours to minutes.

Even minute‑level responses are insufficient for interactive analysis; no open‑source OLAP engine can simultaneously satisfy data volume, performance, and flexibility, so a trade‑off is required.

Two observations guide the design: analytical queries usually need only aggregated results, and the set of meaningful dimension combinations is limited, allowing extensive pre‑computation of aggregates.
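The pre‑computation idea can be illustrated with a minimal Python sketch. The table, column names, and helper functions below are hypothetical and only show the principle: aggregate once for every dimension combination, then answer GROUP BY queries by lookup. Kylin's actual build engine is distributed and derives cuboids layer by layer rather than rescanning raw rows for each one.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, category, year, sales)
ROWS = [
    ("Beijing", "Phone", 2020, 100),
    ("Beijing", "Laptop", 2020, 250),
    ("Shanghai", "Phone", 2020, 80),
    ("Shanghai", "Phone", 2021, 120),
]
DIMENSIONS = ("city", "category", "year")


def build_cuboids(rows, dimensions):
    """Precompute SUM(sales) for every subset of the dimensions (2^N cuboids)."""
    cuboids = {}
    for size in range(len(dimensions) + 1):
        for idx in combinations(range(len(dimensions)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in idx)
                totals[key] += row[-1]  # the last column is the measure
            names = tuple(dimensions[i] for i in idx)
            cuboids[names] = dict(totals)
    return cuboids


def answer(cuboids, dimensions, group_by):
    """Answer a GROUP BY from the matching cuboid without touching the raw rows."""
    names = tuple(d for d in dimensions if d in group_by)
    return cuboids[names]


CUBOIDS = build_cuboids(ROWS, DIMENSIONS)
SALES_BY_CITY = answer(CUBOIDS, DIMENSIONS, {"city"})
```

With three dimensions the sketch stores eight cuboids. Query cost then depends on the size of the chosen cuboid rather than on the number of raw rows, which is the trade‑off the two observations make possible.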

Apache Kylin, an open‑source distributed analytical data warehouse, provides a SQL interface and multidimensional OLAP on top of Hadoop, Spark, and Flink. Because aggregates are pre‑computed, queries read stored results instead of scanning raw data, so latency stays sub‑second and no longer grows linearly with data size.

The BI platform integrates Kylin to offer unified user and permission management, a consistent UI, and intelligent routing among SparkSQL, FlinkSQL, and Presto, delivering a one‑stop big‑data OLAP solution.

The architecture has four main components. The Cube Build Engine runs build jobs on MapReduce, Spark, or Flink. The REST Server is the entry point for Kylin’s REST, JDBC, and ODBC interfaces. The Query Engine parses SQL, generates an execution plan, and retrieves the pre‑computed results from HBase. HBase itself serves as the column‑oriented storage engine for cube data.

Kylin’s web module is built with Spring and secured by Spring Security. It offers three authentication modes (custom testing, LDAP, and SAML single sign‑on) and integrates with the BI platform’s permission system.

Data modeling in the BI platform creates multidimensional models (facts, dimensions, measures) that are synchronized with Kylin, supporting incremental cube builds by specifying a time partition column and optionally snapshotting small dimension tables in memory.
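As a rough illustration of incremental builds, the sketch below computes the time range of the next segment from the segments already built and wraps it in a build request body. The helper names are hypothetical, and starting the first segment at 0 is an assumption. The `startTime`, `endTime`, and `buildType` fields follow Kylin's documented cube build REST request, but should be checked against the Kylin version in use.

```python
from datetime import datetime, timezone


def to_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def next_segment(existing, build_until):
    """Return the half-open (start_ms, end_ms) range for the next build.

    existing: list of (start_ms, end_ms) ranges already built.
    Returns None when there is no new data range to build.
    """
    # Assumption: the first build starts at 0, later builds continue
    # from the end of the latest existing segment.
    start = max((end for _, end in existing), default=0)
    end = to_millis(build_until)
    if end <= start:
        return None
    return (start, end)


def build_request_body(segment):
    """Body for a cube build request covering one segment."""
    start, end = segment
    return {"startTime": start, "endTime": end, "buildType": "BUILD"}


DAY1 = datetime(2021, 1, 1, tzinfo=timezone.utc)
DAY2 = datetime(2021, 1, 2, tzinfo=timezone.utc)
DAY3 = datetime(2021, 1, 3, tzinfo=timezone.utc)

BUILT = [(to_millis(DAY1), to_millis(DAY2))]
NEXT = next_segment(BUILT, DAY3)
BODY = build_request_body(NEXT)
```

Only rows whose partition column falls inside the new range are processed, so each build handles one increment of data instead of the full history.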

Cube management enhancements include unified UI styling, centralized security, query management, and defaulting the build engine to Flink.

Monitoring is provided through logs and alerts. The monitoring view shows task progress and status (DISABLED, ERROR, READY) and offers controls for resume, discard, build, refresh, and merge operations.

Once a cube reaches the READY state, users can query it with standard SQL SELECT statements; queries must match the cube’s defined dimensions and measures.
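The matching requirement can be sketched as a simple coverage check. The cube description and function below are hypothetical and much simpler than Kylin's own query planning. They only show the rule that every grouped column and every aggregate in the query must be defined in the cube.

```python
# Hypothetical cube definition, as the BI platform might hold it
SALES_CUBE = {
    "name": "sales_cube",
    "dimensions": {"city", "category", "year"},
    "measures": {"SUM(sales)", "COUNT(*)"},
}


def check_query(cube, group_by, measures):
    """Return (missing_dimensions, missing_measures) for a query against a cube.

    Both lists are empty when the cube can answer the query.
    """
    missing_dims = sorted(set(group_by) - cube["dimensions"])
    missing_measures = sorted(set(measures) - cube["measures"])
    return missing_dims, missing_measures


# Covered: SELECT city, SUM(sales) ... GROUP BY city
OK_RESULT = check_query(SALES_CUBE, ["city"], ["SUM(sales)"])

# Not covered: the cube defines neither the dimension nor the measure
BAD_RESULT = check_query(SALES_CUBE, ["store_id"], ["AVG(discount)"])
```

A check like this lets a front end reject or reroute a query before it is submitted, instead of waiting for the engine to fail on it.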

Kylin offers flexible front‑end connectivity via REST API, JDBC, and ODBC, enabling diverse client integrations.
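For the REST path, a client request might be assembled as in the following sketch, which uses only the Python standard library. The host, project, table, and credentials are placeholders. The `/kylin/api/query` endpoint, the JSON fields, and HTTP Basic authentication follow Kylin's documented query API, though details can vary between versions. The code builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for Kylin's query endpoint."""
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only
REQUEST = build_query_request(
    "http://localhost:7070/",
    "demo_project",
    "SELECT city, SUM(sales) FROM sales_fact GROUP BY city",
    "analyst",
    "secret",
)
```

Passing the request to `urllib.request.urlopen` would send it to a running server and return the JSON response. JDBC and ODBC clients reach the same query engine through their respective drivers.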

The storage layer leverages HBase’s scalability, and Kylin’s plugin architecture allows extensible integration with additional compute frameworks, data sources, and storage back‑ends.


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
