Comprehensive Guide to Apache Kylin: Background, Architecture, Installation, Optimization, and Real‑World Use Cases
This article provides an in‑depth overview of Apache Kylin, covering its history, mission, core MOLAP principles, technical architecture, step‑by‑step installation (Docker and Hadoop), performance tuning, advanced cube settings, and detailed case studies from major companies such as Baidu, Lianjia, and Didi.
Background and Mission
Apache Kylin originated in 2013 from eBay's BI‑on‑Hadoop project, was open‑sourced in 2014, and became an Apache top‑level project in 2015. Its mission is to deliver ultra‑fast OLAP on massive datasets, enabling sub‑second SQL analytics.
Working Principle
Kylin implements a MOLAP (multidimensional OLAP) cube model. Users define dimensions and measures; Kylin then pre‑computes the cuboids (by default, every combination of the dimensions) as materialized views and stores them, so queries are answered by reading these pre‑aggregated results instead of scanning the raw data.
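The pre‑compute‑then‑look‑up idea can be sketched in a few lines of Python. This is a toy, in‑memory illustration only: the data, the function names `build_cube` and `query`, and the SUM‑only measure are hypothetical, and Kylin itself builds cuboids with distributed build jobs rather than Python loops.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact rows: two dimensions (Time, Location) and one measure (GMV).
ROWS = [
    {"Time": "2024-01", "Location": "Beijing", "GMV": 100.0},
    {"Time": "2024-01", "Location": "Shanghai", "GMV": 80.0},
    {"Time": "2024-02", "Location": "Beijing", "GMV": 50.0},
]
DIMENSIONS = ("Time", "Location")


def build_cube(rows, dimensions, measure):
    """Pre-compute SUM(measure) for every cuboid, i.e. every subset of the dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, k):
            totals = defaultdict(float)
            for row in rows:
                # Group key = values of this cuboid's dimensions; () for the grand total.
                totals[tuple(row[d] for d in cuboid)] += row[measure]
            cube[cuboid] = dict(totals)
    return cube


def query(cube, dimensions, group_by):
    """Answer SUM(...) GROUP BY group_by with one lookup; no raw rows are scanned."""
    cuboid = tuple(d for d in dimensions if d in group_by)
    return cube[cuboid]
```

For example, `query(build_cube(ROWS, DIMENSIONS, "GMV"), DIMENSIONS, {"Location"})` returns the per‑city totals straight from the (Location) cuboid. The cost moves from query time to build time, which is the trade‑off MOLAP makes.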
Dimension and Measure Basics
Dimensions represent the angles of analysis (e.g., time, location). Measures are the numeric values to be aggregated (e.g., sales amount, transaction count).
Cube and Cuboid
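For n dimensions there are 2ⁿ possible cuboids, because each dimension is either included in or excluded from a given cuboid. Each cuboid stores aggregated results for one specific combination of dimensions, and the full set of cuboids constitutes the cube. With the two dimensions Time and Location there are 2² = 4 cuboids: (), (Time), (Location), and (Time, Location).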
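The 2ⁿ count follows from treating each cuboid as an n‑bit mask, one bit per dimension. The sketch below enumerates cuboids that way; `enumerate_cuboids` is a hypothetical helper, and the bit order chosen here is an assumption for illustration, not a description of how Kylin numbers cuboids internally.

```python
def enumerate_cuboids(dimensions):
    """Return every cuboid as a (mask, dims) pair.

    Bit i of the mask is set when dimensions[i] belongs to the cuboid, so masks run
    from 0 (the grand-total cuboid) to 2**n - 1 (the base cuboid with all dimensions).
    """
    n = len(dimensions)
    cuboids = []
    for mask in range(2 ** n):
        dims = tuple(d for i, d in enumerate(dimensions) if mask & (1 << i))
        cuboids.append((mask, dims))
    return cuboids


if __name__ == "__main__":
    for mask, dims in enumerate_cuboids(["Time", "Location"]):
        print(f"{mask:02b}", dims)
```

For Time and Location this yields masks 00, 01, 10, and 11. Adding a fifth dimension already gives 32 cuboids, which is why the pruning options described later matter.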
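A query such as the following can then be answered entirely from the pre‑computed (Time, Location) cuboid, without scanning the raw Sales table:
SELECT Time, Location, SUM(GMV) AS GMV FROM Sales GROUP BY Time, Location
Technical Architecture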
Kylin consists of an online query layer and an offline build layer. Data sources (HDFS, Hive, Kafka, RDBMS) feed the build engine, which creates cubes stored primarily in HBase. The query layer exposes REST, JDBC, and ODBC interfaces that translate user SQL into cube‑based execution plans.
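At its core, translating SQL into a cube‑based plan means finding a pre‑computed cuboid that contains every dimension the query groups or filters by. The sketch below is a simplified, hypothetical model of that routing step: the function `choose_cuboid`, the row‑count estimates, and the "pick the smallest covering cuboid" rule are assumptions for illustration, not Kylin's actual query planner.

```python
def choose_cuboid(query_dims, built_cuboids, row_counts):
    """Pick the cheapest pre-computed cuboid that covers every dimension the query needs.

    query_dims:    dimensions used in the query's GROUP BY and WHERE clauses.
    built_cuboids: iterable of frozensets, one per cuboid that was actually built.
    row_counts:    hypothetical size estimate per cuboid; fewer rows means less to read.
    """
    needed = set(query_dims)
    # A cuboid can serve the query only if it contains all the needed dimensions.
    candidates = [c for c in built_cuboids if needed <= c]
    if not candidates:
        raise LookupError("no built cuboid covers this query")
    return min(candidates, key=lambda c: row_counts[c])
```

When the chosen cuboid has more dimensions than the query groups by, the remaining roll‑up over the extra dimensions still happens at query time, but over far fewer rows than the raw fact table.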
Core Concepts
Key concepts include data warehouses, OLAP vs. OLTP, BI, dimensional modeling (star and snowflake schemas), fact tables, dimension tables, and the relationship between dimensions and measures.
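In a star schema, a central fact table holds the measures plus foreign keys, and each key points into a dimension (lookup) table holding descriptive attributes. Joining them yields one wide row per fact whose columns are exactly the dimensions and measures a cube is defined over. The tables and the `flatten` helper below are hypothetical toy examples.

```python
# Hypothetical star schema: one fact table keyed into two dimension tables.
DIM_DATE = {1: {"year": 2024, "month": 1}, 2: {"year": 2024, "month": 2}}
DIM_STORE = {10: {"city": "Beijing"}, 20: {"city": "Shanghai"}}
FACT_SALES = [
    {"date_id": 1, "store_id": 10, "amount": 100.0},
    {"date_id": 1, "store_id": 20, "amount": 80.0},
    {"date_id": 2, "store_id": 10, "amount": 50.0},
]


def flatten(facts, dim_date, dim_store):
    """Star join: replace each fact row's foreign keys with its dimension attributes."""
    return [
        {**dim_date[f["date_id"]], **dim_store[f["store_id"]], "amount": f["amount"]}
        for f in facts
    ]
```

A snowflake schema differs only in that dimension tables are further normalized (for example, city pointing to a separate region table), so the join reaches one level deeper.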
Quick Start
Docker‑Based Installation (No Hadoop Prerequisite)
Pull the official image and run a container with the required ports:
docker pull apachekylin/apache-kylin-standalone:3.1.0
docker run -d \
  -m 8G \
  -p 7070:7070 -p 8088:8088 -p 50070:50070 \
  -p 8032:8032 -p 8042:8042 -p 16010:16010 \
  apachekylin/apache-kylin-standalone:3.1.0
After startup, open http://127.0.0.1:7070/kylin/ in a browser, log in with the default account ADMIN / KYLIN, and use the sample cube to explore Kylin's functionality.
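Besides the web UI, the same sample cube can be queried over Kylin's REST interface. The sketch below builds a POST request to the `/kylin/api/query` endpoint using only the Python standard library. It assumes the container above is running locally, the default ADMIN / KYLIN account, and the sample project `learn_kylin` with its `KYLIN_SALES` table; adjust all of these for your own deployment.

```python
import base64
import json
import urllib.request

# Assumption: the standalone Docker container above, listening on localhost:7070.
KYLIN_URL = "http://127.0.0.1:7070/kylin/api/query"


def build_query_request(sql, project="learn_kylin", user="ADMIN", password="KYLIN", limit=50):
    """Build (but do not send) a POST request for Kylin's REST query endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"sql": sql, "project": project, "offset": 0, "limit": limit})
    return urllib.request.Request(
        KYLIN_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",  # HTTP Basic auth
            "Content-Type": "application/json;charset=UTF-8",
        },
    )


def run_query(sql, **kwargs):
    """Send the request and return the parsed JSON response (needs a running Kylin)."""
    with urllib.request.urlopen(build_query_request(sql, **kwargs), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container up, `run_query("SELECT PART_DT, SUM(PRICE) FROM KYLIN_SALES GROUP BY PART_DT")` should return a JSON document whose rows come from the sample cube. Building the request separately from sending it keeps the request shape easy to inspect without a live server.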
Hadoop‑Based Installation
Download the binary package, set KYLIN_HOME and the other required environment variables (JAVA_HOME, HADOOP_HOME, etc.), run bin/check-env.sh to verify the environment, then start Kylin with bin/kylin.sh start. Finally, create a project, load Hive tables, define a model, and build cubes through the web UI.
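Before running check-env.sh, a quick preflight can catch the most common mistake: unset or mistyped environment variables. The `preflight` helper and its variable list below are hypothetical and only check that the variables point at existing directories and that `bin/kylin.sh` is present; they do not replace Kylin's own check-env.sh, which verifies the Hadoop environment itself.

```python
import os
from pathlib import Path

# Hypothetical list: the variables named in the text plus KYLIN_HOME.
REQUIRED_VARS = ("KYLIN_HOME", "JAVA_HOME", "HADOOP_HOME")


def preflight(env=None):
    """Return a list of problems found before running bin/check-env.sh (empty = OK)."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name}={value} is not a directory")
    kylin_home = env.get("KYLIN_HOME")
    if kylin_home and not (Path(kylin_home) / "bin" / "kylin.sh").is_file():
        problems.append("bin/kylin.sh not found under KYLIN_HOME")
    return problems
```

Calling `preflight()` with no arguments checks the current shell environment; passing a dictionary makes the check easy to test.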
Optimization and Advanced Settings
Resource Tuning
Adjust MapReduce and HBase parameters (e.g., mapreduce.map.java.opts, HBase region size, coprocessor memory) to improve build speed and query latency.
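Region size is one of the knobs named above. As a rough, hypothetical model (not Kylin's exact split logic), a cube segment stored in HBase is cut into regions of about a configured size, assumed here to be bounded by a minimum and maximum region count. To our knowledge the cut size is set by `kylin.storage.hbase.region-cut-gb` in 2.x/3.x releases, but check the kylin.properties of your version; the function name and default bounds below are illustrative assumptions.

```python
import math


def estimate_regions(cube_size_gb, region_cut_gb, min_regions=1, max_regions=500):
    """Rough estimate of how many HBase regions a cube segment is split into.

    Assumes a new region every `region_cut_gb` of data, clamped to
    [min_regions, max_regions]; the bounds are hypothetical defaults.
    """
    if region_cut_gb <= 0:
        raise ValueError("region_cut_gb must be positive")
    regions = math.ceil(cube_size_gb / region_cut_gb)
    return max(min_regions, min(regions, max_regions))
```

Under this model a 120 GB segment with a 5 GB cut lands at 24 regions, and doubling the cut to 10 GB halves that to 12. Fewer, larger regions reduce HBase overhead, while more regions generally let more of a query's scan run in parallel; the right balance depends on the cluster.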
Cube Advanced Settings
Use aggregation groups, joint dimensions, hierarchy dimensions, mandatory dimensions, and derived dimensions to prune unnecessary cuboids and control cube size. For example, each mandatory dimension halves the number of cuboids in its aggregation group, because cuboids that omit it are never built.
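The effect of these rules on cuboid count can be checked by brute force. The sketch below models a single aggregation group with simplified versions of three rules: mandatory dimensions must always be present, a hierarchy (ordered coarse to fine) admits a finer level only together with all coarser ones, and a joint group appears all together or not at all. The function `valid_cuboids` and the dimension names are hypothetical; real Kylin cube design has further rules and limits not modeled here.

```python
from itertools import combinations


def valid_cuboids(dims, mandatory=(), hierarchies=(), joints=()):
    """Enumerate the cuboids of one aggregation group that survive the pruning rules."""
    result = []
    for k in range(len(dims) + 1):
        for combo in combinations(dims, k):
            chosen = set(combo)
            # Mandatory: every cuboid must contain all mandatory dimensions.
            if not set(mandatory) <= chosen:
                continue
            # Hierarchy: level i may appear only if level i-1 (coarser) also appears.
            if any(h[i] in chosen and h[i - 1] not in chosen
                   for h in hierarchies for i in range(1, len(h))):
                continue
            # Joint: either all members of the group are present or none are.
            if any(0 < len(chosen & set(j)) < len(j) for j in joints):
                continue
            result.append(combo)
    return result


DIMS = ("year", "month", "day", "city", "category")
```

With these five dimensions there are 32 cuboids unpruned. Making `city` mandatory leaves 16; a year > month > day hierarchy reduces that part from 2³ = 8 to 4 combinations, also leaving 16; a joint (city, category) group reduces 2² = 4 to 2, again 16. Combining mandatory `city` with the hierarchy leaves 4 × 1 × 2 = 8.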
Real‑World Use Cases
Baidu Maps
Deployed ~80 cubes covering 50 billion rows, achieving sub‑second query latency for complex, multi‑dimensional analytics.
Lianjia
Operates a 6‑node Kylin cluster (3 build nodes and 3 query nodes) hosting 500+ cubes and 200 TB of storage; about 70% of queries return in under 500 ms.
Didi
Maintains more than 700 cubes (30 TB) with over 2,000 build jobs per day; 80% of queries complete within 500 ms, supporting a wide range of business lines.
Conclusion
Apache Kylin provides a scalable, high‑performance OLAP solution for big‑data environments, offering flexible deployment options, extensive tuning knobs, and proven success in large‑scale production systems.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
