Apache Kylin Overview and Model Optimization Practices for Trajectory Analytics
This article introduces Apache Kylin, describes its deployment at Tongcheng Yilong, explains the design of a large-scale trajectory model, and walks through the optimizations (cube dimension reduction, HBase rowkey tuning, build parameter tuning, high-cardinality handling, and disabling query result compression) used to achieve sub-second OLAP queries on multi-terabyte data.
Apache Kylin graduated from the Apache Incubator on December 8, 2015, becoming the first top-level project contributed entirely by a Chinese team. It offers high-concurrency, low-latency OLAP queries on big data, with support for multi-dimensional aggregation, exact count-distinct measures, streaming and offline cube builds, and standard SQL access via ODBC, JDBC, or a RESTful API.
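As a rough illustration of the RESTful access path, the sketch below builds (but does not send) a request for Kylin's query endpoint. The host, credentials, project name, and SQL are hypothetical placeholders, and the payload fields follow the commonly documented shape of the query API; check them against the Kylin version in use.

```python
import base64
import json
from urllib import request


def build_query_request(base_url, user, password, project, sql, limit=100):
    """Build a POST request for Kylin's SQL query endpoint (not sent here)."""
    payload = {
        "sql": sql,
        "project": project,   # hypothetical project name supplied by the caller
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return request.Request(
        url=base_url.rstrip("/") + "/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
req = build_query_request(
    "http://localhost:7070", "some_user", "some_password", "demo_project",
    "SELECT dt, COUNT(*) AS pv FROM fact_table GROUP BY dt",
)
```

Passing `req` to `urllib.request.urlopen` would execute the query against a running Kylin instance; that step is left out so the sketch runs without a server.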
In November 2016, the data center at Tongcheng Yilong adopted Kylin as a primary OLAP engine, upgrading to version 2.0.0 in 2017 and 2.6.4 in 2019. The deployment supports nearly 100 cubes covering ~7 TB of data, using two query nodes, two job nodes, and a dedicated HBase cluster with LDAP‑based permission management.
The trajectory model uses a snowflake schema with one fact table and ten dimension tables (25 dimensions total), handling billions of page‑views (PV) and tens of millions of unique visitors (UV) daily. Performance tests show strong concurrent query capabilities across various dimension combinations.
Model Design Optimization: Prefer incremental models over full-load models when possible, partitioning on a date column (supported formats: yyyy-MM-dd, yyyyMMdd, yyyy-MM-dd HH:mm:ss). Use global dictionaries for high-cardinality exact count-distinct measures such as UV.
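To make the partition formats concrete, here is a minimal sketch that parses a partition value in any of the three formats and converts it to a one-day [start, end) range in epoch milliseconds, which is how an incremental build range is typically expressed. The function names are illustrative, and treating the values as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# The three partition-column formats listed above, as strptime patterns.
PARTITION_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%Y%m%d")


def parse_partition_value(value: str) -> datetime:
    """Parse a partition value in any of the three formats (UTC assumed)."""
    for fmt in PARTITION_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unsupported partition format: {value!r}")


def daily_segment_range(value: str) -> Tuple[int, int]:
    """Return [start, end) in epoch milliseconds for the day containing value."""
    day = parse_partition_value(value).replace(hour=0, minute=0, second=0)
    start_ms = int(day.timestamp() * 1000)
    end_ms = int((day + timedelta(days=1)).timestamp() * 1000)
    return start_ms, end_ms


# All three spellings of the same day map to the same one-day range.
ranges = [daily_segment_range(v)
          for v in ("2019-06-01", "20190601", "2019-06-01 13:45:00")]
```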
Cube Dimension Reduction: Reduce the number of cuboids by using derived dimensions, extended columns, mandatory dimensions, hierarchy dimensions, joint dimensions, and aggregation groups. For example, when n dimensions from the same lookup table are declared as derived, the cube stores only that table's key in their place, so the combinations they contribute drop from 2ⁿ to 2.
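The effect of these rules on cuboid count can be estimated with simple arithmetic. The sketch below is a simplified model for a single aggregation group, not Kylin's actual planner: it ignores overlap between aggregation groups, combination limits, and how the base cuboid is handled. The function name and parameters are illustrative.

```python
def cuboid_count(total_dims, mandatory=0, hierarchies=(), joints=()):
    """Estimate the cuboid count for one aggregation group.

    total_dims  -- number of dimensions in the group
    mandatory   -- dimensions always present, so they add no combinations
    hierarchies -- level counts; a k-level hierarchy allows k + 1 prefixes
    joints      -- sizes of joint groups; each appears all-or-nothing
    """
    ruled = mandatory + sum(hierarchies) + sum(joints)
    free = total_dims - ruled
    if free < 0:
        raise ValueError("rules cover more dimensions than the group has")
    count = 2 ** free                  # each unconstrained dimension: in or out
    for levels in hierarchies:
        count *= levels + 1            # none, level 1, levels 1-2, ...
    count *= 2 ** len(joints)          # each joint group: in or out
    return count


no_rules = cuboid_count(10)
with_rules = cuboid_count(10, mandatory=1, hierarchies=(3,), joints=(3,))
derived_before = cuboid_count(5)   # five lookup-table columns as normal dimensions
derived_after = cuboid_count(1)    # the same five replaced by one key column
```

Under this model, ten unconstrained dimensions give 1024 cuboids, while one mandatory dimension, one three-level hierarchy, and one three-column joint group bring that down to 64. Replacing five derived columns with a single key takes their contribution from 32 combinations to 2.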
HBase RowKey Optimization: Where possible, encode string columns as numeric types, choose a rowkey encoding that suits each column (Boolean, Date, Dict, Integer, Time, Fixed_length, Fixed_length_hex), and place mandatory and high-cardinality dimensions at the front of the rowkey to minimize scan ranges. For example, for a query that filters on dimension A, the rowkey order A-B-C yields the smallest scan range, while B-A-C or B-C-A can cause extensive unnecessary scans.
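A toy model helps show why rowkey order matters. The sketch below sorts every combination of three dimensions by a given rowkey order and measures how many rows lie between the first and last row matching a filter on A. The cardinalities and filter value are made up, and real HBase scans in Kylin involve more machinery than this, so treat the numbers as illustrative only.

```python
from itertools import product

# Hypothetical cardinalities for three dimensions A, B, and C.
CARDINALITY = {"A": 10, "B": 5, "C": 3}


def scan_span(rowkey_order, filter_dim="A", filter_value=3):
    """Count rows between the first and last match in rowkey-sorted order."""
    names = list(CARDINALITY)
    rows = [dict(zip(names, values))
            for values in product(*(range(CARDINALITY[n]) for n in names))]
    # Sort rows the way a rowkey built in this dimension order would sort.
    rows.sort(key=lambda r: tuple(r[d] for d in rowkey_order))
    hits = [i for i, r in enumerate(rows) if r[filter_dim] == filter_value]
    return hits[-1] - hits[0] + 1


span_abc = scan_span(("A", "B", "C"))
span_bac = scan_span(("B", "A", "C"))
span_bca = scan_span(("B", "C", "A"))
```

With these cardinalities there are 150 rows, of which 15 match the filter. With A first, the matches are contiguous and the span is exactly 15 rows; with B-A-C it grows to 123 rows, and with B-C-A to 141, nearly the whole table.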
Build Parameter Optimization: This covers handling of high-cardinality dimensions (global dictionaries or shard-by settings) and tuning of MapReduce/Spark resources. Sample configuration lines:
kylin.engine.mr.config-override.mapreduce.reduce.memory.mb=7168
kylin.engine.mr.config-override.mapreduce.reduce.java.opts=-Xmx6g
kylin.engine.mr.config-override.mapreduce.map.memory.mb=7168
kylin.engine.mr.config-override.mapreduce.map.java.opts=-Xmx6g
kylin.engine.mr.build-uhc-dict-in-additional-step=true
kylin.engine.mr.uhc-reducer-count=5
Properties under the corresponding Hive, MapReduce, and Spark override prefixes are set in the same way to improve performance.
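One sanity check worth running on memory settings like these is that each JVM heap (-Xmx) stays below its container size, leaving headroom for off-heap memory. This is general YARN practice rather than anything Kylin-specific. The sketch below parses the sample properties and reports that headroom; the helper names are illustrative.

```python
import re

SAMPLE = """\
kylin.engine.mr.config-override.mapreduce.reduce.memory.mb=7168
kylin.engine.mr.config-override.mapreduce.reduce.java.opts=-Xmx6g
kylin.engine.mr.config-override.mapreduce.map.memory.mb=7168
kylin.engine.mr.config-override.mapreduce.map.java.opts=-Xmx6g
"""


def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")   # split on the first '=' only
        props[key.strip()] = value.strip()
    return props


def heap_mb(java_opts):
    """Extract the -Xmx heap size in MB from a java.opts string."""
    match = re.search(r"-Xmx(\d+)([gGmM])", java_opts)
    if match is None:
        raise ValueError("no -Xmx setting found in: " + java_opts)
    size = int(match.group(1))
    return size * 1024 if match.group(2).lower() == "g" else size


def heap_headroom_mb(props, task):
    """Container memory minus JVM heap for 'map' or 'reduce' tasks."""
    prefix = "kylin.engine.mr.config-override.mapreduce." + task
    return int(props[prefix + ".memory.mb"]) - heap_mb(props[prefix + ".java.opts"])


props = parse_properties(SAMPLE)
map_headroom = heap_headroom_mb(props, "map")
reduce_headroom = heap_headroom_mb(props, "reduce")
```

For the sample values, a 6 GB heap in a 7168 MB container leaves 1024 MB of headroom for both map and reduce tasks.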
High-Cardinality Dimension Optimization: Use global dictionaries for exact count-distinct measures and the shard-by rowkey setting for high-cardinality dimensions. Both reduce memory pressure and help avoid out-of-memory (OOM) errors during dictionary building.
Disabling Query Compression: Turning off result compression can dramatically cut query latency. Setting kylin.storage.hbase.endpoint-compress-result=false skips the compression and decompression step, which had previously added about 10 seconds, and reduced query time from about 15 seconds to a few hundred milliseconds.
Future Plans: To simplify Kylin modeling, an automated modeling interface is under development. It will support two scenarios. In the first, the business provides only fact tables, and the system infers dimensions, metrics, and optimal dimension combinations from historical queries across Presto, Hive, SparkSQL, Greenplum, and other engines. In the second, the business supplies existing SQL queries, and the system parses them to generate fact and dimension tables, set appropriate rowkeys, and optimize dimension groups. Both paths aim at rapid model construction with minimal manual effort.
Tongcheng Travel Technology Center