Big Data 11 min read

Chain Home's OLAP Platform and Kylin Usage

This article details Chain Home's OLAP platform architecture and its use of Kylin, covering the evolution from an early ROLAP setup to a MOLAP multi-dimensional engine, Kylin's basic principles, the platform structure, application scenarios, usage specifications, capability extensions, and middleware development.

Beike Product & Technology

As business lines expanded and the data ecosystem was built out, data volume grew rapidly, and the company's data analysis engine evolved from an early ROLAP architecture to a MOLAP multi-dimensional engine.

Initially, the platform used a ROLAP engine: source data landed in HDFS, was loaded into Hive, and was processed by ETL scripts that generated the data warehouse layers. As data volume and demand grew, this approach hit bottlenecks, because each new requirement needed its own customized development.

A MOLAP engine was therefore introduced to build a platform-level OLAP multi-dimensional analysis service. Apache Kylin was selected because it supports ANSI SQL queries, provides sub-second interaction with data on Hadoop, and scales well.

Kylin is built on pre-computation: measures are calculated for each combination of dimensions, and the results are saved as Cubes. Complex aggregations thus become lookups against pre-computed results, trading storage space for query time to achieve fast queries and high concurrency.
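The pre-computation idea can be sketched in a few lines of Python. This is a toy illustration, not Kylin's implementation: the fact table, the column names, and the helpers `build_cube` and `query` are all made up for the example, and a plain in-memory dictionary stands in for Cube storage.

```python
from collections import defaultdict
from itertools import combinations

# Toy fact table; columns and values are hypothetical.
ROWS = [
    {"city": "Beijing", "channel": "app", "day": "2017-01-01", "amount": 10},
    {"city": "Beijing", "channel": "web", "day": "2017-01-01", "amount": 5},
    {"city": "Shanghai", "channel": "app", "day": "2017-01-02", "amount": 7},
    {"city": "Beijing", "channel": "app", "day": "2017-01-02", "amount": 3},
]
DIMENSIONS = ("city", "channel", "day")


def build_cube(rows, dimensions, measure):
    """Pre-compute SUM(measure) for every combination of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, dims, key):
    """Answer an aggregation with a lookup instead of a scan.

    dims must list dimensions in the same order as DIMENSIONS.
    """
    return cube[dims].get(key, 0)


CUBE = build_cube(ROWS, DIMENSIONS, "amount")
beijing_total = query(CUBE, ("city",), ("Beijing",))
```

With 3 dimensions the sketch stores 2^3 = 8 cuboids, which is the space side of the trade-off: the build pass touches every row for every combination, but each later query is a dictionary lookup.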

Chain Home's OLAP platform was built in late 2016 on 6 machines: 3 for distributed Cube building and 3 for load-balanced querying. The platform serves 500+ Cubes covering 12 business lines, with 200+ TB of total storage and trillions of data rows.

Usage specifications cover dimension optimization, rowkey design, dimension combination optimization, and timely cleanup of invalid data. The team has also extended Kylin with distributed building support, an optimized dictionary download strategy, a global dictionary lock, forced dimension table association, and use of the G1 garbage collector.
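To see why dimension combination optimization matters, it helps to count cuboids. The sketch below uses a deliberately simplified model and assumes the optimization relies on mandatory and joint dimensions, two options that Kylin's aggregation groups offer; the article does not say which techniques the team actually applies. The function `cuboid_count` is hypothetical, and Kylin's real planner applies further rules.

```python
def cuboid_count(total_dims, mandatory=0, joint_groups=()):
    """Count cuboids in a simplified model of one aggregation group.

    Assumptions of this sketch (not Kylin's exact rules):
    - mandatory dimensions appear in every cuboid, so they add no choices;
    - each joint group (given by its size) behaves like a single dimension;
    - every remaining dimension is either present or absent.
    """
    joint_total = sum(joint_groups)
    if mandatory + joint_total > total_dims:
        raise ValueError("mandatory and joint dimensions exceed total_dims")
    free_dims = total_dims - mandatory - joint_total
    return 2 ** (free_dims + len(joint_groups))


full = cuboid_count(10)
with_mandatory = cuboid_count(10, mandatory=2)
with_joint = cuboid_count(10, mandatory=2, joint_groups=(3,))
```

Under this model, 10 dimensions yield 1024 cuboids, marking 2 of them mandatory cuts that to 256, and additionally joining 3 others leaves 64.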

A Kylin middleware was developed for task scheduling, status monitoring, and permission management; it adds capabilities such as priority scheduling, metadata management, and concurrency control.
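The article does not describe how the middleware is implemented. As one possible shape for priority scheduling combined with concurrency control, here is a minimal sketch; the class `BuildScheduler`, its methods, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq
import itertools


class BuildScheduler:
    """Hypothetical scheduler: priority queue plus a cap on running builds."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self._queue = []                  # heap of (priority, seq, cube_name)
        self._seq = itertools.count()     # tie-breaker keeps FIFO order
        self.running = set()

    def submit(self, cube_name, priority):
        """Queue a build; a lower priority number is dispatched first."""
        heapq.heappush(self._queue, (priority, next(self._seq), cube_name))

    def dispatch(self):
        """Start queued builds until the concurrency cap is reached."""
        started = []
        while self._queue and len(self.running) < self.max_concurrent:
            _, _, cube_name = heapq.heappop(self._queue)
            self.running.add(cube_name)
            started.append(cube_name)
        return started

    def finish(self, cube_name):
        """Mark a build as done, freeing one slot."""
        self.running.discard(cube_name)
```

A caller would invoke `dispatch()` after each `submit()` or `finish()`; in a real middleware the returned names would be handed to whatever triggers the actual Cube build.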

Tags: data analysis, data warehouse, OLAP, Apache Kylin, Cube, Kylin, Chain Home