Integrating Kylin, Mondrian, and Saiku to Build an OLAP Analysis Tool
This article describes how the Youzan data team combined Apache Kylin, Mondrian, and Saiku into a three‑layer OLAP system, covering background, component overviews, technical architecture, schema integration challenges, count‑distinct handling, Kylin‑specific SQL quirks, and practical solutions.
The article introduces the Youzan data team's effort to create an OLAP analysis tool that satisfies multi‑dimensional metric viewing and analysis needs by integrating Apache Kylin, Mondrian, and Saiku. The integration involves custom modifications and environment configuration, and the system is still being optimized.
Background
Initially, the data team generated reports by writing SQL against Hive, storing results in MySQL, and having front‑end developers display them. As business grew, this long‑cycle process could not keep up with analysts' demands, prompting a search for a big‑data OLAP solution. After evaluating alternatives, they chose to combine Kylin, Mondrian, and Saiku.
The Three Giants
Kylin
Kylin is an Apache top‑level open‑source distributed analytics engine that provides a SQL interface and multi‑dimensional analysis on Hadoop, storing pre‑computed results in HBase to achieve millisecond‑level query response.
Mondrian
Mondrian is an open‑source OLAP engine that translates MDX (Multidimensional Expressions) queries into the target database's SQL based on a predefined schema.
Saiku
Saiku offers a browser‑based drag‑and‑drop UI for business users to explore data, converting user actions into MDX statements that are executed by Mondrian.
Technical Architecture
The three components form a simple three‑tier architecture. Saiku already bundles Mondrian’s JAR, so the main work is aligning Mondrian’s schema with Kylin’s schema and adding a Kylin‑specific dialect to Mondrian.
A public Git project provides a ready‑made integration; the authors thank the original author mustangore .
Some Details
Schema Alignment
Kylin’s cube schema defines the dimensions and measures it can pre‑compute, while Mondrian’s XML schema maps MDX to SQL. Inconsistencies between the two can cause Mondrian‑generated SQL that Kylin cannot execute. Kylin’s schema can be built via a graphical UI, but Mondrian’s schema must be hand‑written XML.
Do not use a view as a lookup table : Views are fine for fact tables but cause problems for lookup tables during cube building.
Kylin cannot apply conditions to count‑distinct measures : Workaround is to add a derived column in the fact view.
Prefer int over tinyint in Kylin : Different measure types may cause incompatibilities with Saiku.
Place Hive tables in the default database : Mondrian’s schema assumes the default database when locating physical tables.
Count Distinct Issue
Kylin’s cube can define count‑distinct measures, but its dialect originally disables allowsCountDistinct() , preventing Mondrian from generating proper count‑distinct SQL. The team patched the Kylin dialect to enable count‑distinct when appropriate:
public class KylinDialect extends JdbcDialectImpl {
public static final JdbcDialectFactory FACTORY =
new JdbcDialectFactory(KylinDialect.class, DatabaseProduct.KYLIN) {
protected boolean acceptsConnection(Connection connection) {
return super.acceptsConnection(connection);
}
};
@Override
public boolean allowsCountDistinct() {
return false;
}
@Override
public boolean allowsJoinOn() {
return true;
}
}By changing allowsCountDistinct() to return true and adding a guard in generateColumnCardinalitySql , the system can correctly handle count‑distinct measures.
private static String generateColumnCardinalitySql(
Dialect dialect,
String schema,
String table,
String column)
{
StringBuilder buf = new StringBuilder();
String exprString = dialect.quoteIdentifier(column);
if (dialect.allowsCountDistinct()) {
buf.append("select count(distinct ")
.append(exprString)
.append(") from ");
dialect.quoteIdentifier(buf, schema, table);
return buf.toString();
} else if (dialect.allowsFromQuery()) {
buf.append("select count(*) from (select distinct ")
.append(exprString)
.append(" from ");
dialect.quoteIdentifier(buf, schema, table);
buf.append(")");
...
}
}Kylin SQL Differences
Practical testing on Kylin 1.5 revealed three notable limitations:
No LIMIT beg, end , only LIMIT length .
Unsupported UNION / UNION ALL .
Cannot use WHERE EXISTS clauses.
Conclusion
The Kylin + Mondrian + Saiku stack can be made to work, though challenges arise when Mondrian generates SQL that falls outside a Kylin cube’s scope. Solutions include rebuilding the cube, adjusting the Mondrian schema, or customizing Mondrian’s Kylin dialect.
The team continues to develop custom features and performance optimizations for this three‑layer framework, hoping the article helps others and inviting feedback.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.