Big Data 12 min read

Integrating Kylin, Mondrian, and Saiku to Build an OLAP Analysis Tool

This article describes how the Youzan data team combined Apache Kylin, Mondrian, and Saiku into a three‑layer OLAP system, covering background, component overviews, technical architecture, schema integration challenges, count‑distinct handling, Kylin‑specific SQL quirks, and practical solutions.

Architect
Architect
Architect
Integrating Kylin, Mondrian, and Saiku to Build an OLAP Analysis Tool

The article introduces the Youzan data team's effort to create an OLAP analysis tool that satisfies multi‑dimensional metric viewing and analysis needs by integrating Apache Kylin, Mondrian, and Saiku. The integration involves custom modifications and environment configuration, and the system is still being optimized.

Background

Initially, the data team generated reports by writing SQL against Hive, storing results in MySQL, and having front‑end developers display them. As business grew, this long‑cycle process could not keep up with analysts' demands, prompting a search for a big‑data OLAP solution. After evaluating alternatives, they chose to combine Kylin, Mondrian, and Saiku.

The Three Giants

Kylin

Kylin is an Apache top‑level open‑source distributed analytics engine that provides a SQL interface and multi‑dimensional analysis on Hadoop, storing pre‑computed results in HBase to achieve millisecond‑level query response.

Mondrian

Mondrian is an open‑source OLAP engine that translates MDX (Multidimensional Expressions) queries into the target database's SQL based on a predefined schema.

Saiku

Saiku offers a browser‑based drag‑and‑drop UI for business users to explore data, converting user actions into MDX statements that are executed by Mondrian.

Technical Architecture

The three components form a simple three‑tier architecture. Saiku already bundles Mondrian’s JAR, so the main work is aligning Mondrian’s schema with Kylin’s schema and adding a Kylin‑specific dialect to Mondrian.

A public Git project provides a ready‑made integration; the authors thank the original author mustangore .

Some Details

Schema Alignment

Kylin’s cube schema defines the dimensions and measures it can pre‑compute, while Mondrian’s XML schema maps MDX to SQL. Inconsistencies between the two can cause Mondrian‑generated SQL that Kylin cannot execute. Kylin’s schema can be built via a graphical UI, but Mondrian’s schema must be hand‑written XML.

Do not use a view as a lookup table : Views are fine for fact tables but cause problems for lookup tables during cube building.

Kylin cannot apply conditions to count‑distinct measures : Workaround is to add a derived column in the fact view.

Prefer int over tinyint in Kylin : Different measure types may cause incompatibilities with Saiku.

Place Hive tables in the default database : Mondrian’s schema assumes the default database when locating physical tables.

Count Distinct Issue

Kylin’s cube can define count‑distinct measures, but its dialect originally disables allowsCountDistinct() , preventing Mondrian from generating proper count‑distinct SQL. The team patched the Kylin dialect to enable count‑distinct when appropriate:

public class KylinDialect extends JdbcDialectImpl {
    public static final JdbcDialectFactory FACTORY =
            new JdbcDialectFactory(KylinDialect.class, DatabaseProduct.KYLIN) {
                protected boolean acceptsConnection(Connection connection) {
                    return super.acceptsConnection(connection);
                }
            };
    @Override
    public boolean allowsCountDistinct() {
        return false;
    }
    @Override
    public boolean allowsJoinOn() {
        return true;
    }
}

By changing allowsCountDistinct() to return true and adding a guard in generateColumnCardinalitySql , the system can correctly handle count‑distinct measures.

private static String generateColumnCardinalitySql(
        Dialect dialect,
        String schema,
        String table,
        String column)
{
    StringBuilder buf = new StringBuilder();
    String exprString = dialect.quoteIdentifier(column);
    if (dialect.allowsCountDistinct()) {
        buf.append("select count(distinct ")
           .append(exprString)
           .append(") from ");
        dialect.quoteIdentifier(buf, schema, table);
        return buf.toString();
    } else if (dialect.allowsFromQuery()) {
        buf.append("select count(*) from (select distinct ")
           .append(exprString)
           .append(" from ");
        dialect.quoteIdentifier(buf, schema, table);
        buf.append(")");
        ...
    }
}

Kylin SQL Differences

Practical testing on Kylin 1.5 revealed three notable limitations:

No LIMIT beg, end , only LIMIT length .

Unsupported UNION / UNION ALL .

Cannot use WHERE EXISTS clauses.

Conclusion

The Kylin + Mondrian + Saiku stack can be made to work, though challenges arise when Mondrian generates SQL that falls outside a Kylin cube’s scope. Solutions include rebuilding the cube, adjusting the Mondrian schema, or customizing Mondrian’s Kylin dialect.

The team continues to develop custom features and performance optimizations for this three‑layer framework, hoping the article helps others and inviting feedback.

big dataHiveHBaseOLAPKylinMondrianSaiku
Architect
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.