Comprehensive Comparison of Apache Kylin and Apache Doris: Architecture, Data Models, Storage, Query, and Operations

This article provides an in-depth technical comparison of Apache Kylin and Apache Doris, covering their system architectures, aggregation and detail data models, storage engines, data import processes, query execution, precise distinct counting, metadata handling, performance, high availability, maintainability, usability, schema-change capabilities, features, and community ecosystems.

Big Data Technology & Architecture

Jul 29, 2019

1. System Architecture

Kylin follows a pre-computation (cube) model that trades storage space for query time to accelerate fixed-pattern OLAP queries. Its JobServer builds cubes with MapReduce or Spark, and its QueryServer handles SQL parsing and HBase scans. Doris is an MPP OLAP system that draws on Google Mesa for its data model, Apache Impala for its query engine, and ORCFile for its storage format. It consists of a Frontend (FE) for query planning and a Backend (BE) for execution and storage.

2. Data Models

2.1 Kylin Aggregation Model

Kylin separates dimension columns from metric columns and aggregates metrics with functions such as SUM, COUNT, MIN, MAX, and distinct count. In HBase, the row key is the Cuboid ID followed by the dimension values, and the aggregated metrics are stored as the value.
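The cuboid layout can be illustrated with a small in-memory sketch. It assumes two hypothetical dimensions (`country`, `device`) and one SUM metric (`amount`), and uses a plain dictionary in place of HBase. The bitmask used as the cuboid ID is an illustrative choice, not Kylin's actual encoding.

```python
from itertools import combinations
from collections import defaultdict

DIMENSIONS = ("country", "device")  # hypothetical dimension columns


def build_cuboids(rows, dimensions=DIMENSIONS, metric="amount"):
    """Aggregate SUM(metric) for every subset of the dimensions.

    Returns {(cuboid_id, dim_values): summed_metric}, where cuboid_id is a
    bitmask marking which dimensions the cuboid keeps.
    """
    store = defaultdict(int)
    n = len(dimensions)
    for size in range(n + 1):
        for kept in combinations(range(n), size):
            # The leftmost dimension maps to the highest bit.
            cuboid_id = sum(1 << (n - 1 - i) for i in kept)
            for row in rows:
                key = (cuboid_id, tuple(row[dimensions[i]] for i in kept))
                store[key] += row[metric]
    return dict(store)


rows = [
    {"country": "CN", "device": "ios", "amount": 10},
    {"country": "CN", "device": "android", "amount": 5},
    {"country": "US", "device": "ios", "amount": 7},
]
cube = build_cuboids(rows)
```

A query such as SUM(amount) grouped by country then becomes a lookup over keys whose cuboid ID is `0b10`, with no aggregation left to do at query time.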

2.2 Doris Aggregation Model

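Doris adopts a similar model in which dimensions are called Keys and metrics are called Values. It adds a special REPLACE aggregation function for point updates: rather than combining values the way SUM or MAX does, the most recently loaded value for a key overwrites the earlier one.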
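A rough sketch of how rows with the same Key could be merged on load, assuming a hypothetical table with key columns `user_id` and `date`, a SUM column `pv`, and a REPLACE column `last_city`. It only illustrates the semantics and says nothing about how Doris implements them internally.

```python
# Per-column aggregation functions; the column names are hypothetical.
AGGREGATORS = {
    "pv": lambda old, new: old + new,    # SUM: add to the stored value
    "last_city": lambda old, new: new,   # REPLACE: the newer value wins
}


def load_batch(table, batch, key_columns=("user_id", "date")):
    """Merge a batch of rows into `table`, keyed by the Key columns."""
    for row in batch:
        key = tuple(row[c] for c in key_columns)
        if key not in table:
            table[key] = {c: row[c] for c in AGGREGATORS}
        else:
            stored = table[key]
            for col, agg in AGGREGATORS.items():
                stored[col] = agg(stored[col], row[col])
    return table


table = {}
load_batch(table, [{"user_id": 1, "date": "2019-07-01", "pv": 3, "last_city": "Beijing"}])
load_batch(table, [{"user_id": 1, "date": "2019-07-01", "pv": 2, "last_city": "Shanghai"}])
```

After the second load the single stored row has `pv` equal to 5 and `last_city` equal to "Shanghai", which is the point-update behaviour described above.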

2.3 Cuboid vs. RollUp

Kylin Cuboids and Doris RollUp tables are both materialized views or indexes that the system selects automatically during query execution.
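Automatic selection can be pictured as "choose the smallest pre-aggregated table that still contains every column the query needs". The sketch below is a simplified heuristic under that assumption. The roll-up names, columns, and row counts are invented for illustration, and neither system's real selection logic is reproduced here.

```python
def pick_rollup(rollups, needed_columns):
    """Return the name of the smallest roll-up containing all needed columns."""
    needed = set(needed_columns)
    candidates = [
        (meta["row_count"], name)
        for name, meta in rollups.items()
        if needed <= set(meta["columns"])
    ]
    if not candidates:
        raise LookupError("no roll-up covers the query")
    return min(candidates)[1]


# Hypothetical roll-ups of one table.
rollups = {
    "base": {"columns": ["date", "country", "device", "pv"], "row_count": 1_000_000},
    "by_date_country": {"columns": ["date", "country", "pv"], "row_count": 20_000},
    "by_date": {"columns": ["date", "pv"], "row_count": 365},
}
```

A query touching only `date` and `pv` would be served from `by_date`, while one that filters on `device` has to fall back to `base`.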

2.4 Doris Detail Model

Doris also provides a non‑aggregated detail model that requires specifying sort columns; data is partitioned by date and bucketed, enabling efficient range scans.
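To see why date partitioning helps range scans, consider the sketch below. It assumes monthly partitions, a hypothetical bucketing column `user_id`, and CRC32 as a stand-in hash, none of which is taken from Doris itself.

```python
import zlib


def locate(row, bucket_column="user_id", num_buckets=4):
    """Map a row to (partition, bucket) under the assumptions above."""
    partition = row["date"][:7]  # one partition per month, e.g. "2019-07"
    bucket = zlib.crc32(str(row[bucket_column]).encode()) % num_buckets
    return partition, bucket


def prune_partitions(partitions, start_date, end_date):
    """Keep only monthly partitions that can overlap [start_date, end_date]."""
    return [p for p in partitions if start_date[:7] <= p <= end_date[:7]]


partitions = ["2019-05", "2019-06", "2019-07", "2019-08"]
to_scan = prune_partitions(partitions, "2019-06-15", "2019-07-02")
```

A date-range filter therefore touches only the partitions it overlaps, and within each partition the sort columns keep matching rows close together.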

3. Storage Engines

Kylin stores Cuboid data in HBase; each Segment maps to an HBase table, which is split into Regions and HFiles. Doris uses a columnar storage format inspired by ORC. Data is organized with two-level partitioning (partitions, then buckets), the tablet is the smallest physical storage unit, and prefix indexes on the leading sort columns speed up key lookups.
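The idea behind a prefix index can be sketched as a sparse index over sorted rows: keep the leading key columns of the first row in each block, binary-search that small list, and scan only from the block where a match could begin. The block size and row layout below are assumptions made for the example, not Doris's on-disk format.

```python
from bisect import bisect_left

BLOCK_SIZE = 3  # hypothetical number of rows per index entry


def build_sparse_index(sorted_rows, prefix_len=2):
    """Record the key prefix of the first row in every block."""
    return [
        tuple(sorted_rows[i][:prefix_len])
        for i in range(0, len(sorted_rows), BLOCK_SIZE)
    ]


def lookup(sorted_rows, index, prefix):
    """Return all rows whose leading columns equal `prefix`."""
    prefix = tuple(prefix)
    n = len(prefix)
    # Matches can begin in the block just before the first index entry >= prefix.
    block = max(bisect_left(index, prefix) - 1, 0)
    matches = []
    for row in sorted_rows[block * BLOCK_SIZE:]:
        head = tuple(row[:n])
        if head > prefix:
            break  # rows are sorted, so nothing later can match
        if head == prefix:
            matches.append(row)
    return matches


rows = [
    (1, "a", 10), (1, "b", 20), (2, "a", 30), (2, "a", 40),
    (2, "b", 50), (3, "a", 60), (3, "c", 70),
]
index = build_sparse_index(rows)
```

Because the index holds one entry per block rather than one per row, it stays small enough to keep in memory, at the cost of a short sequential scan inside the chosen block.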

4. Data Import

Kylin's import pipeline includes building a wide Hive table, dictionary construction, multi‑level Cuboid generation, HFile creation, loading into HBase, and metadata updates. Doris separates ETL (type/format validation, tablet splitting, sorting, aggregation) from LOADING (tablet data pull, format conversion, index generation) followed by metadata refresh.

5. Query Processing

Kylin follows a scatter-gather model: the QueryServer parses and optimizes the SQL, generates code for the plan, scans HBase (optionally aggregating inside HBase coprocessors), and merges the partial results into the final answer. Doris uses an Impala-based MPP engine: the FE first produces a single-node plan, then turns it into a distributed plan made of PlanFragments connected by ExchangeNodes, arranged to minimize data movement. BE nodes execute the Scan, Join, Aggregation, and other operators.
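Both approaches rely on aggregating close to the data and merging small partial results afterwards. A minimal sketch of that pattern for SUM with GROUP BY follows, with plain lists standing in for the shards held by different nodes. The function names are illustrative and do not correspond to any Kylin or Doris API.

```python
from collections import Counter


def partial_aggregate(shard):
    """Runs next to the data: SUM(value) GROUP BY key over one shard."""
    partial = Counter()
    for key, value in shard:
        partial[key] += value
    return partial


def merge_partials(partials):
    """Runs on the coordinating node: combine the per-shard partial sums."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


shards = [
    [("a", 1), ("b", 2)],
    [("a", 3)],
    [("b", 4), ("c", 5)],
]
result = merge_partials(partial_aggregate(s) for s in shards)
```

Only one row per group leaves each shard, which is what keeps network traffic low in both the coprocessor and the MPP case.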

6. Precise Distinct Counting

Kylin implements pre‑computed distinct counting using global dictionaries and RoaringBitmap. Doris performs on‑the‑fly distinct counting in two phases, illustrated by the following SQL example:

SELECT a, COUNT(DISTINCT b, c), MIN(d), COUNT(*) FROM T GROUP BY a
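One common way to evaluate such a query in two phases is to group by (a, b, c) first and then by a alone, so that each first-phase group contributes exactly one distinct (b, c) pair. The sketch below follows that approach over in-memory tuples `(a, b, c, d)` and assumes no NULL values. It is an illustration of the technique, not Doris's actual plan.

```python
def two_phase(rows):
    """Return {a: (count_distinct_bc, min_d, count_star)}."""
    # Phase 1: GROUP BY a, b, c, computing MIN(d) and COUNT(*) per group.
    phase1 = {}
    for a, b, c, d in rows:
        min_d, cnt = phase1.get((a, b, c), (d, 0))
        phase1[(a, b, c)] = (min(min_d, d), cnt + 1)

    # Phase 2: GROUP BY a. Each phase-1 group is one distinct (b, c) pair,
    # so counting groups gives COUNT(DISTINCT b, c); MIN and COUNT(*) are
    # recovered from the phase-1 partial results.
    result = {}
    for (a, _b, _c), (min_d, cnt) in phase1.items():
        distinct, cur_min, total = result.get(a, (0, min_d, 0))
        result[a] = (distinct + 1, min(cur_min, min_d), total + cnt)
    return result


rows = [
    ("x", 1, 1, 5),
    ("x", 1, 1, 3),
    ("x", 1, 2, 9),
    ("y", 2, 2, 4),
]
answer = two_phase(rows)
```

Unlike Kylin's dictionary-plus-bitmap approach, nothing here is computed ahead of time, so the cost grows with the number of distinct (a, b, c) combinations seen at query time.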

7. Metadata Management

Kylin stores metadata as JSON rows in HBase. This scales horizontally, but it also means Kylin still depends on HBase for metadata even though its storage layer is designed to be pluggable. Doris keeps metadata in memory, which gives fast access but limits scalability.

8. Performance

Kylin's speed stems from pre-computed cubes: at query time the work is reduced to a scan plus filter over already aggregated data. Doris benefits from in-memory metadata, pre-aggregated roll-up tables, MPP execution, vectorized processing, columnar storage, and prefix indexes.

9. High Availability

Kylin achieves HA for JobServer via ZooKeeper and for QueryServer via load balancers, but overall HA depends on the underlying Hadoop ecosystem. Doris provides HA for FE using a Paxos‑like protocol (BDB‑JE) and replicates tablets across BE nodes.

10. Maintainability

A Kylin deployment requires a full Hadoop stack (HDFS, HBase, Hive, Spark, YARN, ZooKeeper), whereas Doris needs only its FE and BE components. Operational complexity is therefore higher for Kylin, because many dependent services must be kept healthy.

11. Usability

Kylin offers HTTP, JDBC, and ODBC interfaces; Doris uses the MySQL protocol, allowing existing MySQL tools to connect directly. Learning Kylin involves understanding cuboids, dimensions, row‑key design, and Hadoop job logs, whereas Doris requires grasping aggregation vs. detail models, prefix indexes, and roll‑up tables.

12. Schema Change

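Kylin requires a full data rebuild for any change to a cube's schema. Doris supports online schema change in three modes: direct (data is rewritten but not re-sorted, e.g. changing a column type), sorted (data must be re-sorted, e.g. when the sort columns change), and linked (no data conversion is needed, e.g. adding a column).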

13. Features & Community

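Both systems support roll-up style pre-aggregation (Cuboids in Kylin, RollUp tables in Doris); Kylin can emulate detail queries by building a base cuboid that includes all columns. At the time of writing, Doris's community is young and driven mainly by Baidu, while Kylin has a mature open-source ecosystem led largely by contributors in China.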

14. Conclusion

The article contrasts Kylin and Doris across architecture, data modeling, storage, ingestion, query processing, precise distinct counting, metadata, performance, HA, maintainability, usability, schema evolution, and community, providing a basis for choosing the OLAP system that fits a given set of requirements.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
