Comparative Analysis of Kylin, Druid, and ClickHouse OLAP Engines
This article examines the core data models and indexing mechanisms of three leading OLAP engines—Kylin, Druid, and ClickHouse—comparing their architectures, query performance characteristics, and suitability for aggregation versus detailed query workloads in modern big data environments.
1. Kylin Data Model
Kylin transforms a two‑dimensional Hive table into a Cube and then stores the Cube in HBase, effectively performing two conversions.
The first conversion creates a traditional database cube where each CuboidId represents a fixed set of columns (e.g., dimensions A and B). The corresponding SQL is:
select A, B, sum(M), sum(N) from table group by A, BThe second conversion serializes the CuboidId and dimension information into the HBase rowkey and stores metric columns as column families, performing pre‑aggregation during the write.
2. Kylin Index Structure
Kylin relies on HBase's indexing, which is a simplified B+‑tree (HFile). The index is a clustered index sorted by rowkey, typically two or three levels deep, with default node size of 64 KB. Lookup uses a tree traversal and binary search within nodes.
Kylin excels at aggregation queries because data is pre‑aggregated; query speed depends heavily on whether the query can hit the appropriate CuboidId, leading to potential performance variance.
3. Druid Data Model
Druid also pre‑aggregates data but does so by grouping on all dimensions rather than building explicit cubes. This results in a simpler model where each dimension is stored as a column.
4. Druid Index Structure
Druid uses a column‑oriented storage format with a bitmap‑based index for each dimension. Each unique dimension value gets a bitmap whose length equals the total number of rows; a set bit indicates the row contains that value. This provides a reverse index, while the dictionary offers a forward index.
Example query:
select site, sum(pv) from xx where date=2020-01-01 and city='bj' group by siteThe execution steps are: locate the bitmap for "bj" in the city dictionary, intersect it with other dimension bitmaps, retrieve matching rows from the metric column, and perform aggregation.
Druid is optimized for aggregation queries with moderate dimensionality; it avoids the large performance swings seen in Kylin.
5. ClickHouse Index Structure (MergeTree Engine)
ClickHouse stores data in ordinary two‑dimensional tables but uses a column‑oriented index. Selected columns become the primary sorting key (similar to a MySQL composite index). Every 8 194 rows a "mark" is recorded, storing the offset of each column's data block.
Query example:
select count(distinct action) where date=toDate(2020-01-01) and city='bj'Execution steps: binary search the primary index to find relevant marks, intersect row sets for the date and city columns, translate row numbers to marks and offsets, then read the action column values for aggregation.
ClickHouse performs best when the query predicates match the leftmost prefix of the sorting key; otherwise it may need to scan large portions of the table, causing significant performance variance.
Overall Summary
Kylin and Druid are suited for pure aggregation workloads; ClickHouse supports both aggregation and detailed query scenarios.
Query efficiency ranking for aggregation: Kylin > Druid > ClickHouse.
Both Kylin and ClickHouse require domain experts to design tables.
Kylin and ClickHouse can exhibit large fluctuations in query latency depending on data distribution and query patterns.
ClickHouse offers the most advanced vectorized execution; Druid provides limited vectorization; Kylin currently lacks vectorized computation.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
