Big Data 16 min read

Choosing the Right OLAP Engine: Druid vs ClickHouse and Optimization Tips

This article introduces OLAP concepts, compares major OLAP solutions such as Druid, Kylin, Doris, and ClickHouse, outlines their features and suitable scenarios, and shares practical optimization techniques—including materialized views, caching, node tiering, and query tuning—to improve performance for high‑concurrency analytical workloads.

dbaplus Community
dbaplus Community
dbaplus Community
Choosing the Right OLAP Engine: Druid vs ClickHouse and Optimization Tips

1. OLAP Overview and Selection

OLAP (On‑Line Analytical Processing) supports enterprise decision‑making by enabling multi‑dimensional analysis of historical data, contrasting with OLTP which handles real‑time transactional operations.

Typical OLAP operations include drill‑down, roll‑up, slice, dice, and pivot.

1.1 Basic Operations

Drill‑down: Move from coarse to fine granularity (e.g., quarter sales → monthly sales).

Roll‑up: Aggregate from fine to coarse granularity (e.g., monthly sales → quarterly/yearly).

Slice: Select specific dimension values (e.g., only electronics sales).

Dice: Select a sub‑cube across multiple dimensions (e.g., Q1–Q2 sales).

Pivot: Swap rows and columns to view data from a different perspective.

1.2 OLAP Classification

Based on storage format, OLAP is divided into:

ROLAP (Relational OLAP)

MOLAP (Multi‑dimensional OLAP)

HOLAP (Hybrid OLAP)

MOLAP stores pre‑computed cubes; fast query performance but high storage cost and delayed data updates.

ROLAP relies on relational databases; real‑time data freshness but slower query speed.

HOLAP combines ROLAP (detail data) and MOLAP (aggregated data) for flexibility.

1.3 Main OLAP Engines and Use Cases

Druid : Pre‑computes aggregates for high‑throughput time‑series queries. Ideal for user‑behavior and network‑traffic analysis with low latency and strong concurrency.

Kylin : MOLAP cube engine built on Hive; suitable for massive datasets (>10^9 rows) with many dimensions, where query patterns are stable. Provides sub‑second responses via HBase‑backed cubes.

Doris : MPP query engine with simple FE/BE architecture, supporting standard SQL, high‑concurrency joins, and online schema changes. Fits real‑time analytics on PB‑scale data.

ClickHouse : Column‑store with vectorized execution, optimized for low‑QPS analytical workloads (e.g., internal BI reports). Not recommended for high‑concurrency joins or massive ad‑hoc reporting.

2. Application Scenarios and Overall Solution

Typical scenarios include:

Real‑time dashboards for order volume, inventory, and after‑sale metrics.

Live monitoring during large promotional events, requiring thousands of queries per day with sub‑100 ms latency.

High‑frequency data ingestion (billions of rows per day) while maintaining millisecond‑to‑second query response.

3. OLAP Optimization Practices

3.1 Druid Optimization

Materialized Views : Create pre‑aggregated indexes for frequently co‑occurring dimension groups to avoid scanning raw data.

Cache Queries : Use Redis to cache results by time granularity (day/hour). On a partial cache hit, only the missing time slices are queried and then merged.

Hot/Cold Data Tiering : Distribute hot data across dedicated broker nodes and configure node priority to prioritize hot‑tier queries.

# Cluster tiering (default is _default_tier)

druid.server.tier=hot
# Query priority (hot tier gets higher priority)

druid.server.priority=100
# Increase processing buffer size

processing.buff=4G

druid.processing.buffer.sizeBytes=1073741824

druid.processing.numThreads=30

3.2 ClickHouse Optimization

Distributed Aggregation : Enable distributed query mode so each node sends only its intermediate aggregation state to the coordinator, reducing network traffic.

Skip Indexes :

minmax : Stores min and max per data block; lightweight and ideal for loosely sorted columns.

set : Stores a set of distinct values per block; useful when block cardinality is low but overall cardinality is high.

Bloom filter : Probabilistic filter for high‑cardinality columns (e.g., order_id, product_id) to quickly skip irrelevant blocks.

Avoid Using FINAL : ReplacingMergeTree can de‑duplicate data during merges; invoking FINAL forces a full merge at query time, dramatically slowing performance. Instead, use explicit aggregation queries.

CREATE TABLE t_replacing_table(
    id UInt8,
    name String,
    age UInt8
) ENGINE = ReplacingMergeTree(age)
ORDER BY id;

INSERT INTO t_replacing_table VALUES (1,'张三',18),(2,'李四',19),(3,'王五',20);
INSERT INTO t_replacing_table VALUES (1,'张三',20),(2,'李四',15);

-- Query without FINAL, using argMax to get latest name per id
SELECT
    id,
    argMax(name, age) AS namex,
    max(age) AS agex
FROM t_replacing_table
GROUP BY id;

4. Conclusion

The presented OLAP architecture combines Druid for high‑throughput time‑series analysis and ClickHouse for fast column‑store queries, addressing the company's multi‑dimensional analytics needs. However, current OLAP solutions still struggle with extremely high‑concurrency self‑service analysis and complex multi‑table joins. Future work aims to build a unified platform that supports both detailed and aggregated queries, as well as join‑heavy workloads, to further enhance real‑time analytical capabilities.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big DataClickHouseData WarehouseOLAPDruid
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.