Doris vs ClickHouse: Which MPP Database Wins for Large‑Scale OLAP?

This article compares Apache Doris and ClickHouse across architecture, deployment, multi‑tenant management, data import, storage, query capabilities, performance testing, and cost, providing practical guidance for selecting the most suitable analytical database in large‑scale OLAP scenarios.

dbaplus Community

Nov 23, 2021

Background

Apache Doris (contributed by Baidu) and ClickHouse (open‑sourced by Yandex) are both MPP analytical databases designed for petabyte‑scale workloads. Doris emphasizes ease of use, strong SQL‑99 compatibility, and built‑in transactional import semantics. ClickHouse focuses on a high‑performance vectorized engine, a rich set of table engines, and extensive optimizer parameters.

Key Differences and Selection Guidance

Doris strengths

Simpler table definition and SQL syntax; better support for standard joins and aggregation functions.

Automatic node recovery, flexible online scaling via ALTER SYSTEM ADD BACKEND and ALTER SYSTEM DECOMMISSION BACKEND, and strong community support.

Integrated HA metadata (BerkeleyDB JE), transaction‑like import guarantees, materialized view auto‑aggregation, and query routing.

ClickHouse strengths

Higher raw throughput for data loading and single‑table queries.

Multiple table engines (MergeTree family, ReplicatedMergeTree, ReplacingMergeTree, AggregatingMergeTree, etc.) and extensive function library.

Fine‑grained quota and permission controls for multi‑tenant environments.

Guidance:

Choose ClickHouse when the workload involves very large data volumes, complex custom optimizations, and the team can invest in operational tooling.

Choose Doris for a turnkey analytical platform that requires minimal R&D effort and provides built‑in transactional import.

Architecture Comparison

1. Deployment & Operations

Doris consists of a FrontEnd (FE), which handles SQL parsing, planning, and metadata, and one or more BackEnds (BE), which handle storage and execution. An optional Broker process assists batch imports (Broker Load) from HDFS or object storage. ClickHouse runs a single ClickHouse Server process per node; optional ClickHouseProxy and ZooKeeper provide request forwarding, quota enforcement, and distributed DDL.

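Both systems can be deployed and updated in bulk with tools such as Ansible or SaltStack, and both support reloading configuration without a restart. They differ in how nodes are added: Doris exposes a SQL command (ALTER SYSTEM ADD BACKEND), while ClickHouse requires manual configuration edits.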

2. Multi‑Tenant Management

ClickHouse provides per‑user quota settings for memory, threads, and query time, enabling shared‑cluster multi‑tenant usage. Doris offers simpler tenant isolation but lacks fine‑grained quota enforcement.
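A minimal sketch of how such per-user limits might be expressed with ClickHouse's SQL-driven access control. It assumes access management is enabled for the administrator account; the user `tenant_a`, the profile and quota names, and every number are illustrative. Per-query caps on memory, threads, and execution time are set through a settings profile, while a quota meters usage over a time window.

```sql
-- Hypothetical tenant account; the name and password are placeholders.
CREATE USER IF NOT EXISTS tenant_a IDENTIFIED WITH sha256_password BY 'change-me';

-- Per-query caps: memory in bytes, worker threads, wall-clock seconds.
CREATE SETTINGS PROFILE IF NOT EXISTS tenant_a_limits
SETTINGS
    max_memory_usage = 10000000000,
    max_threads = 8,
    max_execution_time = 60
TO tenant_a;

-- Usage metered per hour: at most 1000 queries and 600 seconds of execution time.
CREATE QUOTA IF NOT EXISTS tenant_a_quota
FOR INTERVAL 1 hour MAX queries = 1000, execution_time = 600
TO tenant_a;
```

Keeping the per-query caps and the windowed quota separate makes it possible to tighten one without touching the other when a tenant's workload changes.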

3. Cluster Migration

Doris uses built‑in BACKUP / RESTORE commands to snapshot data and metadata to object storage or HDFS, supporting incremental backups by partition. Migration is performed by adding new nodes and decommissioning old ones while the system rebalances tablets.
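The backup and restore flow can be sketched as follows. The repository name, broker name, HDFS location, credentials, database, table, and partition names are all placeholders, and the exact properties accepted may vary between Doris versions.

```sql
-- Register a remote repository that is reached through a Broker process.
CREATE REPOSITORY `example_repo`
WITH BROKER `hdfs_broker`
ON LOCATION "hdfs://namenode_host:8020/doris/backup/"
PROPERTIES (
    "username" = "hdfs_user",
    "password" = "hdfs_password"
);

-- Back up only two partitions of one table.
BACKUP SNAPSHOT example_db.snapshot_20211101
TO `example_repo`
ON (example_tbl PARTITION (p20211101, p20211102))
PROPERTIES ("type" = "full");

-- Check progress, then list snapshots to find the backup timestamp.
SHOW BACKUP FROM example_db;
SHOW SNAPSHOT ON `example_repo`;

-- On the target cluster (with the same repository registered), restore the snapshot.
-- The backup_timestamp value must be copied from the SHOW SNAPSHOT output.
RESTORE SNAPSHOT example_db.snapshot_20211101
FROM `example_repo`
ON (example_tbl PARTITION (p20211101, p20211102))
PROPERTIES (
    "backup_timestamp" = "<timestamp from SHOW SNAPSHOT>",
    "replication_num" = "3"
);
```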

ClickHouse offers the clickhouse-copier tool for large‑scale data copying and the remote() table function for small migrations. Third‑party tools such as https://github.com/AlexAkulov/clickhouse-backup can also be used.
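For a small migration, a pull through the remote() table function might look like the sketch below. The host, databases, table, and credentials are placeholders, and the target table is assumed to exist already with a compatible schema.

```sql
-- Run on the NEW cluster: pull rows from the old cluster over the native protocol.
INSERT INTO target_db.events
SELECT *
FROM remote('old-host:9000', source_db, events, 'reader', 'reader_password')
WHERE event_date = '2021-11-01';  -- copy one slice at a time to keep each INSERT small
```

Copying one date or partition per statement keeps retries cheap if a transfer fails partway.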

4. Scaling

Doris supports online scaling with ALTER SYSTEM ADD BACKEND and ALTER SYSTEM DECOMMISSION BACKEND; data is rebalanced at the tablet level (each tablet is typically hundreds of MB). ClickHouse requires manual configuration changes and custom scripts; it does not provide built‑in online scaling.
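A minimal sketch of the Doris side, run from a MySQL client connected to an FE. The host names are placeholders, and the port is assumed to be the BE heartbeat port at its default of 9050.

```sql
-- Add a new BE; tablets start migrating to it automatically.
ALTER SYSTEM ADD BACKEND "new-be-host:9050";

-- Drain an old BE: its tablets are moved elsewhere before the node is removed.
ALTER SYSTEM DECOMMISSION BACKEND "old-be-host:9050";

-- Watch the tablet count on the decommissioning node fall as rebalancing proceeds.
SHOW PROC '/backends';

-- Abort the drain if it has to be rolled back.
CANCEL DECOMMISSION BACKEND "old-be-host:9050";
```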

5. Distributed Capability & HA

Doris implements HA metadata with a three‑node FE cluster (followers + optional observers) using BerkeleyDB JE. Writes use quorum protocols to guarantee consistency.
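As a sketch, growing a single FE into such a cluster might look like the statements below. Host names are placeholders, and the port is assumed to be the FE edit log port at its default of 9010.

```sql
-- Followers take part in metadata quorum writes and leader election.
ALTER SYSTEM ADD FOLLOWER "fe-host-2:9010";
ALTER SYSTEM ADD FOLLOWER "fe-host-3:9010";

-- Observers replicate metadata and serve reads but do not vote.
ALTER SYSTEM ADD OBSERVER "fe-host-4:9010";

-- Confirm each FE's role and whether it has joined the cluster.
SHOW PROC '/frontends';
```

With three followers, a metadata write needs acknowledgement from a majority (two of three), so the cluster tolerates the loss of one follower. A new FE process typically has to be started pointing at an existing FE before it can join; check the deployment guide for the version in use.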

ClickHouse relies on ZooKeeper for DDL coordination and replica metadata. ZooKeeper can become a bottleneck at large scale; a Raft‑based replacement is under development.

6. Transaction Support

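Doris provides ACID‑like import transactions: each load is atomic, labels make retried loads idempotent, and materialized view updates are applied atomically with the base table. ClickHouse lacks native transaction semantics; an INSERT is atomic only within a single block (about 1 million rows by default, controlled by max_insert_block_size), so larger imports need external validation.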
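The idempotency of Doris loads rests on labels. The sketch below assumes hypothetical tables `sales_detail` and `sales_summary` and an illustrative label name; the exact clause order may differ between Doris versions.

```sql
-- The label identifies this load transaction within the database.
INSERT INTO example_db.sales_summary WITH LABEL sales_summary_20211101
SELECT sale_date, region, SUM(amount)
FROM example_db.sales_detail
WHERE sale_date = '2021-11-01'
GROUP BY sale_date, region;
```

If the client times out and resubmits the same statement, a label that has already committed is rejected rather than loaded twice, which is what makes blind retries safe.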

7. Data Import

Doris offers three built‑in ingestion methods:

Routine Load – continuous Kafka consumption.

Broker Load – batch import from HDFS or object storage.

Stream Load – HTTP‑based real‑time load with idempotency.
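As an illustration of the Kafka path, a Routine Load job might be declared as in the sketch below. The job name, target table, broker addresses, topic, and consumer group are placeholders, and the target table is assumed to have columns matching the comma-separated message layout.

```sql
-- Continuously consume a Kafka topic into example_db.events.
CREATE ROUTINE LOAD example_db.events_from_kafka ON events
COLUMNS TERMINATED BY ","
PROPERTIES (
    "desired_concurrent_number" = "3",   -- upper bound on parallel consumer tasks
    "max_batch_interval" = "20"          -- seconds before a task commits its batch
)
FROM KAFKA (
    "kafka_broker_list" = "kafka-host-1:9092,kafka-host-2:9092",
    "kafka_topic" = "events",
    "property.group.id" = "doris_events_consumer"
);

-- Inspect job state, consumed offsets, and error counts.
SHOW ROUTINE LOAD FOR example_db.events_from_kafka;
```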

ClickHouse imports data via external engines (Kafka, HDFS) and tools like clickhouse‑copier. There is no dedicated background import service.
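A common way to wire up Kafka ingestion in ClickHouse combines a Kafka engine table with a materialized view. The sketch below uses placeholder table names, broker address, topic, and consumer group, and assumes JSON messages with one object per line.

```sql
-- 1. Destination table that queries will read from.
CREATE TABLE example_db.events
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (user_id, event_time);

-- 2. Kafka engine table: a consumer, not a place where data is stored.
CREATE TABLE example_db.events_queue
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka-host-1:9092',
    kafka_topic_list  = 'events',
    kafka_group_name  = 'clickhouse_events_consumer',
    kafka_format      = 'JSONEachRow';

-- 3. Materialized view that moves each consumed batch into the destination table.
CREATE MATERIALIZED VIEW example_db.events_consumer TO example_db.events AS
SELECT event_time, user_id, action
FROM example_db.events_queue;
```

The consumer runs inside the server process, so throughput and failure handling are tuned through table settings rather than through a separate import service.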

-- Doris import example
INSERT INTO tbl VALUES (...);

-- ClickHouse import example
INSERT INTO tbl VALUES (...);

8. Storage Architecture

Both use columnar storage.

Doris follows Google Mesa’s MVCC model with a hierarchy of Table → Partition → Bucket/Tablet → Segment. Segments contain three index types: physical, sparse, and ZoneMap.

ClickHouse stores data in parts. Background merges combine small parts into larger ones, and rows can be changed through mutations (ALTER TABLE … DELETE and ALTER TABLE … UPDATE), which run asynchronously and rewrite the affected parts.

ALTER TABLE db.tbl DELETE WHERE col = 1;
ALTER TABLE db.tbl UPDATE col = expr WHERE col2 > 10;
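Because mutations run in the background, the ALTER statement returning does not mean the rows have changed yet. A sketch of how progress can be checked, reusing the `db.tbl` placeholder from above:

```sql
-- List mutations on db.tbl that have not finished yet.
SELECT mutation_id, command, create_time, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'db' AND table = 'tbl' AND is_done = 0;

-- A stuck mutation can be cancelled by id (left commented out on purpose):
-- KILL MUTATION WHERE database = 'db' AND table = 'tbl' AND mutation_id = '<id from the query above>';
```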

9. Table Engines & Models

Doris supports three data models, plus rollups built on top of them:

Duplicate Key – raw detail table.

Aggregate Key – pre‑aggregated table.

Unique Key – primary‑key table with deduplication.

Rollup – materialized view built on top of the above.
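The models above can be sketched as table definitions. The database, tables, columns, bucket counts, and replica counts are illustrative only.

```sql
-- Duplicate Key: every row is kept; the key only defines the sort order.
CREATE TABLE example_db.events_detail
(
    event_time DATETIME,
    user_id    BIGINT,
    action     VARCHAR(32)
)
DUPLICATE KEY(event_time, user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ("replication_num" = "3");

-- Aggregate Key: rows sharing the same key are merged; pv is summed.
CREATE TABLE example_db.events_daily
(
    event_date DATE,
    user_id    BIGINT,
    action     VARCHAR(32),
    pv         BIGINT SUM DEFAULT "0"
)
AGGREGATE KEY(event_date, user_id, action)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ("replication_num" = "3");

-- Unique Key: the latest row loaded for a given user_id replaces earlier ones.
CREATE TABLE example_db.users
(
    user_id BIGINT,
    name    VARCHAR(64),
    city    VARCHAR(32)
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ("replication_num" = "3");

-- Rollup: a coarser pre-aggregation maintained alongside the base table.
ALTER TABLE example_db.events_daily
ADD ROLLUP daily_by_action (event_date, action, pv);
```

Queries that group only by event_date and action can then be served from the rollup without the application naming it.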

ClickHouse’s primary engine family is MergeTree, alongside special‑purpose engines such as Memory:

ReplicatedMergeTree – replicated tables.

ReplacingMergeTree – supports row replacement.

AggregatingMergeTree – pre‑aggregated tables.

Memory tables – for fast temporary data.
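Two of these engines sketched as table definitions, with illustrative names and columns:

```sql
-- ReplacingMergeTree: among rows with the same ORDER BY key, keep the one
-- with the largest updated_at once parts are merged.
CREATE TABLE example_db.users
(
    user_id    UInt64,
    name       String,
    city       String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- Replacement happens at merge time, so duplicates can be visible until then.
-- FINAL deduplicates at query time, at extra cost.
SELECT * FROM example_db.users FINAL WHERE user_id = 42;

-- AggregatingMergeTree: columns hold partial aggregation states.
-- Rows are written with the matching -State functions (sumState, uniqState).
CREATE TABLE example_db.events_daily
(
    event_date Date,
    action     String,
    pv         AggregateFunction(sum, UInt64),
    uv         AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (event_date, action);

-- States are finalized with the matching -Merge functions at read time.
SELECT event_date, action, sumMerge(pv) AS pv, uniqMerge(uv) AS uv
FROM example_db.events_daily
GROUP BY event_date, action;
```

Unlike the Doris models, the merge semantics here are eventual, so queries have to account for rows that have not been merged yet.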

10. Query Processing

Both engines execute vectorized pipelines. Doris uses MySQL‑compatible syntax with full SQL‑99 support and window functions. ClickHouse implements a subset of SQL‑2011; complex joins often require rewriting (e.g., using GLOBAL INNER JOIN).
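A sketch of the kind of rewrite meant here, assuming `orders_all` and `users_all` are hypothetical Distributed tables defined over local tables on each shard:

```sql
-- With GLOBAL, the initiating node evaluates the right-hand side once and
-- ships the result to every shard as a temporary table.
SELECT u.city, count() AS orders
FROM example_db.orders_all AS o
GLOBAL INNER JOIN example_db.users_all AS u ON o.user_id = u.user_id
GROUP BY u.city;
```

Because the right-hand side is broadcast whole, this form works best when that side is small enough to fit comfortably in memory on every shard.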

Performance Evaluation (TPC‑DS)

Testbed: 3 nodes, each 32 CPU cores, 128 GB RAM, HDD storage. Versions: Doris 0.13.1, ClickHouse 21.3.13.1. Cluster configuration: 3 shards × 1 replica (default settings). A subset of 18 queries (9 multi‑table joins, 9 single‑table aggregates) was executed.

Results

Single‑table queries: ClickHouse consistently achieved lower latency and higher concurrency, benefiting from its vectorized engine and direct local‑table writes.

Multi‑table joins: Doris outperformed ClickHouse, especially for large‑table joins, due to its multiple join strategies (broadcast, shuffle, hash) and automatic materialized view aggregation.

ClickHouse’s join latency grows sharply when the small side exceeds ~10 million rows, whereas Doris scales more gracefully.
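In Doris the planner normally picks the join distribution itself, but it can be steered with a hint. The sketch below uses hypothetical `orders`, `users`, and `order_items` tables and is not taken from the benchmark queries.

```sql
-- Small right-hand table: send a full copy of users to every node.
SELECT o.order_id, u.city
FROM example_db.orders AS o
JOIN [broadcast] example_db.users AS u ON o.user_id = u.user_id;

-- Both sides large: repartition both inputs by the join key instead.
SELECT o.order_id, i.sku
FROM example_db.orders AS o
JOIN [shuffle] example_db.order_items AS i ON o.order_id = i.order_id;

-- EXPLAIN shows which distribution the planner actually chose.
EXPLAIN
SELECT o.order_id, i.sku
FROM example_db.orders AS o
JOIN example_db.order_items AS i ON o.order_id = i.order_id;
```

Shuffling both sides keeps per-node memory bounded as the smaller table grows, which is consistent with the more gradual latency curve reported for Doris.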

Future Outlook

Within JD, Doris will continue to anchor an expanding OLAP ecosystem, and the team plans to keep contributing to the Apache project with the aim of making it a globally leading analytical database. ClickHouse’s roadmap focuses on cloud‑native OLAP, improving high availability (Raft‑based metadata), and adding online scaling tools to meet large‑scale production demands.
