Comparative Analysis of Apache Doris and ClickHouse for OLAP Workloads
This article presents a detailed technical comparison between Apache Doris and ClickHouse, covering their architecture, deployment, distributed capabilities, transaction support, data import, storage design, query performance, cost, and future development, and provides guidance on selecting the appropriate engine for specific OLAP scenarios.
Apache Doris, contributed by Baidu, is an open‑source MPP analytical database offering sub‑second query latency, simple distributed architecture, and strong operational ease, supporting data sets larger than 10 PB and a variety of analytical workloads such as historical reporting, real‑time analysis, and exploratory queries. ClickHouse, open‑sourced by Yandex, also follows an MPP design, boasts a vectorized execution engine that is claimed to be 100‑1000× faster than transactional databases, and provides rich functionality and high reliability.
Both engines are extensively used at JD.com, where clusters of over 3,000 servers handle transaction, traffic, offline, and dashboard scenarios. The article combines JD’s research findings and years of practical experience to compare Doris and ClickHouse, validating common assumptions and offering reference points for scenario‑based selection and kernel development.
Doris advantages include simpler table creation, better SQL standard compliance, superior join performance, stronger data import capabilities, flexible scaling, automatic node recovery, and richer community support. ClickHouse advantages focus on higher raw performance for data ingestion and single‑table queries, a broader set of table engines, extensive function support, and more flexible multi‑tenant resource management.
Selection guidance suggests using ClickHouse for complex, large‑scale scenarios where custom development resources are available, while Doris is recommended for one‑stop analytical solutions with limited development effort.
Architecture analysis compares deployment, distributed capabilities, data import, query processing, storage, and cost. Doris consists of a FrontEnd (FE) written in Java and a BackEnd (BE) written in C/C++, which communicate via BRPC. The FE handles metadata, SQL parsing, optimization, and planning, while the BE manages storage and execution. ClickHouse includes a client, a copier, and a server, all implemented in C++ (C++11 or later) on top of the Poco libraries.
Deployment and operations: Doris requires only FE and BE components, while ClickHouse needs a server, ZooKeeper, and a proxy. Both use Ansible or SaltStack for batch updates, support hot-config reloads, and provide SQL commands for scaling (e.g., ALTER SYSTEM ADD BACKEND and ALTER SYSTEM DECOMMISSION BACKEND for Doris).
Distributed protocol and HA: Doris embeds a BerkeleyDB-JE HA module with a Master/Follower/Observer model, providing automatic metadata synchronization and three-replica data storage. ClickHouse relies on external ZooKeeper for DDL and replica coordination, which can become a performance bottleneck.
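The Master/Follower/Observer model can be illustrated with a small sketch. It assumes the usual majority-quorum rule for a replicated metadata log: a write commits once more than half of the voting nodes (Master plus Followers) acknowledge it, while Observers replicate the log without voting. The function names are hypothetical and are not part of Doris.

```python
def quorum_size(voting_nodes: int) -> int:
    """Smallest number of acknowledgements that forms a majority."""
    if voting_nodes < 1:
        raise ValueError("need at least one voting node")
    return voting_nodes // 2 + 1


def can_commit_metadata(voting_nodes: int, alive_voters: int) -> bool:
    """True if enough voting FE nodes are alive to commit a metadata write.

    Observers are deliberately not counted: in this simplified model they
    only replay the log and serve reads.
    """
    return alive_voters >= quorum_size(voting_nodes)


if __name__ == "__main__":
    # Three voting FEs tolerate one failure, but not two.
    print(can_commit_metadata(3, 2))  # True
    print(can_commit_metadata(3, 1))  # False
```

Under this assumption, adding Observers raises read capacity without changing how many node failures the metadata layer can survive.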
Transaction support: Doris offers ACID-like guarantees for data import and materialized view updates, while ClickHouse lacks native transaction semantics, requiring external mechanisms for consistency.
Data import mechanisms differ: Doris provides RoutineLoad, BrokerLoad, and StreamLoad for Kafka, HDFS, and custom pipelines, whereas ClickHouse primarily uses HTTP interfaces and external tools such as clickhouse-copier. In ClickHouse, deletes and updates are issued as ALTER TABLE mutations of the following form:
ALTER TABLE db.table DELETE WHERE filter_expr;
ALTER TABLE db.table UPDATE column1 = expr1 WHERE filter_expr;
Storage architecture: Both use columnar storage. Doris organizes data hierarchically into tables, partitions, buckets (tablets), and segments, supporting range partitions and hash bucketing. ClickHouse organizes data into distributed tables, shards, and parts, with each column stored in separate files, enabling high cache efficiency but higher I/O overhead.
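A minimal sketch of how range partitioning and hash bucketing combine to place a row may help here. The partition names, bounds, bucket count, and the function `route_row` are all hypothetical, and CRC32 is only a stable stand-in for whatever hash function the engine actually applies.

```python
import bisect
import zlib
from datetime import date
from typing import Tuple

# Hypothetical range partitions: (name, exclusive upper bound), sorted by bound.
PARTITIONS = [
    ("p2023", date(2024, 1, 1)),
    ("p2024", date(2025, 1, 1)),
]
BUCKETS = 8  # hypothetical number of hash buckets per partition


def route_row(event_date: date, user_id: int) -> Tuple[str, int]:
    """Return (partition name, bucket index) for one row."""
    upper_bounds = [bound for _, bound in PARTITIONS]
    # bisect_right returns the first partition whose upper bound is
    # strictly greater than event_date (upper bounds are exclusive).
    idx = bisect.bisect_right(upper_bounds, event_date)
    if idx == len(PARTITIONS):
        raise ValueError(f"no partition covers {event_date}")
    # Rows with the same bucketing key always land in the same bucket.
    bucket = zlib.crc32(str(user_id).encode()) % BUCKETS
    return PARTITIONS[idx][0], bucket


if __name__ == "__main__":
    print(route_row(date(2023, 6, 1), 42))
    print(route_row(date(2024, 1, 1), 42))
```

The range step lets a query with a date filter skip whole partitions, while the hash step spreads the rows of each partition evenly across storage nodes.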
Table engines and models: Doris supports the Duplicate Key, Aggregate Key, and Unique Key models, plus Rollup (materialized view). ClickHouse offers the MergeTree family (ReplicatedMergeTree, ReplacingMergeTree, AggregatingMergeTree) and in-memory dictionary tables. Both provide MVCC-style concurrency control.
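The difference between these models is easiest to see on rows that share a key. The sketch below is a simplified illustration, not engine code: it assumes a single SUM-aggregated value column for the Aggregate Key case and "last write wins" for the Unique Key case, which is also roughly what ReplacingMergeTree converges to after background merges. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (key, value)


def merge_duplicate_key(rows: Iterable[Row]) -> List[Row]:
    """Duplicate Key sketch: every row is kept as written."""
    return list(rows)


def merge_aggregate_key(rows: Iterable[Row]) -> Dict[str, int]:
    """Aggregate Key sketch: rows sharing a key are combined with SUM."""
    merged: Dict[str, int] = {}
    for key, value in rows:
        merged[key] = merged.get(key, 0) + value
    return merged


def merge_unique_key(rows: Iterable[Row]) -> Dict[str, int]:
    """Unique Key sketch: the last row written for a key replaces earlier ones."""
    merged: Dict[str, int] = {}
    for key, value in rows:
        merged[key] = value
    return merged


if __name__ == "__main__":
    rows = [("a", 1), ("b", 5), ("a", 3)]
    print(merge_duplicate_key(rows))  # [('a', 1), ('b', 5), ('a', 3)]
    print(merge_aggregate_key(rows))  # {'a': 4, 'b': 5}
    print(merge_unique_key(rows))     # {'a': 3, 'b': 5}
```

In ClickHouse the replacement happens when parts are merged, so a query issued before the merge can still see both versions of a row.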
Data types: ClickHouse supports complex types such as Array, Nested, Map, and Tuple, whereas Doris focuses on structured data.
Query processing: Doris provides multiple join strategies (Local, Broadcast, Shuffle, Hash). ClickHouse supports only Local and Broadcast joins, so complex multi-table joins often require query rewrites. Both engines use vectorized execution, with performance varying across workloads.
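Why the choice between Broadcast and Shuffle matters can be shown with a rough network-cost model. The sketch assumes that a broadcast join copies the smaller table to every node, while a shuffle join repartitions both tables once by the join key. Real optimizers weigh more factors, and the function names here are hypothetical.

```python
def broadcast_cost(small_rows: int, nodes: int) -> int:
    """Rows sent over the network when the small table is copied to every node."""
    return small_rows * nodes


def shuffle_cost(left_rows: int, right_rows: int) -> int:
    """Rows sent when both sides are repartitioned by the join key."""
    return left_rows + right_rows


def pick_join_strategy(left_rows: int, right_rows: int, nodes: int) -> str:
    """Pick the strategy that moves fewer rows under this simplified model."""
    small = min(left_rows, right_rows)
    if broadcast_cost(small, nodes) <= shuffle_cost(left_rows, right_rows):
        return "broadcast"
    return "shuffle"


if __name__ == "__main__":
    # Large fact table joined with a small dimension table on 10 nodes.
    print(pick_join_strategy(1_000_000, 1_000, 10))    # broadcast
    # Two large tables on 10 nodes.
    print(pick_join_strategy(1_000_000, 800_000, 10))  # shuffle
```

An engine limited to Local and Broadcast joins has no cheap option in the second case, which is one reason large multi-table joins tend to need rewriting there.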
Concurrency and cost: Doris scales concurrency by increasing table-level replica counts; ClickHouse scales at the cluster level. Doris generally has lower operational cost, stronger metadata consistency, and easier elasticity, while ClickHouse may suffer from ZooKeeper bottlenecks, limited elastic scaling, and higher failure impact.
Performance testing using TPC‑DS benchmarks shows ClickHouse excels in single‑table latency and concurrency, whereas Doris outperforms ClickHouse in complex multi‑table joins due to richer join optimization and less need for query rewriting.
Future outlook: JD plans to further adopt Doris for broader OLAP use cases and contribute to the Apache community, while continuing to enhance ClickHouse’s cloud-native capabilities, high-availability mechanisms, and online scaling features.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.