Which Open‑Source OLAP Engine Wins the TPC‑DS Benchmark? A Deep Performance Comparison
Using the TPC‑DS benchmark’s 99 queries on a 1 TB dataset, this study evaluates the performance of four open‑source OLAP engines—ClickHouse, Doris, Presto, and ByConity—across basic, join, aggregation, subquery, and window‑function scenarios, revealing ByConity’s superior speed and the limitations of ClickHouse.
As data volume and complexity grow, many enterprises adopt OLAP (Online Analytical Processing) engines for large‑scale data analysis, where performance is a critical factor.
This article uses the 99 queries of the TPC‑DS (Transaction Processing Performance Council Decision Support) benchmark to compare the performance of four open‑source OLAP engines—ClickHouse, Doris, Presto, and ByConity—providing guidance for selecting a suitable engine.
Benchmark Overview
TPC‑DS is a decision‑support benchmark that simulates multidimensional analytical workloads with 99 SQL queries covering joins, aggregations, subqueries, and window functions.
Engine Introductions
ClickHouse : A column‑store DBMS from Yandex, optimized for fast queries on massive data.
Doris : A distributed columnar storage and analysis system that integrates with Hadoop, Spark, and Flink.
Presto : A distributed SQL query engine originally developed by Facebook for interactive analytics.
ByConity : ByteDance’s cloud‑native data warehouse with compute‑storage separation, strong consistency, and advanced OLAP optimizations.
Test Environment Configuration
Hardware : 5 workers per engine, each node equipped with 48 CPU cores (Intel Xeon E5‑2650 v4 @ 2.20 GHz), 256 GB RAM, ATA 7200 rpm disks, Linux 4.14 kernel, Debian 9.
Data : 1 TB tables (~2.8 billion rows) generated by TPC‑DS.
Software Versions : ClickHouse 23.4.1.1943, Doris 1.2.4.1, Presto 0.28.0, ByConity 0.1.0‑GA.
Other Settings : ClickHouse global distributed_product_mode, Doris bucket configuration, Presto Hive catalog with ORC format, ByConity optimizer enabled (CBO, RBO).
<code>Architecture: x86_64</code><code>CPU(s): 48</code><code>Model name: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz</code><code>NUMA nodes: 2</code>Test Methodology
Run all 99 TPC‑DS queries on each engine using the same 1 TB dataset and identical hardware.
Execute each query multiple times and take the average; query timeout set to 500 seconds.
Record execution plans, I/O, and CPU usage for each query.
Overall Results
ByConity completed all 99 queries, outperforming the other three engines by 3–4× in total execution time. ClickHouse only completed about 50 % of queries due to join limitations and frequent timeouts.
Basic Query Scenario
In single‑table queries, ByConity leads, with Presto and Doris also performing well. ClickHouse suffers many timeouts (e.g., SQL5, 8, 11, 13, 14, 17, 18).
Join Query Scenario
ByConity’s cost‑based optimizer (CBO) provides the best performance for multi‑table joins. Presto and Doris follow, while ClickHouse struggles with join reordering.
Aggregation Query Scenario
ByConity remains the fastest, followed by Doris and Presto. ClickHouse experiences several timeouts, which were capped at 250 seconds for clarity.
Subquery Scenario
ByConity’s rule‑based optimizer (RBO) transforms subqueries into join‑agg patterns, achieving the best results. Doris and Presto also perform reasonably, though each encounters a few timeouts.
Window Function Scenario
ByConity again leads, with Presto second. Doris has a single timeout, while ClickHouse still fails to complete many queries.
Conclusion
ByConity consistently outperforms ClickHouse, Doris, and Presto across all TPC‑DS query categories, delivering 3–4× faster execution. ClickHouse’s architecture, not optimized for complex joins, limits its performance in multi‑table scenarios. When choosing an OLAP engine, factors such as scalability, ease of use, and stability should also be considered alongside benchmark results.
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.