Databases 7 min read

ClickHouse Performance Testing and Optimization: Case Studies and Practical Lessons

This article presents a detailed performance comparison of ClickHouse using two real‑world cases, analyzes query timings, explores parameter tuning, schema and SQL adjustments, and summarizes practical conclusions for achieving sub‑second analytics on billion‑row datasets.

Big Data Technology Architecture

Jul 24, 2019

ClickHouse Performance Testing and Optimization: Case Studies and Practical Lessons

Background

The author began experimenting with ClickHouse last year but found its distributed mode under‑performing for multi‑table joins, with many SQL limitations in version 18.12.13. Inspired by a Ctrip article highlighting ClickHouse's impressive speed, the author revisited the database for further testing.

Test Cases

1. Ctrip case

Version: 18.12.13

Parameter

Configuration

CPU

40c

Memory

128 GB

Disk

SSD

Swap

Disabled

Data size:

Table

Rows

10 M

20 M

60 M

240 M

Test scenarios and latency:

Case

Time

A + B + C three‑table join aggregation

190 ms

B + D join aggregation

390 ms

A + B + D three‑table join aggregation

640 ms

According to Ctrip’s statistics, 90 % of their analytical queries finish within 500 ms.

2. YiQiXiu case

Version: 18.12.13

Parameter

Configuration

CPU

32c

Memory

128 GB

Disk

SSD

Swap

Disabled

Data size:

Table

Rows

40 M

130 M

Test scenarios:

Case

Time

B single‑table aggregation with sorting

2 s

B + D join aggregation with sorting

11 s

Figures show single‑table aggregation and multi‑table join performance.

Optimization Attempts

1. Parameter tuning – increasing max_memory_usage and max_bytes_before_external_group_by:

SET max_memory_usage = 128000000000;  # 128 GB
SET max_memory_usage = 60000000000;   # 60 GB

In the author’s 16 CPU / 68 GB / HDD environment this yielded a 2× speedup, but not in the current SSD setup.

2. Table definition – reducing the number of partitions gave a slight improvement.

3. SQL rewrite – always place the smaller table on the right side of a JOIN because ClickHouse scans the right table for each left‑table row.

4. Engine change – switching from the default MergeTree to a Memory engine did not affect performance.

5. I/O analysis – monitoring showed almost no disk I/O during query execution, only a brief 1‑2 s write phase at the end, confirming MergeTree’s efficiency.

Findings and Conclusions

SSD storage roughly doubles performance compared with HDD.

Aggregating billions of rows on a single table can finish in about 2 s; on HDD it takes ~4 s.

Adding selective filters before aggregation can reduce multi‑table join latency to sub‑second levels.

When joining, keep the larger table on the left and the smaller on the right.

If filtering is not possible, consider pre‑building wide tables or pre‑computing results.

For the author’s workload, halving the hardware specifications does not noticeably degrade performance.

Overall, ClickHouse delivers excellent performance for high‑concurrency single‑table analytics; multi‑table analytics benefit from targeted optimizations based on actual query patterns.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL performance testing ClickHouse database optimization SSD

Written by

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.