Why ClickHouse Outperforms MySQL: Deep Dive into Architecture and Benchmarks

This article compares ClickHouse and MySQL by examining benchmark results, MPP architecture, columnar storage, compression techniques, vectorized execution, and index designs, showing why ClickHouse delivers dramatically higher query performance on massive data sets.

Senior Tony

Sep 19, 2024

Overview

Although MySQL is widely used, it often struggles with large‑scale analytical workloads, prompting many engineers to switch to ClickHouse for faster data processing.

Benchmark Comparison

Official benchmarks run on identical single‑node servers show ClickHouse well ahead of the other systems tested. On a 1‑billion‑row dataset, ClickHouse’s average response time was 2.63× faster than Vertica, 10× faster than Greenplum, 17× faster than InfiniDB, 27× faster than MonetDB, 126× faster than Hive, and 429× faster than MySQL.

MPP Architecture

ClickHouse uses a Massively Parallel Processing (MPP) architecture: a query is split across independent nodes, each node computes a partial result over its own share of the data, and the partial results are then merged into the final answer. This provides high throughput and low latency on massive data sets.
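The scatter‑and‑merge pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not ClickHouse code: the shard contents, the function names `partial_avg_state` and `merge_states`, and the use of threads to stand in for nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_avg_state(shard):
    """Runs on one node: reduce the local rows to a small (sum, count) state."""
    return sum(shard), len(shard)

def merge_states(states):
    """Runs on the coordinator: combine the partial states into one average."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

# Hypothetical data: an `age` column split across three nodes.
shards = [[23, 35, 41], [29, 52], [18, 33, 47, 60]]

# Threads stand in for independent nodes working in parallel.
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    states = list(pool.map(partial_avg_state, shards))

print(merge_states(states))
```

Each node ships back only a (sum, count) pair rather than its rows, which is why the final merge step stays cheap even when the shards are large.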

Columnar Storage

Consider a typical scenario: calculating the average age from a table with 20 million rows. MySQL InnoDB stores data row by row in 16 KB pages, so a full scan reads every page and pulls in all columns even though only age is needed. ClickHouse stores each column in its own .bin file, so it can read just the age.bin file. For a table with around 20 columns of similar size, that reduces I/O to roughly 1/20 of MySQL’s volume and dramatically improves performance.
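A back‑of‑the‑envelope calculation shows where a figure like 1/20 can come from. The column count and the 4‑byte value width are assumptions chosen for illustration. Page overhead, indexes, and compression are ignored.

```python
ROWS = 20_000_000      # from the scenario above
COLUMNS = 20           # assumption: number of columns in the table
BYTES_PER_VALUE = 4    # assumption: every column is a fixed 4-byte value

def row_store_bytes(rows, columns, width):
    # A row store reads whole rows, so every column is pulled from disk.
    return rows * columns * width

def column_store_bytes(rows, width, columns_needed=1):
    # A column store reads only the files of the columns the query references.
    return rows * width * columns_needed

row_io = row_store_bytes(ROWS, COLUMNS, BYTES_PER_VALUE)
col_io = column_store_bytes(ROWS, BYTES_PER_VALUE)

print(f"row store:    {row_io / 1e6:,.0f} MB")
print(f"column store: {col_io / 1e6:,.0f} MB")
print(f"ratio:        1/{row_io // col_io}")
```

Under these assumptions the saving equals the number of columns, so wider tables benefit more and queries that touch many columns benefit less.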

Data Compression

ClickHouse compresses column data with LZ4 by default and achieves a ratio of about 8:1, because values within a single column tend to be highly redundant. MySQL InnoDB can also compress tables, for example with

ALTER TABLE sbtest1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

but this typically saves only 30‑50% of the space and adds CPU load without improving query performance.
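To get a feel for why a single column compresses well, the sketch below measures the compression ratio of a synthetic age column. Two assumptions apply: Python's standard `zlib` stands in for LZ4, which is not in the standard library, and the data is artificial (long runs of repeated values). The resulting ratio therefore says nothing about what ClickHouse achieves on real data.

```python
import zlib

# Synthetic `age` column: 100,000 one-byte values stored contiguously,
# each value repeated in runs of 1,000 rows (an assumption for the demo).
ages = bytes(20 + (i // 1000) % 50 for i in range(100_000))

compressed = zlib.compress(ages)
ratio = len(ages) / len(compressed)

# The codec is lossless: decompressing returns the original column bytes.
assert zlib.decompress(compressed) == ages

print(f"raw: {len(ages)} bytes, compressed: {len(compressed)} bytes, "
      f"ratio: {ratio:.1f}:1")
```

Storing values of the same type and similar magnitude next to each other is what gives a general‑purpose codec long repeated sequences to exploit.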

Vectorized Execution Engine

ClickHouse employs a SIMD‑based vectorized engine that executes the same instruction on a batch of data in registers, fully exploiting modern CPU parallelism to boost query speed.
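Pure Python cannot issue SIMD instructions, so the following sketch shows only the control flow that makes them possible: operators work on a whole batch of column values per call instead of one value per call. The batch size and function names are assumptions for illustration and do not correspond to ClickHouse settings or internals.

```python
from array import array

BATCH_SIZE = 1024  # assumption: illustrative batch size, not a ClickHouse setting

def sum_row_at_a_time(values, threshold):
    """Tuple-at-a-time style: the filter and the add are interleaved per value."""
    total = 0
    for v in values:
        if v > threshold:
            total += v
    return total

def sum_batched(values, threshold, batch_size=BATCH_SIZE):
    """Batch-at-a-time style: each step receives a contiguous chunk of a column."""
    total = 0
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        # The same filter-then-sum step is applied to the whole chunk at once.
        total += sum(v for v in batch if v > threshold)
    return total

# Synthetic column of one-byte ages between 18 and 77.
ages = array("B", (18 + i % 60 for i in range(10_000)))

# Both strategies compute the same answer; only the execution shape differs.
assert sum_row_at_a_time(ages, 40) == sum_batched(ages, 40)
```

In a compiled engine, the tight loop over a contiguous batch is the part a compiler or hand‑written kernel can map onto SIMD registers.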

Index Design

ClickHouse uses a sparse primary index that stores one mark per granule (8192 rows by default) rather than one entry per row, so millions of rows can be indexed with only a few thousand marks kept in memory. Secondary (data‑skipping) indexes include minmax, set, bloom_filter, ngrambf_v1, tokenbf_v1, and inverted indexes, each suited to different query patterns.
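A simplified model of a sparse index lookup is sketched below. It is not how ClickHouse is implemented: real marks also carry file offsets. The function names and the integer key are assumptions for the example. The sketch keeps the first key of every 8192‑row granule and uses binary search to find which granules could contain a given key.

```python
from bisect import bisect_left, bisect_right

GRANULARITY = 8192  # rows per granule, matching the default mentioned above

def build_sparse_index(sorted_keys, granularity=GRANULARITY):
    """Keep only the first key of every granule (one mark per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]

def granules_for_key(marks, key):
    """Return the granule numbers that may contain `key`.

    Granule i spans keys from marks[i] up to marks[i + 1] inclusive, because
    duplicates of a key can straddle a granule boundary.
    """
    first = max(bisect_left(marks, key) - 1, 0)
    last = bisect_right(marks, key) - 1
    return range(first, last + 1)

# Hypothetical table: 20 million rows sorted by an integer primary key.
keys = range(20_000_000)
marks = build_sparse_index(keys)

print(len(marks))                                # marks held in memory
print(list(granules_for_key(marks, 1_000_000)))  # granules to scan
```

Only the selected granules need to be read from disk, so a point lookup here scans one block of 8192 rows instead of the whole table.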

Conclusion

Through MPP processing, columnar storage, aggressive compression, vectorized execution, and advanced indexing, ClickHouse consistently outperforms MySQL on analytical workloads, making it a compelling choice for large‑scale data analysis.
