Introducing ClickHouse: Columnar Database Overview, MySQL Migration, Performance Comparison, and Practical Tips
This article introduces ClickHouse, explains its column‑oriented architecture versus row‑oriented databases, details a MySQL‑to‑ClickHouse migration, shows a performance benchmark that reduces query time from minutes to seconds, and shares practical deployment tips and common pitfalls.
1. What is ClickHouse?
ClickHouse is an open‑source, column‑store database from Yandex designed for real‑time analytics, offering processing speeds 100‑1000 times faster than traditional methods. Its performance surpasses comparable columnar DBMSs, handling billions of rows and tens of gigabytes per server per second.
We first clarify some basic concepts
OLTP: Traditional relational databases focused on insert, update, delete, and query operations with strong transaction consistency (e.g., banking, e‑commerce systems).
OLAP: Warehouse‑type databases aimed at reading data for complex analysis, supporting decision‑making with simple, intuitive results.
Next we use diagrams to understand the difference between columnar and row‑oriented databases.
In traditional row‑oriented DB systems (MySQL, PostgreSQL, MS SQL Server), data is stored row by row:
In columnar DB systems (ClickHouse), data is stored column by column:
Comparison of storage methods:
The above is a basic introduction to ClickHouse; see the official documentation for more details.
https://clickhouse.tech/docs/zh/
2. Business Problem
The business data resides in MySQL with a 50 million‑row main table and two auxiliary tables; a single join query takes over 3 minutes. After indexing, sharding, and logical optimization, performance remained poor, prompting a migration to ClickHouse.
Result: query time reduced to under 1 second, achieving a 200× speedup.
We hope this article helps readers quickly master this powerful tool and avoid common pitfalls.
3. ClickHouse Practice
1. Installing ClickHouse on macOS
Installed via Docker; alternatively, you can compile from source, which is more cumbersome.
2. Data Migration: MySQL to ClickHouse
ClickHouse supports most MySQL syntax, making migration low‑cost. Five migration approaches are available:
create table engine mysql – mapping; data stays in MySQL.
insert into select from – create table first, then import.
create table as select from – create and import simultaneously.
CSV offline import.
streamsets.
We chose the third method:
CREATE TABLE [IF NOT EXISTS] [db.]table_name ENGINE = Mergetree AS SELECT * FROM mysql('host:port','db','database','user','password')3. Performance Test Comparison
Type
Data Volume
Table Size
Query Speed
MySQL
50 million
10 GB
205 s
ClickHouse
50 million
600 MB
under 1 s
4. Data Synchronization方案
Temporary Table
Create a temp intermediate table, fully sync MySQL data into the ClickHouse temp table, then replace the original ClickHouse table; suitable for moderate data volumes with frequent incremental updates.
synch
Open‑source synchronization tool recommended: synch.
https://github.com/long2ice/synch/blob/dev/README-zh.md
The principle is to read MySQL binlog, extract SQL statements, and consume tasks via a message queue.
5. Why is ClickHouse Fast?
Only reads the columns needed for computation, reducing I/O.
Same‑type columns enable up to ten‑fold compression, further lowering I/O.
ClickHouse applies specialized search algorithms tailored to storage scenarios.
4. Pitfalls Encountered
1. Data Type Differences Between ClickHouse and MySQL
Running MySQL queries caused errors:
Solution: use LEFT JOIN B b ON toUInt32(h.id) = toUInt32(ec.post_id) to unify unsigned types.
2. Delete or Update Operations Are Asynchronous, Providing Only Eventual Consistency
The ClickHouse manual shows that even the Mergetree engine guarantees only eventual consistency.
If strong consistency is required, a full data sync is recommended.
5. Summary
Through the ClickHouse practice, we completely resolved the MySQL query bottleneck; queries on datasets up to 2 billion rows can return results within 1 second, and ClickHouse also supports clustering for larger scales.
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.