ClickHouse Overview: Architecture, MySQL Migration, Performance Testing, and Practical Tips
This article introduces ClickHouse, a high‑performance open‑source columnar database, explains its architecture versus row‑based systems, details migration from MySQL, showcases installation, performance benchmarks, data‑sync strategies, common pitfalls, and summarizes its benefits for large‑scale analytical workloads.
1. What is ClickHouse?
ClickHouse is an open‑source, column‑store database developed by Yandex for real‑time data analysis, offering query speeds 100‑1000 times faster than traditional row‑based databases.
It can process billions of rows and tens of terabytes of data per server per second.
Fundamental Concepts
OLTP: Traditional relational databases focused on transactions (e.g., banking, e‑commerce).
OLAP: Warehouse‑type databases designed for complex analytical queries and decision support.
Column‑store databases store data by column rather than by row, reducing I/O and improving compression.
2. Business Problem
A MySQL table with 50 million rows required over 3 minutes for a join query. After indexing, sharding, and logical optimizations, performance remained poor, prompting a migration to ClickHouse.
Post‑migration, query time dropped to under 1 second, a 200× improvement.
3. ClickHouse Practice
3.1 Installation on macOS
The author installed ClickHouse via Docker (alternatively, a compiled binary can be used).
3.2 Data Migration from MySQL to ClickHouse
ClickHouse supports most MySQL syntax, offering five migration approaches:
CREATE TABLE ENGINE=MySQL – keep data in MySQL.
INSERT INTO … SELECT FROM – create table then import.
CREATE TABLE AS SELECT FROM – create and import in one step.
CSV offline import.
Streamsets.
The third method was chosen, using the following statement:
CREATE TABLE [IF NOT EXISTS] [db.]table_name ENGINE = Mergetree AS SELECT * FROM mysql('host:port','db','database','user','password')3.3 Performance Test Comparison
3.4 Data Synchronization Solutions
Temporary Table
Full data is copied from MySQL to a ClickHouse temporary table, then swapped with the production table—suitable for moderate data volumes with frequent incremental changes.
Synch
Synch is an open‑source tool that captures MySQL binlog events and streams them via a message queue for real‑time replication.
https://github.com/long2ice/synch/blob/dev/README-zh.md
3.5 Why ClickHouse Is Fast
Only reads required columns, reducing I/O.
Columnar storage enables high compression ratios (up to 10×), further lowering I/O.
Custom storage‑engine algorithms optimize data retrieval for specific workloads.
4. Pitfalls Encountered
4.1 Data Type Differences
Direct MySQL queries caused errors; casting to matching unsigned types resolved the issue, e.g., LEFT JOIN B b ON toUInt32(h.id) = toUInt32(ec.post_id) .
4.2 Asynchronous Delete/Update
Even the MergeTree engine guarantees only eventual consistency; for strict consistency, a full data sync is recommended.
5. Conclusion
By adopting ClickHouse, the author eliminated MySQL query bottlenecks, achieving sub‑second responses for datasets under 2 billion rows, with scalability to clustered deployments.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.