Understanding ClickHouse: Architecture, MySQL Migration, Performance Optimization, and Common Pitfalls
This article introduces ClickHouse as a column‑oriented OLAP database, explains its architectural differences from row‑based systems, details a real‑world migration from MySQL with code examples, showcases performance gains, and shares practical tips and pitfalls for production use.
1. What is ClickHouse?
ClickHouse is a column‑oriented database management system (DBMS) designed for online analytical processing (OLAP).
We first clarify some basic concepts:
OLTP (online transaction processing): traditional relational databases built around inserts, updates, deletes, and queries with strong transactional consistency (e.g., banking and e‑commerce systems).
OLAP (online analytical processing): data‑warehouse‑style databases that mostly read large volumes of data to run complex analyses, returning concise, easy‑to‑interpret results that support decision‑making.
Next, we illustrate the difference between column‑oriented and row‑oriented databases.
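In traditional row‑oriented databases (MySQL, PostgreSQL, MS SQL Server), data is stored row by row: all fields of one record sit next to each other on disk.
In column‑oriented databases such as ClickHouse, data is stored column by column: all values of one column sit next to each other, separately from the other columns.
Comparing the two storage methods: a row store is efficient at reading or writing complete records, while a column store is efficient at scanning a few columns across many rows, which is the typical OLAP access pattern.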
For more details, refer to the official ClickHouse manual.
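The difference can be sketched with a toy model in Python. This is a minimal illustration, not how ClickHouse actually lays out data on disk: the table, its columns (`id`, `status`, `amount`) and the helper names are all hypothetical. It counts how many stored values each layout has to touch to answer `SUM(amount)`.

```python
from typing import Dict, List, Tuple

COLUMNS = ("id", "status", "amount")
# Hypothetical rows; each tuple is one record in (id, status, amount) order.
ROWS: List[Tuple[int, str, int]] = [
    (1, "paid", 120),
    (2, "refunded", 80),
    (3, "paid", 200),
    (4, "pending", 50),
]


def to_columns(rows: List[Tuple[int, str, int]]) -> Dict[str, list]:
    """Pivot a row-oriented table into one contiguous list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_rowwise(rows: List[Tuple[int, str, int]]) -> Tuple[int, int]:
    """SUM(amount) on a row store; also returns how many values had to be read."""
    amount_idx = COLUMNS.index("amount")
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole record is fetched to use one field
        total += row[amount_idx]
    return total, values_read


def sum_amount_columnwise(columns: Dict[str, list]) -> Tuple[int, int]:
    """SUM(amount) on a column store: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)
```

Both functions return a total of 450, but the row store reads 12 values while the column store reads only 4; on real tables with dozens of columns the gap grows with the number of columns the query does not need.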
2. Business Problem
The business side stored a 50‑million‑row main table and two auxiliary tables in MySQL. A single join query took more than 3 minutes, and even after indexing, sharding, and logical optimizations, performance remained poor, prompting a migration to ClickHouse.
After optimization, query time dropped to under 1 second, achieving a 200‑fold performance improvement.
The goal of this article is to help readers quickly master this powerful tool and avoid common pitfalls in practice.
3. ClickHouse Practice
1. Installing ClickHouse on macOS
Installation was performed via Docker, following the tutorial below. Building from source is also possible but more cumbersome.
https://blog.csdn.net/qq_24993831/article/details/103715194
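Once the container is running, a quick way to confirm the server is reachable is the HTTP interface's `/ping` endpoint, which answers `Ok.`. The sketch below assumes the default HTTP port 8123 is published to the host (for example with `docker run -d -p 8123:8123 clickhouse/clickhouse-server`); the function name `clickhouse_is_up` is hypothetical.

```python
import urllib.error
import urllib.request


def clickhouse_is_up(base_url: str = "http://localhost:8123", timeout: float = 2.0) -> bool:
    """Return True if the ClickHouse HTTP interface answers GET /ping with 'Ok.'."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"Ok."
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: treat all as "not up yet".
        return False


if __name__ == "__main__":
    print("ClickHouse reachable:", clickhouse_is_up())
```

Using only the standard library keeps the check usable in a startup script before any ClickHouse client library is installed.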
2. Data Migration: MySQL → ClickHouse
ClickHouse supports most MySQL syntax, making migration low‑cost. Five migration approaches are available:
CREATE TABLE … ENGINE = MySQL: keep the data in MySQL and map it into ClickHouse as a table.
INSERT INTO … SELECT FROM: create the table first, then import the data.
CREATE TABLE … AS SELECT FROM: create the table and import the data in one step.
CSV offline import.
StreamSets.
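For a table as large as the 50‑million‑row one here, approach 2 can also be run in key ranges, so that a failure only costs one batch. The sketch below only builds the statements; it assumes a numeric, roughly dense primary key, and the names `batched_insert_statements`, `db.orders` and `id` are hypothetical. The ClickHouse documentation states that simple `WHERE` comparisons on the `mysql()` table function are executed on the MySQL server, so each batch should transfer only its own range.

```python
from typing import Iterator


def batched_insert_statements(
    target: str,
    mysql_args: str,
    key: str,
    min_id: int,
    max_id: int,
    batch_size: int,
) -> Iterator[str]:
    """Yield INSERT INTO ... SELECT statements that copy keys min_id..max_id in ranges."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size, max_id + 1)  # half-open range [start, end)
        yield (
            f"INSERT INTO {target} "
            f"SELECT * FROM mysql({mysql_args}) "
            f"WHERE {key} >= {start} AND {key} < {end}"
        )
        start = end


if __name__ == "__main__":
    args = "'host:port', 'database', 'table', 'user', 'password'"
    for stmt in batched_insert_statements("db.orders", args, "id", 1, 250, 100):
        print(stmt)
```

Each statement can then be sent with `clickhouse-client --query` or any client library. Values are interpolated without escaping, so only pass trusted table names and connection arguments.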
We chose the third method (CREATE TABLE AS SELECT) and executed the following statement:
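CREATE TABLE [IF NOT EXISTS] [db.]table_name
ENGINE = MergeTree
-- MergeTree requires ORDER BY; tuple() means "no sorting key". Prefer the columns you filter on most.
ORDER BY tuple()
AS SELECT * FROM mysql('host:port', 'database', 'table', 'user', 'password')
3. Performance Test Comparison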
The authors charted query latency before and after the migration: the join query that took more than 3 minutes in MySQL returns in under 1 second in ClickHouse.
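To reproduce a before/after comparison on your own queries, time each query several times and compare medians rather than single runs. A minimal sketch, assuming each database query is wrapped in a zero‑argument callable using whatever client you already have; `median_latency` and `speedup` are hypothetical helpers. For the figures above, more than 180 s before and under 1 s after means a speedup of more than 180x.

```python
import statistics
import time
from typing import Callable


def median_latency(run_query: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock time in seconds of run_query() over `repeats` timed runs."""
    for _ in range(warmup):
        run_query()  # warm caches so one cold run does not skew the comparison
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the 'after' measurement is."""
    if after_s <= 0:
        raise ValueError("after_s must be positive")
    return before_s / after_s
```

The median is less sensitive than the mean to a single slow run caused by, for example, a background merge or a cold disk cache.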
4. Data Synchronization Options
Temporary table: create an intermediate temporary table in ClickHouse, fully sync the MySQL data into it, then swap it in for the original table. This suits moderate data volumes with frequent incremental changes.
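The temporary‑table approach can be expressed as a short sequence of ClickHouse statements. The sketch below only builds them; it assumes the table lives in an Atomic database (the default engine in recent ClickHouse versions), where `EXCHANGE TABLES` swaps two tables atomically, and the `_tmp` suffix and function name are hypothetical.

```python
from typing import List


def full_resync_statements(table: str, mysql_args: str) -> List[str]:
    """SQL steps for a full resync through a staging table plus an atomic swap."""
    staging = f"{table}_tmp"  # hypothetical naming convention for the staging table
    return [
        f"DROP TABLE IF EXISTS {staging}",
        f"CREATE TABLE {staging} AS {table}",  # same columns and engine as the live table
        f"INSERT INTO {staging} SELECT * FROM mysql({mysql_args})",
        f"EXCHANGE TABLES {staging} AND {table}",  # readers switch to the fresh copy at once
        f"DROP TABLE {staging}",  # after the swap, this name holds the old data
    ]
```

Because the swap is a single statement, queries see either the old copy or the new one, never a half‑loaded table; the cost is reloading the full data set on every run.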
Synch: an open‑source sync tool (synch) that captures MySQL binlog events and forwards them through a message queue, from which a consumer picks them up and writes them into ClickHouse.
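The binlog‑plus‑queue pattern can be sketched as a producer and a consumer sharing a queue. This is a toy model, not synch's actual code: the event format `(operation, key, row)`, the in‑memory `replica` dict and both function names are hypothetical, and a real consumer would translate each batch into ClickHouse writes instead of updating a dict.

```python
import queue
from typing import Dict, List, Optional, Tuple

# A binlog-style change event: (operation, primary key, new row or None). Hypothetical format.
Event = Tuple[str, int, Optional[dict]]


def publish(events: List[Event], mq: "queue.Queue[Event]") -> None:
    """Producer side: push change events, in binlog order, onto the message queue."""
    for event in events:
        mq.put(event)


def consume(mq: "queue.Queue[Event]", replica: Dict[int, dict], batch_size: int = 100) -> int:
    """Consumer side: drain the queue in batches and apply events in order.

    Returns the number of batches applied.
    """
    batches = 0
    while not mq.empty():
        batch = []
        while len(batch) < batch_size and not mq.empty():
            batch.append(mq.get())
        for op, key, row in batch:
            if op in ("insert", "update"):
                replica[key] = row
            elif op == "delete":
                replica.pop(key, None)
            else:
                raise ValueError(f"unknown operation: {op}")
        batches += 1
    return batches
```

Batching matters on the ClickHouse side: many small inserts create many small data parts, so consumers typically accumulate events and write them in larger chunks.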
5. Why is ClickHouse Fast?
Only the required columns are read, avoiding full‑row I/O.
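Values in one column share a type and often repeat, so they compress well (up to about ten‑fold), further reducing I/O.
ClickHouse chooses specialized algorithms depending on the data and the query, for example different hash table variants for GROUP BY depending on the key types.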
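The compression point can be checked with the standard library. In this sketch zlib stands in for ClickHouse's own codecs (LZ4 by default, ZSTD optionally), and the data set, with its `id`, `status` and `amount` fields, is made up: a low‑cardinality column stored contiguously compresses far better than the same values interleaved with other fields in a row layout.

```python
import struct
import zlib

N = 10_000
STATUSES = [b"paid", b"refunded", b"pending"]  # hypothetical low-cardinality values


def status_column(n: int = N) -> bytes:
    """A 'status' column as fixed-width 8-byte values, stored contiguously."""
    return b"".join(struct.pack("<8s", STATUSES[i % len(STATUSES)]) for i in range(n))


def row_store(n: int = N) -> bytes:
    """The same data row by row: id (int64), status (8 bytes), amount (int64) per record."""
    return b"".join(
        struct.pack("<q8sq", i, STATUSES[i % len(STATUSES)], (i * 37) % 1000)
        for i in range(n)
    )


def compression_ratio(data: bytes) -> float:
    """Raw size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    col, rows = status_column(), row_store()
    print(f"status column: {len(col)} bytes raw, ratio {compression_ratio(col):.1f}x")
    print(f"row store:     {len(rows)} bytes raw, ratio {compression_ratio(rows):.1f}x")
```

Real ratios depend heavily on the data; repetitive columns such as status codes or dates compress best, while high‑entropy columns such as random IDs gain little.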
4. Pitfalls Encountered
1. Data Type Differences between ClickHouse and MySQL
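Queries copied directly from MySQL failed, because ClickHouse is stricter than MySQL about comparing columns of different types. The fix was to cast both join keys to the same type, e.g., LEFT JOIN B AS ec ON toUInt32(h.id) = toUInt32(ec.post_id), which converts both sides to unsigned 32‑bit integers.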
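The idea behind the fix is that both join keys must be normalized to one type before they are compared. The toy hash join below illustrates it in Python, where mismatched key types (an integer `id` against a string `post_id`) silently produce no matches rather than an error; the tables, the `t` column and the function name are hypothetical, and `normalize=int` plays the role of `toUInt32()` on both sides of `ON`.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def left_join(
    left: Iterable[dict],
    right: Iterable[dict],
    left_key: str,
    right_key: str,
    normalize: Callable[[object], object] = lambda v: v,
) -> List[Tuple[dict, Optional[dict]]]:
    """Hash LEFT JOIN; `normalize` is applied to the keys on both sides before matching."""
    index: Dict[object, dict] = {}
    for r in right:
        index[normalize(r[right_key])] = r  # last match wins; enough for a 1:1 demo
    return [(row, index.get(normalize(row[left_key]))) for row in left]


if __name__ == "__main__":
    h = [{"id": 1}, {"id": 2}, {"id": 3}]
    ec = [{"post_id": "1", "t": "x"}, {"post_id": "3", "t": "y"}]
    print(left_join(h, ec, "id", "post_id"))              # no matches: 1 != "1"
    print(left_join(h, ec, "id", "post_id", normalize=int))
```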
2. Delete/Update Operations are Asynchronous (Eventual Consistency)
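Even with MergeTree‑family engines, ALTER TABLE … UPDATE/DELETE statements run as asynchronous background mutations, and ReplacingMergeTree removes duplicate rows only during background merges, so queries may see stale data for a while: consistency is only eventual. For strict consistency requirements, a full‑copy sync is recommended.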
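The visible effect of asynchronous mutations can be modeled in a few lines. This toy class is not ClickHouse's implementation; the class and method names are hypothetical. It shows that a delete is accepted immediately but readers keep seeing the old rows until background work runs. In ClickHouse itself, settings such as `mutations_sync` on the `ALTER` statement, or `FINAL` on queries against ReplacingMergeTree, can narrow this window at a performance cost.

```python
from typing import Callable, Dict, List


class ToyMergeTreeTable:
    """Toy model of asynchronous mutations: ALTER ... DELETE is queued, not applied."""

    def __init__(self, rows: Dict[int, dict]):
        self.rows = dict(rows)
        self.pending: List[Callable[[Dict[int, dict]], None]] = []

    def alter_delete(self, predicate: Callable[[dict], bool]) -> None:
        """Like ALTER TABLE ... DELETE WHERE ...: returns at once, work happens later."""
        def mutation(rows: Dict[int, dict]) -> None:
            for key in [k for k, r in rows.items() if predicate(r)]:
                del rows[key]
        self.pending.append(mutation)

    def select(self) -> List[dict]:
        """Readers see whatever state the background work has produced so far."""
        return [self.rows[k] for k in sorted(self.rows)]

    def run_background_mutations(self) -> None:
        """Stand-in for the background threads that rewrite data parts."""
        while self.pending:
            self.pending.pop(0)(self.rows)
```

Between `alter_delete` and `run_background_mutations`, `select` still returns the rows that were supposedly deleted, which is exactly the window a strictly consistent application cannot tolerate.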
5. Summary
By applying ClickHouse in practice, the MySQL query bottleneck was eliminated: on datasets of under 2 billion rows, 90% of queries return within 1 second. ClickHouse can also scale out to a cluster for larger workloads.
References:
ClickHouse official manual
ClickHouse usage in Ctrip hotel services
How to choose ClickHouse engines
Architecture Digest
