How ClickHouse Turned a 50 Million‑Row MySQL Query from Minutes into Seconds
This article explains what ClickHouse is, why a large MySQL table caused multi‑minute queries, and how migrating the data to ClickHouse using Docker, CREATE TABLE AS SELECT, and a lightweight sync tool reduced query time to under one second, while highlighting pitfalls and performance reasons.
What Is ClickHouse?
ClickHouse is a column‑oriented DBMS for online analytical processing (OLAP). It stores each column separately, enabling fast column‑wise reads and high compression compared with row‑based databases.
OLTP: transaction-oriented workloads that modify individual rows.
OLAP: analytical workloads that read large data volumes for reporting.
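The difference between the two layouts can be sketched in a few lines of Python. This is only a toy model with made-up order data (the names rows, columns, and the two functions are illustrative, not ClickHouse internals): an aggregate over one field has to walk whole rows in a row store, but touches a single array in a column store.

```python
# Toy data (hypothetical): three orders with an id, a country, and an amount.
rows = [
    (1, "DE", 20.0),
    (2, "US", 35.5),
    (3, "DE", 12.5),
]

# Column-oriented layout: one contiguous list per column.
columns = {
    "id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def total_amount_row_store(rows):
    # Every row is visited in full, even though only one field is needed.
    return sum(row[2] for row in rows)


def total_amount_column_store(columns):
    # Only the "amount" column is touched; "id" and "country" are never read.
    return sum(columns["amount"])


print(total_amount_row_store(rows), total_amount_column_store(columns))
```

Both functions return the same total; the point is how much data each one has to look at to get there.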
Business Problem
A production MySQL database held a table of 50 million rows plus two auxiliary tables. A join query across them took more than three minutes. Indexing, sharding, and rewriting the query logic did not bring the time down enough, so the data was migrated to ClickHouse.
After the migration, the same query completed in under one second, a roughly 200-fold speedup, and the table shrank from about 10 GB to about 600 MB.
ClickHouse Practice
1. Installing ClickHouse on macOS
The quick way is to run ClickHouse in a Docker container. Alternatively, the source can be compiled and installed manually.
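As a rough sketch of the Docker route, the snippet below only assembles a docker run command and prints it; it does not start anything. The container name clickhouse-local is an arbitrary choice, and the image name and ports are assumptions: clickhouse/clickhouse-server as published on Docker Hub, with 8123 (HTTP interface) and 9000 (native protocol) as ClickHouse's default ports.

```python
import shlex


def clickhouse_docker_command(name="clickhouse-local",
                              image="clickhouse/clickhouse-server"):
    """Build (but do not run) a docker command for a local ClickHouse server."""
    return [
        "docker", "run", "-d",   # run detached
        "--name", name,          # hypothetical container name
        "-p", "8123:8123",       # HTTP interface (default port)
        "-p", "9000:9000",       # native protocol (default port)
        image,
    ]


# Print the command so it can be reviewed and pasted into a shell.
print(shlex.join(clickhouse_docker_command()))
```

Printing rather than executing keeps the example safe to run anywhere; the resulting line can be adjusted (volumes, image tag) before use.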
2. Data Migration from MySQL to ClickHouse
ClickHouse supports most MySQL syntax, making migration straightforward. Of the five available migration approaches, this article uses the “CREATE TABLE … AS SELECT … FROM mysql(...)” method:
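CREATE TABLE [IF NOT EXISTS] db.table_name ENGINE = MergeTree ORDER BY id AS SELECT * FROM mysql('host:port', 'database', 'table', 'user', 'password');
MergeTree requires a sorting key, hence the ORDER BY clause; id is a placeholder for the column the table should be sorted on.
3. Performance Comparison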
MySQL – 50 M rows, 10 GB table, 205 s query time.
ClickHouse – 50 M rows, 600 MB table, query completes in ≤1 s.
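The ratios behind these figures are simple arithmetic. The sketch below reproduces them from the numbers quoted above, treating 1 s as the upper bound for the ClickHouse query and 10 GB as 10 × 1024 MB (both are assumptions about how the approximate figures should be read).

```python
mysql_seconds = 205
clickhouse_seconds = 1        # upper bound: the query finishes in <= 1 s

mysql_mb = 10 * 1024          # ~10 GB, read as binary gigabytes (assumption)
clickhouse_mb = 600           # ~600 MB

speedup = mysql_seconds / clickhouse_seconds
storage_ratio = mysql_mb / clickhouse_mb

print(f"query speedup: at least {speedup:.0f}x")
print(f"storage reduction: about {storage_ratio:.0f}x")
```

So the "roughly 200-fold" speedup is a lower bound, and the on-disk size drops by a factor of about 17.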
4. Data Synchronization Strategies
Temporary Table: Load the full MySQL dataset into a temporary ClickHouse table, then replace the production table. Suitable for moderate data volumes with frequent incremental changes.
Synch: Use the open-source tool synch (https://github.com/long2ice/synch/blob/dev/README-zh.md), which reads MySQL binlog events, converts them to SQL, and pushes them via a message queue for near-real-time replication.
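To make the binlog-based approach more concrete, here is a minimal sketch of the "convert events to SQL" step. It is not synch's actual code: the event dictionary shape, the event_to_sql and quote helpers, and the db.posts table are all hypothetical. The generated statements use ClickHouse's INSERT and ALTER TABLE ... UPDATE / DELETE syntax.

```python
def quote(value):
    """Render a Python value as a SQL literal (toy quoting, illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def event_to_sql(event, key="id"):
    """Translate one hypothetical row-change event into a ClickHouse statement."""
    table = event["table"]
    row = event["values"]
    if event["type"] == "insert":
        cols = ", ".join(row)
        vals = ", ".join(quote(v) for v in row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    # Updates and deletes locate the row by its key column.
    where = f"{key} = {quote(row[key])}"
    if event["type"] == "delete":
        return f"ALTER TABLE {table} DELETE WHERE {where}"
    if event["type"] == "update":
        assignments = ", ".join(
            f"{col} = {quote(val)}" for col, val in row.items() if col != key
        )
        return f"ALTER TABLE {table} UPDATE {assignments} WHERE {where}"
    raise ValueError(f"unsupported event type: {event['type']}")


events = [
    {"type": "insert", "table": "db.posts", "values": {"id": 1, "title": "hello"}},
    {"type": "update", "table": "db.posts", "values": {"id": 1, "title": "hi"}},
    {"type": "delete", "table": "db.posts", "values": {"id": 1}},
]
for e in events:
    print(event_to_sql(e))
```

A real pipeline would batch these statements and deliver them through the message queue; the quoting here is deliberately simplistic and should not be used on untrusted input.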
5. Why ClickHouse Is Fast
Only the required columns are read, avoiding full‑row I/O.
Uniform column types enable up to ten‑fold compression, further reducing I/O.
Specialized indexing and search algorithms are tailored to columnar storage.
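The compression point can be checked with a small experiment. The sketch below is an illustration under stated assumptions, not a ClickHouse benchmark: zlib stands in for ClickHouse's own codecs, and the data is deliberately extreme (random 8-byte user ids next to a constant 8-byte status). The same bytes are compressed once interleaved row by row and once grouped by column.

```python
import random
import zlib

rng = random.Random(0)   # fixed seed so the run is repeatable
N = 10_000

# Column 1: effectively incompressible 8-byte ids (hypothetical data).
user_ids = [rng.getrandbits(64).to_bytes(8, "little") for _ in range(N)]
# Column 2: a highly repetitive 8-byte status value.
statuses = [b"shipped\x00"] * N

# Row layout: id and status interleaved, row after row.
row_layout = b"".join(u + s for u, s in zip(user_ids, statuses))
# Column layout: all ids first, then all statuses.
column_layout = b"".join(user_ids) + b"".join(statuses)

row_size = len(zlib.compress(row_layout))
column_size = len(zlib.compress(column_layout))

print(f"raw bytes:          {len(row_layout)}")
print(f"row layout, zlib:    {row_size}")
print(f"column layout, zlib: {column_size}")
```

Grouping similar values together lets the compressor encode the repetitive column almost for free, whereas in the row layout it has to interrupt the random data with a back-reference on every row.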
Pitfalls Encountered
1. Data‑type Differences
Running the MySQL queries unchanged against ClickHouse caused type-mismatch errors. The fix is to cast both join keys to a common unsigned type, e.g., LEFT JOIN B b ON toUInt32(h.id) = toUInt32(b.post_id).
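The idea behind the cast, normalizing both join keys to one type before comparing them, can be shown with a plain-Python analogy. Everything here is hypothetical (the normalize_key and left_join helpers, the posts and comments data); normalize_key only loosely mirrors what toUInt32 does in ClickHouse.

```python
def normalize_key(value):
    """Coerce a join key to a non-negative int so both sides compare equal."""
    key = int(value)
    if key < 0:
        raise ValueError(f"negative key cannot be treated as unsigned: {value!r}")
    return key


def left_join(left_rows, right_rows, left_key, right_key):
    """Left join two lists of dicts on keys normalized to a common type."""
    index = {}
    for right in right_rows:
        index.setdefault(normalize_key(right[right_key]), []).append(right)
    joined = []
    for left in left_rows:
        # Unmatched left rows are kept and paired with None.
        for match in index.get(normalize_key(left[left_key]), [None]):
            joined.append((left, match))
    return joined


posts = [{"id": 1}, {"id": 2}]                       # integer keys
comments = [{"post_id": "1", "text": "nice"}]        # same key stored as text

for post, comment in left_join(posts, comments, "id", "post_id"):
    print(post, comment)
```

Without the normalization step, the integer 1 and the string "1" would never match, which is the same class of problem the explicit cast in the SQL join avoids.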
2. Asynchronous Deletions/Updates
In ClickHouse's MergeTree engine, updates and deletes run as asynchronous background mutations, so changes are only eventually consistent. For workloads that require strong consistency, a full-refresh synchronization is recommended.
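One way a full refresh might be scripted is sketched below. It only builds the statement sequence and prints it; nothing is sent to a server. The function name, the _tmp suffix, the shop.orders table, and the id sorting key are all hypothetical, and the swap step assumes a database engine that supports EXCHANGE TABLES (ClickHouse's Atomic engine does).

```python
def full_refresh_statements(db, table, mysql_source, order_by="id"):
    """Build the SQL for a reload-then-swap refresh of one table."""
    tmp = f"{db}.{table}_tmp"
    target = f"{db}.{table}"
    return [
        # Clear any leftover temporary table from a previous failed run.
        f"DROP TABLE IF EXISTS {tmp}",
        # Reload the complete dataset from MySQL into the temporary table.
        f"CREATE TABLE {tmp} ENGINE = MergeTree ORDER BY {order_by} "
        f"AS SELECT * FROM mysql({mysql_source})",
        # Swap the fresh copy with the production table in one step.
        f"EXCHANGE TABLES {tmp} AND {target}",
        # After the swap, the _tmp name holds the old data; drop it.
        f"DROP TABLE {tmp}",
    ]


statements = full_refresh_statements(
    "shop", "orders",
    "'host:port', 'database', 'table', 'user', 'password'",  # placeholders
)
for sql in statements:
    print(sql + ";")
```

Because readers only ever see either the old or the new complete table, this avoids the window in which asynchronous updates or deletes are partially applied, at the cost of reloading everything on each refresh.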
Conclusion
Migrating a 50 million‑row MySQL workload to ClickHouse reduced query latency from minutes to seconds and cut storage requirements dramatically. ClickHouse also scales to clusters for larger datasets, making it a compelling solution for analytical queries.
IT Architects Alliance
