How ClickHouse Turns MySQL Bottlenecks into Sub‑Second OLAP Queries
This article introduces ClickHouse, compares column‑store and row‑store databases, shows how migrating a 50‑million‑row MySQL table to ClickHouse reduced query time from minutes to under one second, and shares practical installation, migration, performance testing, and synchronization tips.
What is ClickHouse?
ClickHouse is an open‑source column‑store database from Yandex designed for real‑time analytical queries. It processes data 100‑1000× faster than traditional row‑based systems, handling billions of rows per second per server.
Fundamental Concepts
Distinguish OLTP (transactional, row‑oriented) from OLAP (analytical, column‑oriented). Columnar storage reads only required columns, reducing I/O, while row‑based stores entire rows.
Business Problem
A MySQL table with 50 million rows required more than three minutes for a join query. After indexing, sharding, and logical optimization the improvement was limited, so ClickHouse was introduced, reducing query time to under one second—a 200‑fold speedup.
ClickHouse Practice
1. Installing ClickHouse on macOS
Installation can be done via Docker or by compiling from source.
2. Migrating Data from MySQL to ClickHouse
ClickHouse supports most MySQL syntax, offering five migration methods: engine mapping, INSERT‑SELECT, CREATE‑TABLE‑AS‑SELECT, CSV import, and StreamSets. The article uses the CREATE‑TABLE‑AS‑SELECT approach:
CREATE TABLE [IF NOT EXISTS] db.table_name ENGINE = MergeTree AS SELECT * FROM mysql('host:port','db','database','user','password')3. Performance Test Comparison
For a 50 million‑row dataset (≈10 GB in MySQL, 600 MB in ClickHouse):
MySQL query time: 205 seconds
ClickHouse query time: under 1 second
4. Data Synchronization Strategies
1) Temporary table: load full MySQL data into a ClickHouse temporary table, then replace the target table. Suitable for moderate data volumes with frequent incremental changes.
2) synch: an open‑source tool that reads MySQL binlog, converts statements to tasks, and streams them via a message queue.
5. Why ClickHouse Is Fast
Only the required columns are read, reducing I/O.
Same‑type column storage enables high compression (up to 10×).
Custom storage‑aware algorithms further accelerate queries.
Pitfalls Encountered
1. Data Type Differences Between ClickHouse and MySQL
Example: a query error can be resolved by casting IDs to unsigned integers, e.g., LEFT JOIN B b ON toUInt32(h.id) = toUInt32(ec.post_id).
2. Asynchronous Deletes/Updates
ClickHouse’s MergeTree engine guarantees eventual consistency; for strict consistency, perform full data synchronization.
Conclusion
ClickHouse eliminated the MySQL query bottleneck, delivering sub‑second responses for datasets up to two billion rows and scaling to clusters for larger volumes.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
