How to Sync MySQL to ClickHouse in Real‑Time Using MaterializeMySQL
This article demonstrates how to configure ClickHouse as a MySQL replica, showing step‑by‑step setup of a MySQL master, ClickHouse slave, real‑time binlog consumption, and handling of DDL, UPDATE, and DELETE events for seamless OLTP‑OLAP integration.
Overview
ClickHouse can be mounted as a MySQL replica: it performs a full load of the MySQL data first and then switches to incremental, real-time synchronization. The feature supports MySQL 5.6/5.7/8.0 and is compatible with DELETE/UPDATE statements and most common DDL operations. It is currently in Alpha and relies on community feedback for rapid iteration.
Code Acquisition
Since the feature is still under validation, the source code is pulled from a GitHub pull request:
git fetch origin pull/10851/head:mysql_replica_experiment
After fetching, the code is compiled locally.
MySQL Master Setup
Run a MySQL container with binary logging enabled:
docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 \
mysqld --datadir=/var/lib/mysql \
--server-id=1 \
--log-bin=/var/lib/mysql/mysql-bin.log \
--gtid-mode=ON \
--enforce-gtid-consistency
Create a database and a table, then insert sample data:
mysql> create database ckdb;
mysql> use ckdb;
mysql> create table t1(a int not null primary key, b int);
mysql> insert into t1 values(1,1),(2,2);
mysql> select * from t1;
ClickHouse Slave Setup
Create a replication channel that materializes the MySQL database:
CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');
use ckdb;
show tables;
select * from t1;
Check the replication metadata:
cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:913
Data Version:0
Delete Operation
Execute a DELETE on the MySQL master and observe the change on the ClickHouse replica:
mysql> delete from t1 where a=1;
ClickHouse now shows only the remaining row. In the metadata, the binlog position has advanced and Data Version now reads 2:
cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:1171
Data Version:2
Update Operation
Run an UPDATE on the MySQL master and verify the updated values on ClickHouse:
mysql> update t1 set b=b+1;
clickhouse :) select * from t1;
The updated row appears with the new value.
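The .metadata file shown in the earlier steps is plain "Key:value" text, so replication progress can be checked by comparing two snapshots. The following is a minimal Python sketch of that idea, not part of ClickHouse itself. The helper names parse_metadata and progress are hypothetical, and the two snapshots are the ones printed above (after the initial sync and after the DELETE).

```python
def parse_metadata(text):
    """Parse 'Key:value' lines into a dict, turning purely numeric values into int."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only; the key may contain spaces.
        key, _, value = line.partition(":")
        value = value.strip()
        fields[key.strip()] = int(value) if value.isdigit() else value
    return fields


def progress(old, new):
    """Summarize how far the consumer moved between two metadata snapshots."""
    same_file = old["Binlog File"] == new["Binlog File"]
    return {
        "same_binlog_file": same_file,
        # A byte offset difference only makes sense within one binlog file.
        "bytes_consumed": (
            new["Binlog Position"] - old["Binlog Position"] if same_file else None
        ),
        "data_version_delta": new["Data Version"] - old["Data Version"],
    }


before = parse_metadata("""Version:1
Binlog File:mysql-bin.000001
Binlog Position:913
Data Version:0""")

after = parse_metadata("""Version:1
Binlog File:mysql-bin.000001
Binlog Position:1171
Data Version:2""")

print(progress(before, after))
```

A binlog position that stops advancing while the MySQL master keeps writing would be a first hint that the replica has stalled.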
Implementation Mechanism
Understanding MySQL binlog events is essential. The main event types are:
MYSQL_QUERY_EVENT – DDL
MYSQL_WRITE_ROWS_EVENT – INSERT
MYSQL_UPDATE_ROWS_EVENT – UPDATE
MYSQL_DELETE_ROWS_EVENT – DELETE
ClickHouse consumes these events via the MySQL Replication Protocol, converting them into internal blocks and writing directly to the storage engine. Three major challenges are DDL compatibility, DELETE/UPDATE support, and query filtering.
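The mapping from binlog events to stored rows can be sketched in a few lines of Python. This is an illustration of the idea only, not the ClickHouse implementation: the function event_to_rows is hypothetical, the version number is passed in by the caller as an assumption, and for UPDATE events the sketch expects the "after" image of each row.

```python
# Sign written for each row event type named above.
SIGN_BY_EVENT = {
    "MYSQL_WRITE_ROWS_EVENT": 1,    # INSERT
    "MYSQL_UPDATE_ROWS_EVENT": 1,   # UPDATE: the new values are inserted as a fresh row
    "MYSQL_DELETE_ROWS_EVENT": -1,  # DELETE: a marker row for the deleted key
}


def event_to_rows(event_type, rows, version):
    """Convert one binlog event into rows tagged with _sign and _version."""
    if event_type == "MYSQL_QUERY_EVENT":
        # DDL changes the schema and produces no data rows in this sketch.
        return []
    sign = SIGN_BY_EVENT[event_type]
    return [dict(row, _sign=sign, _version=version) for row in rows]


print(event_to_rows("MYSQL_WRITE_ROWS_EVENT", [{"a": 1, "b": 1}, {"a": 2, "b": 2}], 1))
print(event_to_rows("MYSQL_DELETE_ROWS_EVENT", [{"a": 1, "b": 1}], 2))
```

Because every event becomes an append of new rows, nothing is modified in place; reconciling the versions is left to the storage engine and to read-time filtering.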
DDL Handling
When a MySQL table is replicated, ClickHouse creates a corresponding table with two hidden columns: _sign (-1 for delete, 1 for insert) and _version. The engine is ReplacingMergeTree(_version); the MySQL primary key column becomes the sorting key, and the partition key is derived from it.
ATTACH TABLE t1 (
    a Int32,
    b Nullable(Int32),
    _sign Int8,
    _version UInt64
) ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
Update and Delete Representation
Each operation generates a separate part in ClickHouse:
Part 1 – initial INSERT produces rows with _sign=1 and _version=1.
Part 2 – DELETE creates a row with the same primary key, _sign=-1, and an incremented version.
Part 3 – UPDATE inserts a new row with the updated values, _sign=1, and a further version number.
Using the FINAL modifier removes obsolete rows based on _version and _sign, yielding the current state.
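The effect of collapsing by _version can be simulated in plain Python. This is a rough model of the behavior described above, not ClickHouse code. The function select_final is hypothetical, and the concrete version numbers 1, 2, and 3 are assumed for illustration, since the text only says that each operation gets a higher version.

```python
def select_final(rows, key="a"):
    """For each key, keep the row with the highest _version (on a tie, the last one seen)."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row["_version"] >= current["_version"]:
            latest[row[key]] = row
    return [latest[k] for k in sorted(latest)]


parts = [
    # Part 1: initial INSERT of (1,1) and (2,2)
    [{"a": 1, "b": 1, "_sign": 1, "_version": 1},
     {"a": 2, "b": 2, "_sign": 1, "_version": 1}],
    # Part 2: DELETE WHERE a=1 writes a marker row for the same key
    [{"a": 1, "b": 1, "_sign": -1, "_version": 2}],
    # Part 3: UPDATE b=b+1 touches the only remaining MySQL row, a=2
    [{"a": 2, "b": 3, "_sign": 1, "_version": 3}],
]

all_rows = [row for part in parts for row in part]
for row in select_final(all_rows):
    print(row)
```

After the collapse, key 1 is represented only by its delete marker and key 2 by its updated values, so one row per primary key remains.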
Query Filtering
During reads, the MaterializeMySQL engine filters out rows where _sign = -1, effectively hiding deleted records.
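As a small illustration of this read-time filter, the Python sketch below takes rows that have already been reduced to one per primary key and drops the delete markers. The function visible_rows is hypothetical, and removing the hidden columns from the result is an assumption of this sketch rather than something the text states.

```python
HIDDEN_COLUMNS = ("_sign", "_version")


def visible_rows(rows):
    """Drop rows marked as deleted and hide the bookkeeping columns."""
    return [
        {name: value for name, value in row.items() if name not in HIDDEN_COLUMNS}
        for row in rows
        if row["_sign"] == 1
    ]


# One row per key, as left after collapsing by _version.
stored = [
    {"a": 1, "b": 1, "_sign": -1, "_version": 2},  # deleted on the MySQL side
    {"a": 2, "b": 3, "_sign": 1, "_version": 3},   # live row
]

print(visible_rows(stored))
```

With the sample data from this walkthrough, only the row with a=2 remains visible, which matches what the MySQL master holds after the DELETE and UPDATE.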
Conclusion
The real-time MySQL-to-ClickHouse replication (pull request #10851) implements an internal binlog consumer that writes directly to ClickHouse's storage engine, which is more efficient than relying on external binlog tools. It supports database-level replication and multi-source sync, and a CRC function for data-consistency verification is planned.
MaGe Linux Operations