How ClickHouse Replicates MySQL in Real-Time: A Step‑by‑Step Guide
This article explains how ClickHouse can act as a MySQL replica, covering full and incremental synchronization, supported MySQL versions, DDL compatibility, handling of DELETE/UPDATE events, the underlying binlog mechanism, and provides complete code examples to set up both MySQL master and ClickHouse slave.
Overview
ClickHouse can be mounted as a MySQL replica, providing full‑load then incremental real‑time synchronization of MySQL data. The feature supports MySQL 5.6, 5.7 and 8.0, handles DELETE, UPDATE, and most common DDL operations. It is currently in Alpha stage and relies on community feedback for rapid iteration.
Code Acquisition
Since the feature is still under validation, the code is fetched from a GitHub pull request:
git fetch origin pull/10851/head:mysql_replica_experimentMySQL Master Setup
Start a MySQL container with binary logging enabled:
docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 mysqld --datadir=/var/lib/mysql --server-id=1 --log-bin=/var/lib/mysql/mysql-bin.log --gtid-mode=ON --enforce-gtid-consistencyCreate a database and a table, then insert sample data:
mysql> create database ckdb;</code>
<code>mysql> use ckdb;</code>
<code>mysql> create table t1(a int not null primary key, b int);</code>
<code>mysql> insert into t1 values(1,1),(2,2);</code>
<code>mysql> select * from t1;ClickHouse Slave Setup
Create a replication channel using the MaterializeMySQL engine:
CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');</code>
<code>use ckdb;</code>
<code>show tables;</code>
<code>select * from t1;Check the synchronization metadata:
cat ckdatas/metadata/ckdb/.metadata</code>
<code>Version:1</code>
<code>Binlog File:mysql-bin.000001</code>
<code>Binlog Position:913</code>
<code>Data Version:0DELETE Propagation
Execute a delete on the MySQL master: mysql> delete from t1 where a=1; Query the ClickHouse replica and observe the row removal:
clickhouse :) select * from t1;</code>
<code>┌─a─┬─b─┐</code>
<code>│ 2 │ 2 │</code>
<code>└───┴───┘The metadata now shows Data Version:2.
UPDATE Propagation
Run an update on the MySQL master: mysql> update t1 set b=b+1; Query the ClickHouse replica and see the updated value:
clickhouse :) select * from t1;</code>
<code>┌─a─┬─b─┐</code>
<code>│ 2 │ 3 │</code>
<code>└───┴───┘Implementation Mechanism
ClickHouse consumes MySQL binlog events. The main event types are:
MYSQL_QUERY_EVENT – DDL
MYSQL_WRITE_ROWS_EVENT – INSERT
MYSQL_UPDATE_ROWS_EVENT – UPDATE
MYSQL_DELETE_ROWS_EVENT – DELETE
After a transaction commits, MySQL writes the corresponding events to the binlog. By implementing the MySQL Replication Protocol, ClickHouse can stream these events directly.
The three major challenges are DDL compatibility, DELETE/UPDATE support, and query filtering.
DDL Handling
When a MySQL table is replicated, ClickHouse creates an ATTACH TABLE with hidden columns _sign (‑1 for delete, 1 for insert) and _version. The engine used is ReplacingMergeTree with _version as the version column, and the original primary key becomes the sorting and partition key.
ATTACH TABLE t1 (
`a` Int32,
`b` Nullable(Int32),
`_sign` Int8,
`_version` UInt64
) ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
SETTINGS index_granularity = 8192;DELETE and UPDATE
Each change creates a new part in ClickHouse:
Part 1 – initial INSERT produces rows with _sign = 1 and _version = 1.
Part 2 – DELETE generates a row with the same primary key, _sign = -1 and an incremented _version.
Part 3 – UPDATE creates a new row with the updated values, _sign = 1 and a further incremented _version.
Querying with the FINAL modifier lets ClickHouse deduplicate based on _version and _sign:
clickhouse :) select a,b,_sign,_version from t1 final;</code>
<code>┌─a─┬─b─┬─_sign─┬─_version─┐</code>
<code>│ 1 │ 1 │ -1 │ 2 │</code>
<code>│ 2 │ 3 │ 1 │ 3 │</code>
<code>└───┴───┴───────┴──────────┘Query Filtering
The MaterializeMySQL engine treats rows with _sign = -1 as deleted and automatically filters them out in normal queries.
Conclusion
Real‑time MySQL replication in ClickHouse (pull request #10851) bridges the gap between OLTP and OLAP by consuming binlog events directly, converting them to ClickHouse blocks, and writing them to the storage engine with minimal latency. It supports database‑level replication, multi‑source setups, and plans to add a CRC‑based consistency check in the future.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
