How ClickHouse Replicates MySQL in Real-Time: A Step‑by‑Step Guide
This article explains how ClickHouse can act as a MySQL replica, covering full and incremental synchronization, supported MySQL versions, DDL compatibility, handling of DELETE/UPDATE events, the underlying binlog mechanism, and provides complete code examples to set up both MySQL master and ClickHouse slave.
Overview
ClickHouse can be mounted as a MySQL replica, providing full‑load then incremental real‑time synchronization of MySQL data. The feature supports MySQL 5.6, 5.7 and 8.0, handles
DELETE,
UPDATE, and most common DDL operations. It is currently in Alpha stage and relies on community feedback for rapid iteration.
Code Acquisition
Since the feature is still under validation, the code is fetched from a GitHub pull request:
<code>git fetch origin pull/10851/head:mysql_replica_experiment</code>MySQL Master Setup
Start a MySQL container with binary logging enabled:
<code>docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 mysqld --datadir=/var/lib/mysql --server-id=1 --log-bin=/var/lib/mysql/mysql-bin.log --gtid-mode=ON --enforce-gtid-consistency</code>Create a database and a table, then insert sample data:
<code>mysql> create database ckdb;</code>
<code>mysql> use ckdb;</code>
<code>mysql> create table t1(a int not null primary key, b int);</code>
<code>mysql> insert into t1 values(1,1),(2,2);</code>
<code>mysql> select * from t1;</code>ClickHouse Slave Setup
Create a replication channel using the
MaterializeMySQLengine:
<code>CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');</code>
<code>use ckdb;</code>
<code>show tables;</code>
<code>select * from t1;</code>Check the synchronization metadata:
<code>cat ckdatas/metadata/ckdb/.metadata</code>
<code>Version:1</code>
<code>Binlog File:mysql-bin.000001</code>
<code>Binlog Position:913</code>
<code>Data Version:0</code>DELETE Propagation
Execute a delete on the MySQL master:
<code>mysql> delete from t1 where a=1;</code>Query the ClickHouse replica and observe the row removal:
<code>clickhouse :) select * from t1;</code>
<code>┌─a─┬─b─┐</code>
<code>│ 2 │ 2 │</code>
<code>└───┴───┘</code>The metadata now shows
Data Version:2.
UPDATE Propagation
Run an update on the MySQL master:
<code>mysql> update t1 set b=b+1;</code>Query the ClickHouse replica and see the updated value:
<code>clickhouse :) select * from t1;</code>
<code>┌─a─┬─b─┐</code>
<code>│ 2 │ 3 │</code>
<code>└───┴───┘</code>Implementation Mechanism
ClickHouse consumes MySQL binlog events. The main event types are:
MYSQL_QUERY_EVENT – DDL
MYSQL_WRITE_ROWS_EVENT – INSERT
MYSQL_UPDATE_ROWS_EVENT – UPDATE
MYSQL_DELETE_ROWS_EVENT – DELETE
After a transaction commits, MySQL writes the corresponding events to the binlog. By implementing the MySQL Replication Protocol, ClickHouse can stream these events directly.
The three major challenges are DDL compatibility, DELETE/UPDATE support, and query filtering.
DDL Handling
When a MySQL table is replicated, ClickHouse creates an
ATTACH TABLEwith hidden columns
_sign(‑1 for delete, 1 for insert) and
_version. The engine used is
ReplacingMergeTreewith
_versionas the version column, and the original primary key becomes the sorting and partition key.
<code>ATTACH TABLE t1 (
`a` Int32,
`b` Nullable(Int32),
`_sign` Int8,
`_version` UInt64
) ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
SETTINGS index_granularity = 8192;</code>DELETE and UPDATE
Each change creates a new part in ClickHouse:
Part 1 – initial INSERT produces rows with
_sign = 1and
_version = 1.
Part 2 – DELETE generates a row with the same primary key,
_sign = -1and an incremented
_version.
Part 3 – UPDATE creates a new row with the updated values,
_sign = 1and a further incremented
_version.
Querying with the
FINALmodifier lets ClickHouse deduplicate based on
_versionand
_sign:
<code>clickhouse :) select a,b,_sign,_version from t1 final;</code>
<code>┌─a─┬─b─┬─_sign─┬─_version─┐</code>
<code>│ 1 │ 1 │ -1 │ 2 │</code>
<code>│ 2 │ 3 │ 1 │ 3 │</code>
<code>└───┴───┴───────┴──────────┘</code>Query Filtering
The
MaterializeMySQLengine treats rows with
_sign = -1as deleted and automatically filters them out in normal queries.
Conclusion
Real‑time MySQL replication in ClickHouse (pull request #10851) bridges the gap between OLTP and OLAP by consuming binlog events directly, converting them to ClickHouse blocks, and writing them to the storage engine with minimal latency. It supports database‑level replication, multi‑source setups, and plans to add a CRC‑based consistency check in the future.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.