Databases 11 min read

How ClickHouse Replicates MySQL in Real-Time: A Step‑by‑Step Guide

This article explains how ClickHouse can act as a MySQL replica, covering full and incremental synchronization, supported MySQL versions, DDL compatibility, handling of DELETE/UPDATE events, the underlying binlog mechanism, and provides complete code examples to set up both MySQL master and ClickHouse slave.

Efficient Ops
Efficient Ops
Efficient Ops
How ClickHouse Replicates MySQL in Real-Time: A Step‑by‑Step Guide

Overview

ClickHouse can be mounted as a MySQL replica, providing full‑load then incremental real‑time synchronization of MySQL data. The feature supports MySQL 5.6, 5.7 and 8.0, handles

DELETE

,

UPDATE

, and most common DDL operations. It is currently in Alpha stage and relies on community feedback for rapid iteration.

Code Acquisition

Since the feature is still under validation, the code is fetched from a GitHub pull request:

<code>git fetch origin pull/10851/head:mysql_replica_experiment</code>

MySQL Master Setup

Start a MySQL container with binary logging enabled:

<code>docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 mysqld --datadir=/var/lib/mysql --server-id=1 --log-bin=/var/lib/mysql/mysql-bin.log --gtid-mode=ON --enforce-gtid-consistency</code>

Create a database and a table, then insert sample data:

<code>mysql> create database ckdb;</code>
<code>mysql> use ckdb;</code>
<code>mysql> create table t1(a int not null primary key, b int);</code>
<code>mysql> insert into t1 values(1,1),(2,2);</code>
<code>mysql> select * from t1;</code>

ClickHouse Slave Setup

Create a replication channel using the

MaterializeMySQL

engine:

<code>CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');</code>
<code>use ckdb;</code>
<code>show tables;</code>
<code>select * from t1;</code>

Check the synchronization metadata:

<code>cat ckdatas/metadata/ckdb/.metadata</code>
<code>Version:1</code>
<code>Binlog File:mysql-bin.000001</code>
<code>Binlog Position:913</code>
<code>Data Version:0</code>

DELETE Propagation

Execute a delete on the MySQL master:

<code>mysql> delete from t1 where a=1;</code>

Query the ClickHouse replica and observe the row removal:

<code>clickhouse :) select * from t1;</code>
<code>┌─a─┬─b─┐</code>
<code>│ 2 │ 2 │</code>
<code>└───┴───┘</code>

The metadata now shows

Data Version:2

.

UPDATE Propagation

Run an update on the MySQL master:

<code>mysql> update t1 set b=b+1;</code>

Query the ClickHouse replica and see the updated value:

<code>clickhouse :) select * from t1;</code>
<code>┌─a─┬─b─┐</code>
<code>│ 2 │ 3 │</code>
<code>└───┴───┘</code>

Implementation Mechanism

ClickHouse consumes MySQL binlog events. The main event types are:

MYSQL_QUERY_EVENT – DDL

MYSQL_WRITE_ROWS_EVENT – INSERT

MYSQL_UPDATE_ROWS_EVENT – UPDATE

MYSQL_DELETE_ROWS_EVENT – DELETE

After a transaction commits, MySQL writes the corresponding events to the binlog. By implementing the MySQL Replication Protocol, ClickHouse can stream these events directly.

The three major challenges are DDL compatibility, DELETE/UPDATE support, and query filtering.

DDL Handling

When a MySQL table is replicated, ClickHouse creates an

ATTACH TABLE

with hidden columns

_sign

(‑1 for delete, 1 for insert) and

_version

. The engine used is

ReplacingMergeTree

with

_version

as the version column, and the original primary key becomes the sorting and partition key.

<code>ATTACH TABLE t1 (
    `a` Int32,
    `b` Nullable(Int32),
    `_sign` Int8,
    `_version` UInt64
) ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
SETTINGS index_granularity = 8192;</code>

DELETE and UPDATE

Each change creates a new part in ClickHouse:

Part 1 – initial INSERT produces rows with

_sign = 1

and

_version = 1

.

Part 2 – DELETE generates a row with the same primary key,

_sign = -1

and an incremented

_version

.

Part 3 – UPDATE creates a new row with the updated values,

_sign = 1

and a further incremented

_version

.

Querying with the

FINAL

modifier lets ClickHouse deduplicate based on

_version

and

_sign

:

<code>clickhouse :) select a,b,_sign,_version from t1 final;</code>
<code>┌─a─┬─b─┬─_sign─┬─_version─┐</code>
<code>│ 1 │ 1 │    -1 │        2 │</code>
<code>│ 2 │ 3 │     1 │        3 │</code>
<code>└───┴───┴───────┴──────────┘</code>

Query Filtering

The

MaterializeMySQL

engine treats rows with

_sign = -1

as deleted and automatically filters them out in normal queries.

Conclusion

Real‑time MySQL replication in ClickHouse (pull request #10851) bridges the gap between OLTP and OLAP by consuming binlog events directly, converting them to ClickHouse blocks, and writing them to the storage engine with minimal latency. It supports database‑level replication, multi‑source setups, and plans to add a CRC‑based consistency check in the future.

ClickHousebinlogMySQL replicationReplacingMergeTreeMaterializeMySQLRealtime sync
Efficient Ops
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.