
How to Sync MySQL to ClickHouse in Real‑Time Using MaterializeMySQL

This article shows how to configure ClickHouse as a MySQL replica. It walks through setting up a MySQL master and a ClickHouse replica, consuming the MySQL binlog in real time, and handling DDL, UPDATE, and DELETE events, so that OLTP data written to MySQL becomes available for OLAP queries in ClickHouse.

MaGe Linux Operations

Overview

ClickHouse can be attached to MySQL as a replica: it first performs a full load of the existing data and then applies changes incrementally in real time. The feature supports MySQL 5.6, 5.7, and 8.0, handles DELETE and UPDATE statements, and supports most common DDL operations. At the time of writing it was in Alpha, with development relying on community feedback for rapid iteration.

Code Acquisition

Since the feature is still under validation, the source code is pulled from a GitHub pull request:

git fetch origin pull/10851/head:mysql_replica_experiment

After fetching, check out the mysql_replica_experiment branch and build ClickHouse from source locally.

MySQL Master Setup

Run a MySQL container with binary logging enabled:

docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 \
  mysqld --datadir=/var/lib/mysql \
  --server-id=1 \
  --log-bin=/var/lib/mysql/mysql-bin.log \
  --gtid-mode=ON \
  --enforce-gtid-consistency

Create a database and a table, then insert sample data:

mysql> create database ckdb;
mysql> use ckdb;
mysql> create table t1(a int not null primary key, b int);
mysql> insert into t1 values(1,1),(2,2);
mysql> select * from t1;

ClickHouse Slave Setup

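Create a database with the MaterializeMySQL engine. Its arguments are the MySQL address (172.17.0.2:3306 is the MySQL container in this setup), the source database, the user, and the password: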

CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');
use ckdb;
show tables;
select * from t1;

Check the replication metadata:

cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:913
Data Version:0
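The .metadata file is a small text checkpoint of the binlog coordinates ClickHouse has consumed. As a sketch, assuming the one `Key:Value` pair per line layout shown above (it may differ between ClickHouse versions), a hypothetical helper `parse_metadata` could read it into typed fields, which makes it easy to compare the checkpoint before and after a change:

```python
# Hypothetical helper: parse a MaterializeMySQL .metadata checkpoint.
# Assumes one "Key:Value" pair per line, as in the sample output above;
# the real on-disk format may change between ClickHouse versions.

def parse_metadata(text: str) -> dict:
    """Return the replication checkpoint as a dict with typed values."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ":" only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {
        "version": int(fields["Version"]),
        "binlog_file": fields["Binlog File"],
        "binlog_position": int(fields["Binlog Position"]),
        "data_version": int(fields["Data Version"]),
    }


sample = """Version:1
Binlog File:mysql-bin.000001
Binlog Position:913
Data Version:0
"""

checkpoint = parse_metadata(sample)
print(checkpoint)
# {'version': 1, 'binlog_file': 'mysql-bin.000001', 'binlog_position': 913, 'data_version': 0}
```

Running the same helper on the checkpoint taken after the DELETE step would show the binlog position advancing while the binlog file stays the same.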

Delete Operation

Execute a DELETE on the MySQL master and observe the change on the ClickHouse replica:

mysql> delete from t1 where a=1;

ClickHouse now shows only the remaining row, and the Data Version field in the replication metadata increments to 2:

cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:1171
Data Version:2

Update Operation

Run an UPDATE on the MySQL master and verify the updated values on ClickHouse:

mysql> update t1 set b=b+1;
clickhouse :) select * from t1;

The remaining row (a=2) now appears with the updated value b=3.

Implementation Mechanism

Understanding MySQL binlog events is essential. The main event types are:

MYSQL_QUERY_EVENT – DDL

MYSQL_WRITE_ROWS_EVENT – INSERT

MYSQL_UPDATE_ROWS_EVENT – UPDATE

MYSQL_DELETE_ROWS_EVENT – DELETE

ClickHouse consumes these events via the MySQL Replication Protocol, converting them into internal blocks and writing directly to the storage engine. Three major challenges are DDL compatibility, DELETE/UPDATE support, and query filtering.
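To make the event-to-row conversion concrete, here is a minimal Python model, not ClickHouse's actual C++ implementation. The event dictionaries and the function `event_to_block` are hypothetical; each row event becomes rows tagged with the hidden `_sign` and `_version` columns described in DDL Handling, and the sketch assumes an UPDATE never changes the primary key:

```python
# Minimal model (not ClickHouse source code): turn decoded binlog row events
# for table t1(a, b) into rows carrying hidden (_sign, _version) columns.
# The event dict shape {"type": ..., "rows": [...]} is a hypothetical format.
from typing import Dict, List, Tuple

OutRow = Tuple[int, int, int, int]  # (a, b, _sign, _version)


def event_to_block(event: Dict, version: int) -> List[OutRow]:
    kind = event["type"]
    if kind == "MYSQL_WRITE_ROWS_EVENT":
        # INSERT: each new row is written with _sign = 1.
        return [(a, b, 1, version) for a, b in event["rows"]]
    if kind == "MYSQL_DELETE_ROWS_EVENT":
        # DELETE: the deleted row is written again with _sign = -1.
        return [(a, b, -1, version) for a, b in event["rows"]]
    if kind == "MYSQL_UPDATE_ROWS_EVENT":
        # UPDATE: rows arrive as (before, after) pairs; this sketch writes only
        # the after image with _sign = 1 (assumes the primary key is unchanged).
        return [(a, b, 1, version) for _before, (a, b) in event["rows"]]
    # MYSQL_QUERY_EVENT (DDL) would be translated separately, not into rows.
    raise ValueError(f"not a row event: {kind}")


# The demo's three statements as row events, one version per event.
events = [
    {"type": "MYSQL_WRITE_ROWS_EVENT", "rows": [(1, 1), (2, 2)]},
    {"type": "MYSQL_DELETE_ROWS_EVENT", "rows": [(1, 1)]},
    {"type": "MYSQL_UPDATE_ROWS_EVENT", "rows": [((2, 2), (2, 3))]},
]
for version, event in enumerate(events, start=1):
    print(event["type"], event_to_block(event, version))
```

Assigning one version per event is a simplification made for readability here; how ClickHouse actually numbers versions is not specified in this article.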

DDL Handling

When a MySQL table is replicated, ClickHouse creates a corresponding table with two hidden columns: _sign (-1 for a delete, 1 for an insert) and _version. The table uses the ReplacingMergeTree(_version) engine; the MySQL primary key column becomes the sorting key (ORDER BY) and also determines the partition key through intDiv:

ATTACH TABLE t1 (
    a Int32,
    b Nullable(Int32),
    _sign Int8,
    _version UInt64
) ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
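The generated definition follows a simple pattern. The sketch below reproduces it for table t1 from a minimal column description; the function `clickhouse_ddl` and its input format are hypothetical, only the MySQL `int` type is mapped, and the divisor 4294967 is copied from the output above rather than derived:

```python
# Hypothetical sketch: rebuild the ClickHouse table definition shown above
# from a MySQL column list. Real type mapping covers many more MySQL types.

def clickhouse_ddl(table, columns, primary_key):
    """columns: list of (name, mysql_type, not_null) tuples."""
    type_map = {"int": "Int32"}  # only INT is handled in this sketch
    lines = []
    for name, mysql_type, not_null in columns:
        ch_type = type_map[mysql_type]
        if not not_null:
            # Nullable MySQL columns become Nullable(...) in ClickHouse.
            ch_type = f"Nullable({ch_type})"
        lines.append(f"    {name} {ch_type},")
    # Hidden bookkeeping columns appended to every replicated table.
    lines.append("    _sign Int8,")
    lines.append("    _version UInt64")
    return (
        f"ATTACH TABLE {table} (\n"
        + "\n".join(lines)
        + "\n) ENGINE = ReplacingMergeTree(_version)\n"
        + f"PARTITION BY intDiv({primary_key}, 4294967)\n"
        + f"ORDER BY tuple({primary_key})"
    )


# t1(a int not null primary key, b int) from the MySQL setup step.
print(clickhouse_ddl("t1", [("a", "int", True), ("b", "int", False)], "a"))
```

The printed text matches the ATTACH TABLE statement above line for line.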

Update and Delete Representation

Each operation generates a separate part in ClickHouse:

Part 1 – initial INSERT produces rows with _sign=1 and _version=1.

Part 2 – DELETE creates a row with the same primary key, _sign=-1, and an incremented version.

Part 3 – UPDATE inserts a new row with the updated values, _sign=1, and a further version number.

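The FINAL modifier merges these parts at query time: for each primary key, ReplacingMergeTree keeps only the row with the highest _version. A deleted key therefore still leaves its _sign=-1 row behind, and that row is hidden by the query filtering described below, which yields the current state.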
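A small model of the FINAL behavior on the demo's three parts, assuming versions 1, 2, and 3 for the INSERT, DELETE, and UPDATE respectively (the real numbering may differ). Rows are `(a, b, _sign, _version)` tuples, and `final` is a hypothetical stand-in for what ReplacingMergeTree(_version) does at read time:

```python
# Model of ReplacingMergeTree(_version) FINAL: for each primary key (a),
# keep only the row with the highest _version. Not ClickHouse code.

def final(rows):
    latest = {}
    for row in rows:
        key, version = row[0], row[3]
        if key not in latest or version > latest[key][3]:
            latest[key] = row
    return sorted(latest.values())


part1 = [(1, 1, 1, 1), (2, 2, 1, 1)]  # INSERT (1,1),(2,2)
part2 = [(1, 1, -1, 2)]               # DELETE FROM t1 WHERE a=1
part3 = [(2, 3, 1, 3)]                # UPDATE t1 SET b=b+1 (only a=2 is left)

print(final(part1 + part2 + part3))
# [(1, 1, -1, 2), (2, 3, 1, 3)]
```

Key 1 survives FINAL as a `_sign=-1` row, which is why a separate filter on `_sign` is still needed.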

Query Filtering

During reads, the MaterializeMySQL engine filters out rows where _sign = -1, effectively hiding deleted records.
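Continuing the model, the read-time filter can be sketched as dropping `_sign = -1` rows from the collapsed state and hiding the two internal columns. `visible_rows` is a hypothetical name, and the input is the collapsed state of the demo table after FINAL:

```python
# Model of the read-time filter: discard rows marked as deleted (_sign = -1)
# and return only the user-visible columns (a, b). Not ClickHouse code.

def visible_rows(collapsed):
    return [(a, b) for a, b, sign, _ver in collapsed if sign != -1]


# Collapsed demo state as (a, b, _sign, _version) after FINAL.
collapsed = [(1, 1, -1, 2), (2, 3, 1, 3)]
print(visible_rows(collapsed))  # [(2, 3)]
```

The single remaining row `(2, 3)` is exactly what the DELETE and UPDATE steps leave in MySQL.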

Conclusion

The real‑time MySQL‑to‑ClickHouse replication (pull request #10851) implements a binlog consumer inside ClickHouse that writes directly to ClickHouse's storage engine, avoiding a separate external binlog tool, which the author considers more efficient. It supports database‑level replication and multi‑source synchronization, and a CRC function for verifying data consistency is planned.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
