Sync MySQL to ClickHouse Using QingCloud MySQL Plus & MaterializeMySQL
This article explains how to break the barrier between heterogeneous databases by synchronizing MySQL to ClickHouse using QingCloud MySQL Plus and the MaterializeMySQL engine, covering MySQL replication history, architecture details, HTAP scenarios, and providing practical code examples for setup and operation.
Background
MySQL has been the world’s most popular OLTP database since 2000, while ClickHouse has recently attracted great attention as an OLAP database. This article shows how to synchronize MySQL data to ClickHouse.
1. MySQL Replication Development
MySQL 3.23 (2001) introduced asynchronous replication, which was not suitable for production. MySQL 5.7.2 (2013) added enhanced semi‑sync replication, making it marginally enterprise‑ready. MySQL 5.7.17 (2016) introduced MGR, evolving into a financially‑grade solution.
2. QingCloud MySQL Plus
QingCloud MySQL Plus provides strong consistency and high availability. Its architecture uses a Xenon component based on a Raft‑like algorithm to manage MySQL election and health checks, while data synchronization still relies on Semi‑Sync Replication or MGR, achieving strong consistency, automatic leader election, and cross‑region disaster recovery.
Key features:
Multi‑replica synchronization ensures at least one replica stays fully consistent with the primary, providing financial‑grade strong consistency.
Automatic, decentralized leader election with sub‑second failover.
Cross‑zone deployment for disaster recovery.
3. ClickHouse Synchronizing MySQL Data
To accelerate OLAP queries, QingCloud MySQL Plus leverages ClickHouse for data synchronization.
3.1 ClickHouse Overview
ClickHouse is a column‑oriented DBMS for online analytical processing (OLAP). Developed in 2008 for Yandex Metrica, it became open‑source in 2016 and has seen rapid adoption.
3.2 MySQL Table Engine
The original ClickHouse MySQL Table Engine maps individual MySQL tables, fetching schema from information_schema and pulling data on query execution.
Mapping to MySQL table
Fetch table structure from MySQL
Fetch data from MySQL during query
Limitations: only table‑level mapping and potential network/read pressure on MySQL during queries.
3.3 MySQL Database Engine
The MySQL Database Engine provides database‑level mapping, fetching all table structures from information_schema . It still suffers from the same data‑transfer drawbacks as the Table Engine.
3.4 Third‑Party Software Synchronization
Tools like Canal or Kafka can parse MySQL binlogs and write to ClickHouse, offering controllable sync flows but increasing complexity and operational overhead.
4. MaterializeMySQL Engine
To address the remaining issues, QingCloud ClickHouse team developed the MaterializeMySQL engine, an experimental feature merged into ClickHouse 20.8. It maps MySQL databases and consumes binlog events into ClickHouse’s MergeTree.
4.1 Creating a MaterializeMySQL Database
CREATE DATABASE test ENGINE = MaterializeMySQL('172.17.0.3:3306', 'demo', 'root', '123');Parameters:
172.17.0.3:3306 – MySQL address and port
demo – MySQL database name
root – MySQL sync user
123 – User password
4.2 Design Workflow
Check MySQL variables
Select historical data
Consume new data
4.3 Function Process
CheckMySQLVars validates required MySQL settings (e.g., log_bin=ON, binlog_format=ROW, etc.).
SHOW VARIABLES WHERE (Variable_name = 'log_bin' AND upper(Value) = 'ON')
OR (Variable_name = 'binlog_format' AND upper(Value) = 'ROW')
OR (Variable_name = 'binlog_row_image' AND upper(Value) = 'FULL')
OR (Variable_name = 'default_authentication_plugin' AND upper(Value) = 'MYSQL_NATIVE_PASSWORD')
OR (Variable_name = 'log_bin_use_v1_row_events' AND upper(Value) = 'OFF');prepareSynchronized initializes GTID, cleans outdated tables, pulls historical data, and establishes the binlog channel.
std::optional<MaterializeMetadata> MaterializeMySQLSyncThread::prepareSynchronized() {
connection = pool.get();
MaterializeMetadata metadata(connection, ...);
if (!metadata.need_dumping_tables.empty()) {
Position position;
position.update(metadata.binlog_position, metadata.binlog_file, metadata.executed_gtid_set);
metadata.transaction(position, [&]() {
cleanOutdatedTables(database_name, global_context);
dumpDataForTables(connection, metadata, ...);
});
}
connection->query("COMMIT").execute();
}Synchronized reads binlog events and converts them to ClickHouse statements, flushing buffers to disk.
client.connect();
client.startBinlogDumpGTID(...);
while (!isCancelled()) {
BinlogEventPtr binlog_event = client.readOneBinlogEvent(...);
if (binlog_event) onEvent(buffers, binlog_event, *metadata);
if (!buffers.data.empty()) flushBuffersData(buffers, *metadata);
}5. HTAP Application Scenario
By linking MySQL replication with ClickHouse analysis, MySQL Plus + ClickHouse can realize Hybrid Transactional/Analytical Processing (HTAP). ClickHouse can serve as an analytical instance within the MySQL replication topology, reducing reliance on read‑only MySQL replicas.
Current support includes MySQL 5.7 and 8.0; MySQL 5.6 is not supported. MaterializeMySQL is experimental and welcomes community contributions.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Qingyun Technology Community
Official account of the Qingyun Technology Community, focusing on tech innovation, supporting developers, and sharing knowledge. Born to Learn and Share!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
