Unlock Real-Time MySQL Data Sync with Alibaba Canal: A Hands‑On Guide
This article introduces Alibaba's open-source Canal middleware, explains its architecture and high-availability design, walks through MySQL binlog configuration and Canal setup, and provides a complete Java client example for real-time data synchronization, cache refresh, and task dispatch scenarios.
Introduction
Canal (meaning "pipeline") is an open‑source middleware that parses MySQL binary logs (binlog) to provide incremental data subscription and consumption.
Originally developed at Alibaba to synchronize data across data centers, Canal now supports many use cases such as database mirroring, real‑time backup, index building, cache refresh, and custom incremental data processing.
Supported MySQL versions: 5.1.x, 5.5.x, 5.6.x, 5.7.x, 8.0.x.
Working Principle
MySQL Master‑Slave Replication
Master writes data changes to the binary log.
Slave copies the binary log events to its relay log.
Slave replays the relay log to apply changes locally.
Canal Operation
Canal mimics a MySQL slave, sending a dump request to the master.
Master pushes binary log data to Canal.
Canal parses the binary log byte stream.
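To make "parses the binary log byte stream" more concrete, here is a minimal sketch that decodes the 19-byte common header that precedes every event in MySQL's v4 binlog format (little-endian fields: timestamp, event type, server ID, event size, next position, flags). The class name BinlogEventHeader is hypothetical and this is not Canal's parser, which also handles the event bodies, table maps, and checksums.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: decodes the common header of a MySQL v4 binlog event. */
final class BinlogEventHeader {
    static final int HEADER_LENGTH = 19;

    final long timestamp;     // seconds since the Unix epoch
    final int eventType;      // numeric event type code
    final long serverId;      // server-id of the MySQL server that wrote the event
    final long eventSize;     // header plus body, in bytes
    final long nextPosition;  // offset of the next event in the binlog file
    final int flags;

    private BinlogEventHeader(long timestamp, int eventType, long serverId,
                              long eventSize, long nextPosition, int flags) {
        this.timestamp = timestamp;
        this.eventType = eventType;
        this.serverId = serverId;
        this.eventSize = eventSize;
        this.nextPosition = nextPosition;
        this.flags = flags;
    }

    /** Reads the header fields in order; all integers are unsigned little-endian. */
    static BinlogEventHeader parse(byte[] bytes) {
        if (bytes.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least " + HEADER_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long timestamp = buf.getInt() & 0xFFFFFFFFL;
        int eventType = buf.get() & 0xFF;
        long serverId = buf.getInt() & 0xFFFFFFFFL;
        long eventSize = buf.getInt() & 0xFFFFFFFFL;
        long nextPosition = buf.getInt() & 0xFFFFFFFFL;
        int flags = buf.getShort() & 0xFFFF;
        return new BinlogEventHeader(timestamp, eventType, serverId, eventSize, nextPosition, flags);
    }
}
```

The nextPosition field is what a consumer records as its binlog offset, which is how a client can later resume or rewind to a known position.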
GitHub: https://github.com/alibaba/canal
Wiki: https://github.com/alibaba/canal/wiki
Canal Architecture
A server is one running Canal deployment (one JVM). A server hosts one or more instances, each corresponding to a data queue, and each instance contains these modules:
eventParser – connects to MySQL as a slave and parses the protocol.
eventSink – links parser and store, handling filtering, transformation, and distribution.
eventStore – stores the data.
metaManager – manages subscription and consumption metadata.
Only when an instance is started can it perform synchronization tasks; multiple instances can exist within a server.
Canal HA Mechanism
High availability relies on Zookeeper (watcher and EPHEMERAL nodes). Both Canal server and client use Zookeeper to elect a single active instance.
Server side: to reduce dump requests against MySQL, only one copy of a given instance may be running across all servers at any time; the copies on other servers stay standby.
Client side: only one client can operate on an instance at a time to guarantee order.
Canal server attempts to create an EPHEMERAL node in Zookeeper; the creator becomes the active instance.
If creation succeeds, the instance starts; otherwise it stays standby.
If the EPHEMERAL node disappears, Zookeeper notifies other servers to elect a new active instance.
Clients query Zookeeper for the current active instance before connecting; they reconnect if the link fails.
Canal client uses the same EPHEMERAL node strategy for control.
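The election steps above can be sketched without a real Zookeeper cluster. The following in-memory simulation is an assumption-laden illustration: EphemeralElection, join, and crash are hypothetical names, a map entry stands in for the EPHEMERAL node, and the loop in crash stands in for watcher notifications. It shows only the "first creator wins, others take over when the node disappears" logic, not Canal's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for ephemeral-node election; not the Zookeeper API. */
final class EphemeralElection {
    private final String path;                                  // simulated node path
    private final Map<String, String> nodes = new HashMap<>();  // path -> owning server
    private final List<String> servers = new ArrayList<>();     // live servers, in join order

    EphemeralElection(String path) {
        this.path = path;
    }

    /** A server starts up and tries to create the node; returns true if it became active. */
    boolean join(String server) {
        servers.add(server);
        return tryCreate(server);
    }

    /** Creation succeeds only if no other server currently owns the node. */
    private boolean tryCreate(String server) {
        return nodes.putIfAbsent(path, server) == null;
    }

    /** The server that currently owns the node, or null if there is none. */
    String active() {
        return nodes.get(path);
    }

    /** Simulates a lost session: the node vanishes and standbys are notified to retry. */
    void crash(String server) {
        servers.remove(server);
        if (nodes.remove(path, server)) {
            for (String standby : servers) {
                if (tryCreate(standby)) {
                    break; // exactly one standby becomes the new active server
                }
            }
        }
    }
}
```

With a real Zookeeper ensemble the notification order is not deterministic; whichever standby creates the node first wins.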
Application Scenarios
Sync Cache (Redis) / Full‑Text Search (Elasticsearch)
When the database changes, Canal pushes incremental updates to cache or ES; if issues arise, you can roll back the binlog to a previous position and perform a full refresh.
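A minimal sketch of the cache-refresh idea follows, assuming a plain in-memory map as a stand-in for Redis. CacheRefresher, ChangeType, and the "table:id" key layout are all hypothetical choices made for illustration; a real consumer would take the event type and column values from the Canal entries it receives.

```java
import java.util.HashMap;
import java.util.Map;

/** Applies row-level changes to a cache; a HashMap stands in for Redis here. */
final class CacheRefresher {
    enum ChangeType { INSERT, UPDATE, DELETE }

    // key: "table:id", value: column name -> column value
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    /** INSERT and UPDATE overwrite the cached row with the "after" image; DELETE evicts it. */
    void apply(ChangeType type, String table, String id, Map<String, String> afterColumns) {
        String key = table + ":" + id;
        if (type == ChangeType.DELETE) {
            cache.remove(key);
        } else {
            cache.put(key, new HashMap<>(afterColumns));
        }
    }

    /** Returns the cached row, or null if it is not cached. */
    Map<String, String> get(String table, String id) {
        return cache.get(table + ":" + id);
    }
}
```

Because each change simply overwrites or evicts the whole row, replaying the same events from an earlier binlog position leaves the cache in the same final state, which is what makes the rewind-and-refresh recovery described above workable.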
Task Dispatch
Database changes trigger messages to MQ/Kafka, notifying downstream systems (detail page, list page, search page) for precise data propagation.
Data Heterogeneity
In sharded architectures, Canal can be used to aggregate data from multiple tables into a single logical view for complex queries.
MySQL Configuration
Enable Binlog
[root@iZ2zebiempwqvoc2xead5lZ mysql]# find / -name my.cnf
/etc/my.cnf
[root@iZ2zebiempwqvoc2xead5lZ mysql]# cd /etc
[root@iZ2zebiempwqvoc2xead5lZ etc]# vim my.cnf
Add under [mysqld]:
server-id=1 # unique ID of this MySQL server
log_bin=mysql-bin # binlog file base name
binlog-format=row # row-level logging
binlog-do-db=cheetah # log changes for this database only
Restart MySQL and verify that mysql-bin.000001 and mysql-bin.index have been created.
Binlog Formats
The binlog_format parameter can be set to statement, row, or mixed:
statement: stores the original SQL statement (e.g., update T set update_time=now() where id=1); because functions such as now() are evaluated again when the statement is replayed, the replayed value can differ from the original.
row: stores the actual row data (e.g., update_time=1627112756247); ensures consistency but uses more space and I/O.
mixed: MySQL logs statements by default and switches to row format for statements that could cause inconsistency.
Configure Permissions
CREATE USER canal IDENTIFIED BY 'XXXX';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
If the password is too weak for the server's password validation policy, CREATE USER fails with error 1819 (policy violation).
Canal Configuration
Download the package (e.g., canal.deployer-1.1.6.tar.gz) and extract it:
tar -zxvf canal.deployer-1.1.6.tar.gz
Key ports in conf/canal.properties:
canal.admin.port = 11110
canal.port = 11111
canal.metrics.pull.port = 11112
Set destinations in conf/canal.properties:
canal.destinations = example
Instance configuration (conf/example/instance.properties):
# custom slaveId (different from MySQL server-id)
canal.instance.mysql.slaveId=10
# MySQL address
canal.instance.master.address=127.0.0.1:3306
# credentials
canal.instance.dbUsername=xxx
canal.instance.dbPassword=xxx
# charset
canal.instance.connectionCharset=UTF-8
# monitor all databases/tables
canal.instance.filter.regex=.*\\..*
If the server has only one CPU, set canal.instance.parser.parallel to false.
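To see what a value of canal.instance.filter.regex selects, the sketch below treats the setting as a comma-separated list of regular expressions matched against the full name schema.table. It uses java.util.regex as an approximation; TableFilterDemo and matches are hypothetical names, and Canal's internal filter class may differ in details such as case handling.

```java
import java.util.regex.Pattern;

/** Approximates table filtering with plain java.util.regex; not Canal's own filter class. */
final class TableFilterDemo {
    /** Returns true if any comma-separated regex matches the whole "schema.table" name. */
    static boolean matches(String filter, String schema, String table) {
        String fullName = schema + "." + table;
        for (String regex : filter.split(",")) {
            if (Pattern.matches(regex.trim(), fullName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In Java source the backslash is doubled, so these are the regexes .*\..* and cheetah\..*
        System.out.println(matches(".*\\..*", "cheetah", "product_info"));
        System.out.println(matches("cheetah\\..*", "other_db", "orders"));
    }
}
```

Narrowing the filter to something like cheetah\\..* keeps the instance from parsing and shipping changes for schemas the consumer does not care about.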
Start Canal
Run from the installation directory:
sh bin/startup.sh
# or
./bin/startup.sh
Troubleshooting
If no canal.log appears, check whether the Canal process is running; if it is not, look at canal_stdout.log, which may show an out-of-memory error.
Adjust JVM memory parameters in startup.sh, e.g.:
-server -Xms80m -Xmx80m -Xmn80m -XX:SurvivorRatio=2 -XX:PermSize=66m -XX:MaxPermSize=80m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
If a canal.pid file exists, run stop.sh before starting again.
Practical Example
Dependency
<dependency>
<groupId>com.alibaba.otter</groupId>
<artifactId>canal.client</artifactId>
<version>1.1.0</version>
</dependency>
Code Sample
The following Java program connects to Canal, fetches binlog entries, and prints changes:
import java.net.InetSocketAddress;
import java.util.List;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class SimpleCanalClientExample {
    public static void main(String[] args) {
        // Connect to the Canal server port configured above, using the "example" destination.
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        int batchSize = 1000;
        int emptyCount = 0;
        int totalEmptyCount = 120; // stop after this many consecutive empty polls
        try {
            connector.connect();
            connector.subscribe(".*\\..*"); // subscribe to all schemas and tables
            connector.rollback();           // restart from the last acknowledged position
            while (emptyCount < totalEmptyCount) {
                // Fetch up to batchSize entries without acknowledging them yet.
                Message message = connector.getWithoutAck(batchSize);
                long batchId = message.getId();
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    emptyCount++;
                    System.out.println("empty count : " + emptyCount);
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                        break;
                    }
                } else {
                    emptyCount = 0;
                    printEntry(message.getEntries());
                }
                connector.ack(batchId); // confirm the batch so it is not delivered again
            }
            System.out.println("empty too many times, exit");
        } finally {
            connector.disconnect();
        }
    }

    private static void printEntry(List<CanalEntry.Entry> entries) {
        for (CanalEntry.Entry entry : entries) {
            // Skip transaction boundary markers; only row changes are printed.
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN ||
                    entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                continue;
            }
            CanalEntry.RowChange rowChange;
            try {
                rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
            } catch (Exception e) {
                throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry.toString(), e);
            }
            CanalEntry.EventType eventType = rowChange.getEventType();
            System.out.println(String.format("================> binlog[%s:%s] , name[%s,%s] , eventType : %s",
                    entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(),
                    entry.getHeader().getSchemaName(), entry.getHeader().getTableName(), eventType));
            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                if (eventType == CanalEntry.EventType.DELETE) {
                    printColumn(rowData.getBeforeColumnsList()); // a deleted row only has a "before" image
                } else if (eventType == CanalEntry.EventType.INSERT) {
                    printColumn(rowData.getAfterColumnsList());  // an inserted row only has an "after" image
                } else {
                    System.out.println("------- > before");
                    printColumn(rowData.getBeforeColumnsList());
                    System.out.println("------- > after");
                    printColumn(rowData.getAfterColumnsList());
                }
            }
        }
    }

    private static void printColumn(List<CanalEntry.Column> columns) {
        for (CanalEntry.Column column : columns) {
            System.out.println(column.getName() + " : " + column.getValue() + " update=" + column.getUpdated());
        }
    }
}
Testing
Run the project and observe logs such as:
empty count : 1
empty count : 2
empty count : 3
empty count : 4
After updating a row in MySQL, Canal outputs:
================> binlog[mysql-bin.000002:8377] , name[cheetah,product_info] , eventType : UPDATE
------- > before
id : 3 update=false
name : java开发1 update=false
price : 87.0 update=false
create_date : 2021-03-27 22:43:31 update=false
update_date : 2021-03-27 22:43:34 update=false
------- > after
id : 3 update=false
name : java开发 update=true
price : 87.0 update=false
create_date : 2021-03-27 22:43:31 update=false
update_date : 2021-03-27 22:43:34 update=false
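The sample client acknowledges every batch unconditionally. One possible refinement, sketched below, is to acknowledge a batch only after it has been handled and roll it back otherwise so the server can deliver it again. BatchChannel is a hypothetical interface whose two methods mirror CanalConnector's ack(long) and rollback(long); AckOrRollback and process are illustrative names, and entries are simplified to strings.

```java
import java.util.List;
import java.util.function.Consumer;

/** Ack-on-success, rollback-on-failure wrapper around batch handling. */
final class AckOrRollback {
    /** Hypothetical stand-in for the ack/rollback calls on a Canal connector. */
    interface BatchChannel {
        void ack(long batchId);
        void rollback(long batchId);
    }

    /**
     * Runs the handler on one batch. Acknowledges the batch if the handler returns
     * normally, rolls it back if a RuntimeException is thrown. Returns true on success.
     */
    static boolean process(BatchChannel channel, long batchId,
                           List<String> entries, Consumer<List<String>> handler) {
        try {
            handler.accept(entries);
            channel.ack(batchId);
            return true;
        } catch (RuntimeException e) {
            channel.rollback(batchId); // the same batch can be fetched again
            return false;
        }
    }
}
```

A batch that is rolled back may be delivered more than once, so the handler should be idempotent, for example by overwriting a cache entry rather than incrementing a counter.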
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
