Unlock Real-Time MySQL Data Sync with Alibaba Canal: A Hands‑On Guide

This article introduces Alibaba's open‑source Canal middleware, explains its architecture and high‑availability design, walks through MySQL binlog configuration and Canal setup, and provides a complete Java client example that applies to real‑time data synchronization, cache refresh, and task dispatch scenarios.

Su San Talks Tech

Introduction

Canal (meaning "pipeline") is an open‑source middleware that parses MySQL binary logs (binlog) to provide incremental data subscription and consumption.

Originally developed at Alibaba to synchronize data across data centers, Canal now supports many use cases such as database mirroring, real‑time backup, index building, cache refresh, and custom incremental data processing.

Supported MySQL versions: 5.1.x, 5.5.x, 5.6.x, 5.7.x, 8.0.x.

Working Principle

MySQL Master‑Slave Replication

Master writes data changes to the binary log.

Slave copies the binary log events to its relay log.

Slave replays the relay log to apply changes locally.

Canal Operation

Canal mimics a MySQL slave, sending a dump request to the master.

Master pushes binary log data to Canal.

Canal parses the binary log byte stream.
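
To make "parses the binary log byte stream" more concrete, the sketch below decodes the fixed 19-byte header that precedes each event in MySQL's v4 binlog format (little-endian: 4-byte timestamp, 1-byte event type, 4-byte server id, 4-byte event length, 4-byte next position, 2-byte flags). The class BinlogHeaderSketch and its sample values are illustrative assumptions, not Canal's actual parser, which also handles the replication handshake, checksums, and event bodies.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BinlogHeaderSketch {
    static final int HEADER_LENGTH = 19;

    final long timestamp;    // seconds since the epoch
    final int eventType;     // numeric event type code
    final long serverId;     // server-id of the MySQL server that wrote the event
    final long eventLength;  // header plus body, in bytes
    final long nextPosition; // offset of the next event in the binlog file
    final int flags;

    private BinlogHeaderSketch(long timestamp, int eventType, long serverId,
                               long eventLength, long nextPosition, int flags) {
        this.timestamp = timestamp;
        this.eventType = eventType;
        this.serverId = serverId;
        this.eventLength = eventLength;
        this.nextPosition = nextPosition;
        this.flags = flags;
    }

    /** Decodes the common event header; fields are unsigned little-endian integers. */
    static BinlogHeaderSketch parse(byte[] bytes) {
        if (bytes.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least " + HEADER_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long timestamp = buf.getInt() & 0xFFFFFFFFL;
        int eventType = buf.get() & 0xFF;
        long serverId = buf.getInt() & 0xFFFFFFFFL;
        long eventLength = buf.getInt() & 0xFFFFFFFFL;
        long nextPosition = buf.getInt() & 0xFFFFFFFFL;
        int flags = buf.getShort() & 0xFFFF;
        return new BinlogHeaderSketch(timestamp, eventType, serverId, eventLength, nextPosition, flags);
    }

    /** Builds a made-up header for demonstration purposes. */
    static byte[] sampleHeader() {
        return ByteBuffer.allocate(HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(1616856214)   // timestamp
                .put((byte) 31)       // event type code
                .putInt(1)            // server id
                .putInt(60)           // event length
                .putInt(8377)         // next position
                .putShort((short) 0)  // flags
                .array();
    }

    public static void main(String[] args) {
        BinlogHeaderSketch h = parse(sampleHeader());
        System.out.println("type=" + h.eventType + " serverId=" + h.serverId
                + " nextPosition=" + h.nextPosition);
    }
}
```

The log file name and offset that a Canal client later reads from each entry header are derived from positions like nextPosition above.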

GitHub: https://github.com/alibaba/canal

Wiki: https://github.com/alibaba/canal/wiki

Canal Architecture

A server is one running Canal process (one JVM). An instance corresponds to one data queue, and a server can host one or more instances. Each instance contains these modules:

eventParser – connects to MySQL as a slave and parses the protocol.

eventSink – links parser and store, handling filtering, transformation, and distribution.

eventStore – stores the data.

metaManager – manages subscription and consumption metadata.

Only when an instance is started can it perform synchronization tasks; multiple instances can exist within a server.
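
The interplay between the store and the consumption metadata can be pictured with a toy model. The sketch below is an assumption-laden simplification: ToyEventStore is a hypothetical class, events are plain strings, and a single pair of cursors stands in for the per-batch bookkeeping that Canal really does. It shows only the idea that a batch can be handed out without being confirmed, then either acknowledged or rolled back.

```java
import java.util.ArrayList;
import java.util.List;

class ToyEventStore {
    private final List<String> events = new ArrayList<>();
    private int ackCursor = 0; // events before this index are confirmed as consumed
    private int getCursor = 0; // events before this index have been handed out

    /** Called by the sink side: append a parsed event. */
    void put(String event) {
        events.add(event);
    }

    /** Hands out up to batchSize events without confirming them. */
    List<String> getWithoutAck(int batchSize) {
        int end = Math.min(getCursor + batchSize, events.size());
        List<String> batch = new ArrayList<>(events.subList(getCursor, end));
        getCursor = end;
        return batch;
    }

    /** Confirms everything handed out so far. */
    void ack() {
        ackCursor = getCursor;
    }

    /** Forgets unconfirmed hand-outs so they are delivered again. */
    void rollback() {
        getCursor = ackCursor;
    }

    public static void main(String[] args) {
        ToyEventStore store = new ToyEventStore();
        store.put("insert#1");
        store.put("update#1");
        store.put("delete#1");

        System.out.println(store.getWithoutAck(2)); // first attempt
        store.rollback();                            // processing failed, retry
        System.out.println(store.getWithoutAck(2)); // same two events again
        store.ack();                                 // processing succeeded
        System.out.println(store.getWithoutAck(2)); // remaining event
    }
}
```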

Canal HA Mechanism

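High availability relies on Zookeeper features (watchers and EPHEMERAL nodes). Both the Canal server and the Canal client use Zookeeper to ensure that only one node is active at a time.

Server side: to reduce dump requests against MySQL, the same instance deployed on different servers may be running on only one of them at a time; the others stay on standby.

Client side: to guarantee ordering, only one client at a time may perform get/ack/rollback operations on an instance.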

Canal server attempts to create an EPHEMERAL node in Zookeeper; the creator becomes the active instance.

If creation succeeds, the instance starts; otherwise it stays standby.

If the EPHEMERAL node disappears, Zookeeper notifies other servers to elect a new active instance.

Clients query Zookeeper for the current active instance before connecting; they reconnect if the link fails.

Canal client uses the same EPHEMERAL node strategy for control.
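
The election steps above can be sketched without a real Zookeeper cluster. In the example below, FakeRegistry is a hypothetical in-memory stand-in for Zookeeper, the path /demo/example/running is made up, and Candidate plays the role of a Canal server competing for an instance. It is meant only to illustrate the "whoever creates the ephemeral node runs, everyone else waits for it to disappear" pattern, not Canal's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EphemeralElectionSketch {
    /** In-memory stand-in for Zookeeper: path -> name of the session that owns it. */
    static class FakeRegistry {
        private final Map<String, String> owners = new HashMap<>();
        private final List<Candidate> candidates = new ArrayList<>();

        /** Succeeds only if nobody else currently holds the path. */
        boolean createEphemeral(String path, String owner) {
            String previous = owners.putIfAbsent(path, owner);
            return previous == null || previous.equals(owner);
        }

        void register(Candidate candidate) {
            candidates.add(candidate);
        }

        /** Simulates a session ending: its ephemeral nodes vanish and watchers are notified. */
        void closeSession(String owner) {
            boolean removed = owners.values().removeIf(owner::equals);
            for (Candidate c : candidates) {
                if (c.name.equals(owner)) {
                    c.running = false;
                }
            }
            if (removed) {
                for (Candidate c : candidates) {
                    if (!c.name.equals(owner)) {
                        c.tryToRun(); // standby nodes compete again
                    }
                }
            }
        }
    }

    static class Candidate {
        final String name;
        final String path;
        final FakeRegistry registry;
        boolean running;

        Candidate(String name, String path, FakeRegistry registry) {
            this.name = name;
            this.path = path;
            this.registry = registry;
            registry.register(this);
        }

        /** Becomes active if the ephemeral node could be created, otherwise stays standby. */
        void tryToRun() {
            running = registry.createEphemeral(path, name);
        }
    }

    public static void main(String[] args) {
        FakeRegistry zk = new FakeRegistry();
        Candidate a = new Candidate("server-A", "/demo/example/running", zk);
        Candidate b = new Candidate("server-B", "/demo/example/running", zk);
        a.tryToRun();
        b.tryToRun();
        System.out.println("A running=" + a.running + ", B running=" + b.running);
        zk.closeSession("server-A"); // A crashes or loses its session
        System.out.println("A running=" + a.running + ", B running=" + b.running);
    }
}
```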

Application Scenarios

Sync Cache (Redis) / Full‑Text Search (Elasticsearch)

When the database changes, Canal delivers the incremental updates so that the cache or ES can be refreshed. If something goes wrong, you can rewind to an earlier binlog position and resynchronize from there, and a full refresh is also possible.
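
A minimal sketch of the cache-refresh idea follows. It assumes each change has already been reduced to an event type plus "before" and "after" column maps (in a real client these would be built from the before/after column lists of a row), and it uses a HashMap as a stand-in for Redis. CacheRefreshSketch, its methods, and the "id" key column are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class CacheRefreshSketch {
    enum EventType { INSERT, UPDATE, DELETE }

    // Stand-in for Redis: primary key -> latest row image
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    /** Applies one row change to the cache. */
    void apply(EventType type, Map<String, String> before, Map<String, String> after) {
        switch (type) {
            case INSERT:
            case UPDATE:
                // The "after" image is the current state of the row
                cache.put(after.get("id"), new LinkedHashMap<>(after));
                break;
            case DELETE:
                // Only the "before" image exists for a deleted row
                cache.remove(before.get("id"));
                break;
            default:
                break;
        }
    }

    Map<String, String> get(String id) {
        return cache.get(id);
    }

    /** Small helper: row("id", "3", "name", "x") -> {id=3, name=x}. */
    static Map<String, String> row(String... keyValues) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            row.put(keyValues[i], keyValues[i + 1]);
        }
        return row;
    }

    public static void main(String[] args) {
        CacheRefreshSketch sketch = new CacheRefreshSketch();
        sketch.apply(EventType.INSERT, row(), row("id", "3", "name", "java-dev-1"));
        sketch.apply(EventType.UPDATE, row("id", "3", "name", "java-dev-1"),
                row("id", "3", "name", "java-dev"));
        System.out.println(sketch.get("3"));
        sketch.apply(EventType.DELETE, row("id", "3", "name", "java-dev"), row());
        System.out.println(sketch.get("3"));
    }
}
```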

Task Dispatch

Database changes trigger messages to MQ/Kafka, notifying downstream systems (detail page, list page, search page) for precise data propagation.
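
One way to picture this dispatch step is a routing table from changed tables to downstream topics. The sketch below is hypothetical throughout: ChangeDispatchSketch, the topic names, and the message format are invented for illustration, and the sent list stands in for a real MQ or Kafka producer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ChangeDispatchSketch {
    // "schema.table" -> topics that care about changes to that table
    private final Map<String, List<String>> routes = new HashMap<>();

    // Stand-in for a message producer: records what would have been sent
    final List<String> sent = new ArrayList<>();

    void route(String qualifiedTable, String... topics) {
        routes.put(qualifiedTable, Arrays.asList(topics));
    }

    /** Fans one change out to every topic subscribed to the table. */
    void onChange(String schema, String table, String eventType, String primaryKey) {
        String qualifiedTable = schema + "." + table;
        List<String> topics = routes.getOrDefault(qualifiedTable, Collections.<String>emptyList());
        for (String topic : topics) {
            sent.add(topic + " <- " + eventType + " " + qualifiedTable + " id=" + primaryKey);
        }
    }

    public static void main(String[] args) {
        ChangeDispatchSketch dispatcher = new ChangeDispatchSketch();
        dispatcher.route("cheetah.product_info", "detail-page", "list-page", "search-page");
        dispatcher.onChange("cheetah", "product_info", "UPDATE", "3");
        dispatcher.onChange("cheetah", "audit_log", "INSERT", "42"); // no route, nothing sent
        for (String message : dispatcher.sent) {
            System.out.println(message);
        }
    }
}
```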

Data Heterogeneity

In sharded architectures, Canal can be used to aggregate data from multiple tables into a single logical view for complex queries.
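
As a rough illustration, the sketch below folds changes from several physical shard tables into one logical view. It assumes a naming convention in which shards carry a numeric suffix (orders_0, orders_1, and so on); that convention, the class ShardMergeSketch, and the string row payloads are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class ShardMergeSketch {
    // logical table -> (primary key -> latest row payload)
    private final Map<String, Map<Long, String>> views = new TreeMap<>();

    /** Strips a trailing numeric shard suffix: "orders_12" -> "orders". */
    static String logicalTable(String physicalTable) {
        return physicalTable.replaceFirst("_\\d+$", "");
    }

    /** Applies an insert or update coming from any shard to the merged view. */
    void upsert(String physicalTable, long id, String row) {
        views.computeIfAbsent(logicalTable(physicalTable), k -> new TreeMap<>()).put(id, row);
    }

    Map<Long, String> view(String logicalTable) {
        return views.getOrDefault(logicalTable, Collections.<Long, String>emptyMap());
    }

    public static void main(String[] args) {
        ShardMergeSketch merge = new ShardMergeSketch();
        merge.upsert("orders_0", 1L, "order for alice");
        merge.upsert("orders_1", 2L, "order for bob");
        System.out.println(merge.view("orders"));
    }
}
```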

MySQL Configuration

Enable Binlog

[root@iZ2zebiempwqvoc2xead5lZ mysql]# find / -name my.cnf
/etc/my.cnf
[root@iZ2zebiempwqvoc2xead5lZ mysql]# cd /etc
[root@iZ2zebiempwqvoc2xead5lZ etc]# vim my.cnf

Add under [mysqld]:

server-id=1                # unique ID for this MySQL server
log_bin=mysql-bin          # enable the binlog; the value is the file base name
binlog-format=row          # row-level logging
binlog-do-db=cheetah       # log changes only for this database

Restart MySQL and verify the creation of mysql-bin.000001 and mysql-bin.index.

Binlog Formats

The binlog_format parameter can be set to statement, row, or mixed:

statement: stores the original SQL statement (e.g., update T set update_time=now() where id=1); may cause time‑drift inconsistencies.

row: stores the actual row data (e.g., update_time=1627112756247); ensures consistency but uses more space and I/O.

mixed: MySQL chooses row or statement based on potential inconsistency.

Configure Permissions

CREATE USER canal IDENTIFIED BY 'XXXX';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;

If the password does not satisfy MySQL's password validation policy, the statement fails with error 1819, so choose a sufficiently strong password.

Canal Configuration

Download the package (e.g., canal.deployer-1.1.6.tar.gz) and extract it:

tar -zxvf canal.deployer-1.1.6.tar.gz

Key ports in conf/canal.properties:

canal.admin.port = 11110
canal.port = 11111
canal.metrics.pull.port = 11112

Set destinations in conf/canal.properties:

canal.destinations = example

Instance configuration (conf/example/instance.properties):

# custom slaveId (different from MySQL server-id)
canal.instance.mysql.slaveId=10
# MySQL address
canal.instance.master.address=127.0.0.1:3306
# credentials
canal.instance.dbUsername=xxx
canal.instance.dbPassword=xxx
# charset
canal.instance.connectionCharset=UTF-8
# monitor all databases/tables
canal.instance.filter.regex=.*\\..*

If the server has only one CPU, set canal.instance.parser.parallel to false.
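
To get a feel for how a table filter such as the one configured in canal.instance.filter.regex selects tables, the sketch below treats the value as a comma-separated list of regular expressions matched against schema.table names. TableFilterSketch is a hypothetical helper built on java.util.regex; Canal's own filter may differ in details such as case handling, so treat this as an approximation for experimenting with patterns.

```java
import java.util.regex.Pattern;

class TableFilterSketch {
    /**
     * Returns true if "schema.table" fully matches any of the comma-separated patterns.
     * The filter argument is the regex text itself, for example .*\..* for every table.
     */
    static boolean accepts(String filter, String schema, String table) {
        String qualifiedName = schema + "." + table;
        for (String regex : filter.split(",")) {
            if (Pattern.matches(regex.trim(), qualifiedName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In Java source, a regex backslash is written as two backslashes
        String everything = ".*\\..*";
        String oneSchema = "cheetah\\..*";
        String twoTables = "cheetah\\.product_info,cheetah\\.orders";

        System.out.println(accepts(everything, "cheetah", "product_info"));
        System.out.println(accepts(oneSchema, "other_db", "product_info"));
        System.out.println(accepts(twoTables, "cheetah", "orders"));
    }
}
```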

Start Canal

Run from the installation directory:

sh bin/startup.sh
# or
./bin/startup.sh

Troubleshooting

If no canal.log is written after startup, check whether the Canal process is running. If it is not, inspect canal_stdout.log, which may show an out‑of‑memory error.

Adjust JVM memory parameters in startup.sh, e.g.:

-server -Xms80m -Xmx80m -Xmn80m -XX:SurvivorRatio=2 -XX:PermSize=66m -XX:MaxPermSize=80m -Xss256k -XX:-UseAdaptiveSizePolicy -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError

If a canal.pid file exists, run stop.sh before starting again.

Practical Example

Dependency

<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.0</version>
</dependency>

Code Sample

The following Java program connects to Canal, fetches binlog entries, and prints changes:

import java.net.InetSocketAddress;
import java.util.List;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class SimpleCanalClientExample {
    public static void main(String[] args) {
        // Connect to the Canal server (canal.port) and the "example" destination
        CanalConnector connector = CanalConnectors.newSingleConnector(
            new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        int batchSize = 1000;
        int emptyCount = 0;
        try {
            connector.connect();
            // Subscribe to every table in every database
            connector.subscribe(".*\\..*");
            // Return to the last acknowledged position
            connector.rollback();
            int totalEmptyCount = 120;
            // Stop after 120 consecutive empty polls (about two minutes)
            while (emptyCount < totalEmptyCount) {
                // Fetch up to batchSize entries without acknowledging them yet
                Message message = connector.getWithoutAck(batchSize);
                long batchId = message.getId();
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    emptyCount++;
                    System.out.println("empty count : " + emptyCount);
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        // ignored in this demo: keep polling
                    }
                } else {
                    emptyCount = 0;
                    printEntry(message.getEntries());
                }
                // Confirm the batch so the server can advance the consumption position
                connector.ack(batchId);
            }
            System.out.println("empty too many times, exit");
        } finally {
            connector.disconnect();
        }
    }

    private static void printEntry(List<CanalEntry.Entry> entries) {
        for (CanalEntry.Entry entry : entries) {
            // Skip transaction markers; only row changes are of interest here
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN ||
                entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                continue;
            }
            CanalEntry.RowChange rowChange;
            try {
                rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
            } catch (Exception e) {
                throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry.toString(), e);
            }
            CanalEntry.EventType eventType = rowChange.getEventType();
            System.out.println(String.format("================> binlog[%s:%s] , name[%s,%s] , eventType : %s",
                entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(),
                entry.getHeader().getSchemaName(), entry.getHeader().getTableName(), eventType));
            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                if (eventType == CanalEntry.EventType.DELETE) {
                    // A deleted row only has a "before" image
                    printColumn(rowData.getBeforeColumnsList());
                } else if (eventType == CanalEntry.EventType.INSERT) {
                    // An inserted row only has an "after" image
                    printColumn(rowData.getAfterColumnsList());
                } else {
                    System.out.println("------- > before");
                    printColumn(rowData.getBeforeColumnsList());
                    System.out.println("------- > after");
                    printColumn(rowData.getAfterColumnsList());
                }
            }
        }
    }

    private static void printColumn(List<CanalEntry.Column> columns) {
        for (CanalEntry.Column column : columns) {
            // getUpdated() reports whether this column changed in the event
            System.out.println(column.getName() + " : " + column.getValue() + "    update=" + column.getUpdated());
        }
    }
}

Testing

Run the project and observe logs such as:

empty count : 1
empty count : 2
empty count : 3
empty count : 4

After updating a row in MySQL, the client prints the change it received from Canal:

================> binlog[mysql-bin.000002:8377] , name[cheetah,product_info] , eventType : UPDATE
------- > before
id : 3    update=false
name : java开发1    update=false
price : 87.0    update=false
create_date : 2021-03-27 22:43:31    update=false
update_date : 2021-03-27 22:43:34    update=false
------- > after
id : 3    update=false
name : java开发    update=true
price : 87.0    update=false
create_date : 2021-03-27 22:43:31    update=false
update_date : 2021-03-27 22:43:34    update=false
Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
