Quick Guide to Building a Canal‑Based Real‑Time Data Synchronization Platform on CentOS 7
This article walks through the end‑to‑end setup of a small‑scale data platform using Alibaba's Canal for MySQL binlog capture, covering the installation and configuration of MySQL, Zookeeper, Kafka, and Canal itself, and demonstrates real‑time change capture with sample DML operations.
The author needs a lightweight data platform to sync business‑system changes (insert, update, soft‑delete) in near real time, clean the data, and build a model for downstream analytics; Canal is chosen as the core middleware.
Canal Overview
Canal parses MySQL binary logs and provides incremental data subscription. It mimics a MySQL slave, requests a dump from the master, receives binlog events, parses the byte stream, and forwards structured events to a message queue.
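Because Canal reads the binlog the way a replica would, the source MySQL instance needs binary logging enabled in ROW format; MySQL 8.0 should have both on by default. A minimal sketch of a pre-flight check follows. The function name check_binlog is hypothetical, and it only inspects tab-separated SHOW VARIABLES output piped into it, so it opens no connection itself.

```shell
#!/bin/sh
# check_binlog (hypothetical helper): reads "name<TAB>value" rows on stdin
# and succeeds only if log_bin=ON and binlog_format=ROW.
check_binlog() {
  tab=$(printf '\t')
  log_bin=""
  format=""
  while IFS="$tab" read -r name value; do
    case "$name" in
      log_bin)       log_bin=$value ;;
      binlog_format) format=$value ;;
    esac
  done
  if [ "$log_bin" = "ON" ] && [ "$format" = "ROW" ]; then
    echo "binlog ready for Canal"
  else
    echo "binlog not ready: log_bin=${log_bin:-?} binlog_format=${format:-?}"
    return 1
  fi
}

# Usage against a live server (not executed here):
#   mysql -N -B -u root -p \
#     -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin','binlog_format')" \
#     | check_binlog
```

Keeping the parsing separate from the mysql call means the same check works with whatever credentials or host options the environment requires.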
Canal Versions and Components
At the time of writing (2020‑03‑05) the stable release is v1.1.4, which ships with TCP, Kafka, and RocketMQ connectors. The newer v1.1.5‑alpha‑1 adds a RabbitMQ connector but is less mature. The author recommends using v1.1.4 with the Kafka connector.
Canal consists of three core modules:
canal‑admin: Web UI for management.
canal‑adapter: Adapters for REST, log, RDB, HBase, ES, etc.
canal‑deployer: The deployer that parses binlog and sends messages to connectors.
Required Middleware Deployment
Four components must be deployed on a CentOS 7 VM: MySQL, Zookeeper, Kafka, and Canal.
Install MySQL
Use the official yum repo:
mkdir -p /data/mysql && cd /data/mysql
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
sudo rpm -Uvh mysql80-community-release-el7-3.noarch.rpm
sudo yum install mysql-community-server
After installation, start the service, retrieve the temporary root password, then set a permanent password and allow remote access:
service mysqld start
cat /var/log/mysqld.log # find temporary password
mysql -u root -p
ALTER USER 'root'@'localhost' IDENTIFIED BY 'QWqw12!@';
UPDATE mysql.user SET host='%' WHERE user='root';
FLUSH PRIVILEGES; -- reload the grant tables after editing mysql.user directly
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';
Create a dedicated user for Canal:
CREATE USER canal IDENTIFIED BY 'QWqw12!@';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';
Create a test database:
CREATE DATABASE `test` CHARSET `utf8mb4` COLLATE `utf8mb4_unicode_ci`;
Install Zookeeper
Download and extract version 3.6.0, set dataDir=/data/zk/data in conf/zoo.cfg, then start:
mkdir -p /data/zk/data
wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.6.0/apache-zookeeper-3.6.0-bin.tar.gz
tar -zxvf apache-zookeeper-3.6.0-bin.tar.gz
cd apache-zookeeper-3.6.0-bin/conf
cp zoo_sample.cfg zoo.cfg # edit dataDir
sh ../bin/zkServer.sh start
Install Kafka
Download Kafka 2.4.0 built for Scala 2.13, set log.dirs to /data/kafka/data in config/server.properties, and start the broker in daemon mode:
mkdir -p /data/kafka/data && cd /data/kafka
wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.0/kafka_2.13-2.4.0.tgz
tar -zxvf kafka_2.13-2.4.0.tgz
# edit kafka_2.13-2.4.0/config/server.properties if needed
sh kafka_2.13-2.4.0/bin/kafka-server-start.sh -daemon kafka_2.13-2.4.0/config/server.properties
Install and Configure Canal
Download the stable deployer package and extract:
mkdir /data/canal && cd /data/canal
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
tar -zxvf canal.deployer-1.1.4.tar.gz
Key configuration files are conf/canal.properties and conf/example/instance.properties. Important settings:
Enable parallel parsing: uncomment canal.instance.parser.parallelThreadSize = 16.
Set canal.serverMode = kafka to use the Kafka connector.
Define the Kafka broker: canal.mq.servers = 127.0.0.1:9092.
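Specify the topic and partition: canal.mq.topic = test, canal.mq.partition = 0 (in the stock 1.1.4 package these two keys sit in conf/example/instance.properties rather than conf/canal.properties).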
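The settings above can also be applied from a script instead of by hand. The sketch below is one way to do it, not part of Canal itself: set_prop is a hypothetical helper that rewrites a `key = value` line (commented out or not) or appends it when missing. It assumes Canal was extracted to /data/canal, that canal.mq.topic and canal.mq.partition belong in the instance file, and that values contain no `|`, `&`, or backslash characters.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE (hypothetical helper)
# Replaces an existing "KEY = ..." line, uncommenting it if needed,
# or appends "KEY = VALUE" when the key is absent.
set_prop() {
  file=$1
  key=$2
  value=$3
  # Escape dots so the key is matched literally by the regex.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  pattern="^[[:space:]]*#?[[:space:]]*${key_re}[[:space:]]*="
  if grep -Eq "$pattern" "$file"; then
    sed -E "s|${pattern}.*|${key} = ${value}|" "$file" > "$file.tmp" \
      && mv "$file.tmp" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# Assumed install location; override with CANAL_HOME if it differs.
CANAL_HOME=${CANAL_HOME:-/data/canal}
SERVER_CONF="$CANAL_HOME/conf/canal.properties"
INSTANCE_CONF="$CANAL_HOME/conf/example/instance.properties"

if [ -f "$SERVER_CONF" ]; then
  set_prop "$SERVER_CONF" canal.instance.parser.parallelThreadSize 16
  set_prop "$SERVER_CONF" canal.serverMode kafka
  set_prop "$SERVER_CONF" canal.mq.servers 127.0.0.1:9092
fi
if [ -f "$INSTANCE_CONF" ]; then
  set_prop "$INSTANCE_CONF" canal.mq.topic test
  set_prop "$INSTANCE_CONF" canal.mq.partition 0
fi
```

Writing to a temporary file and moving it into place avoids depending on the in-place flag of any particular sed implementation.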
In instance.properties configure the MySQL source:
canal.instance.mysql.slaveId = 654321
canal.instance.master.address = 127.0.0.1:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = QWqw12!@
canal.instance.defaultDatabaseName = test
Start Canal
sh /data/canal/bin/startup.sh
# tail logs
tail -f /data/canal/logs/example/example.log
After the service starts, create a sample table and run DML statements:
USE `test`;
CREATE TABLE `order` (
id BIGINT PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
order_id VARCHAR(64) NOT NULL COMMENT 'order ID',
amount DECIMAL(10,2) NOT NULL DEFAULT 0 COMMENT 'order amount',
create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
UNIQUE uniq_order_id (order_id)
) COMMENT 'order table';
INSERT INTO `order`(order_id, amount) VALUES ('10086', 999);
UPDATE `order` SET amount = 10087 WHERE order_id = '10086';
DELETE FROM `order` WHERE order_id = '10086';
Consume the generated events from Kafka to verify:
sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic test
The console shows JSON messages for the CREATE DATABASE, CREATE TABLE, INSERT, UPDATE, and DELETE events, confirming that Canal successfully captured and forwarded MySQL binlog changes to Kafka.
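Raw JSON is hard to scan by eye, so a small filter can reduce each message to one line. The sketch below is a rough aid rather than a real JSON parser: summarize_events is a hypothetical name, it assumes one compact JSON message per line with "database", "table", and "type" keys appearing in that order, and the sample message is hand-written and abbreviated, not captured output. A column that is itself named "type" could confuse it.

```shell
#!/bin/sh
# summarize_events (hypothetical helper): reads one JSON message per line
# on stdin and prints "<database>.<table> <TYPE>" for each line that has
# the three keys in the assumed order. Non-matching lines are skipped.
summarize_events() {
  sed -n -E 's/.*"database":"([^"]*)".*"table":"([^"]*)".*"type":"([A-Z]+)".*/\1.\2 \3/p'
}

# Hand-written, abbreviated sample in the assumed message shape.
sample='{"data":[{"id":"1","order_id":"10086","amount":"999.00"}],"database":"test","isDdl":false,"table":"order","type":"INSERT"}'
printf '%s\n' "$sample" | summarize_events   # prints: test.order INSERT

# Usage against the live topic (not executed here):
#   sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-console-consumer.sh \
#     --bootstrap-server 127.0.0.1:9092 --from-beginning --topic test \
#     | summarize_events
```

For anything beyond a quick smoke test, a proper JSON parser in the downstream consumer is the safer choice.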
Conclusion
The guide demonstrates that deploying Canal is straightforward; most of the effort lies in provisioning the supporting middleware. With a few key configuration tweaks, Canal can reliably capture binlog events for downstream ELT pipelines, and a production‑grade HA setup can be built on top of the stable v1.1.4 release.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
