Quick Guide to Building a Canal‑Based Real‑Time Data Synchronization Platform on CentOS 7

This article walks through the end‑to‑end setup of a small‑scale data platform using Alibaba's Canal for MySQL binlog capture, covering the installation and configuration of MySQL, Zookeeper, Kafka, and Canal itself, and demonstrates real‑time change capture with sample DML operations.

Big Data Technology & Architecture

The author needs a lightweight data platform to sync business‑system changes (insert, update, soft‑delete) in near real time, clean the data, and build a model for downstream analytics; Canal is chosen as the core middleware.

Canal Overview

Canal parses MySQL binary logs and provides incremental data subscription. It mimics a MySQL slave: it sends a dump request to the master, receives the binlog event stream, parses the byte stream into structured change events, and forwards those events to a message queue.

Canal Versions and Components

At the time of writing (2020‑03‑05) the stable release is v1.1.4, which ships TCP, Kafka, and RocketMQ connectors. The newer v1.1.5‑alpha‑1 adds a RabbitMQ connector but is less mature. The author recommends v1.1.4 with the Kafka connector.

Canal consists of three core modules:

canal‑admin: Web UI for management.

canal‑adapter: Adapters for REST, log, RDB, HBase, ES, etc.

canal‑deployer: The deployer that parses binlog and sends messages to connectors.

Required Middleware Deployment

Four components must be deployed on a CentOS 7 VM: MySQL, Zookeeper, Kafka, and Canal.

Install MySQL

Use the official yum repo:

mkdir -p /data/mysql && cd /data/mysql
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
sudo rpm -Uvh mysql80-community-release-el7-3.noarch.rpm
sudo yum install mysql-community-server

After installation, start the service, retrieve the temporary root password, then set a permanent password and allow remote access:

service mysqld start
grep 'temporary password' /var/log/mysqld.log   # find temporary password
mysql -u root -p
-- the statements below run inside the mysql client
ALTER USER 'root'@'localhost' IDENTIFIED BY 'QWqw12!@';
UPDATE mysql.user SET host='%' WHERE user='root';
FLUSH PRIVILEGES;   -- reload the grant tables after editing mysql.user directly
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';

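Create a dedicated user for Canal. The last statement switches the account to mysql_native_password, because MySQL 8 defaults to caching_sha2_password, which older binlog clients may not support: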

CREATE USER canal IDENTIFIED BY 'QWqw12!@';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';

Create a test database:

CREATE DATABASE `test` CHARSET `utf8mb4` COLLATE `utf8mb4_unicode_ci`;
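
Canal reads row-level binlog events, so it is worth confirming that binary logging is on and in ROW format before moving on. MySQL 8.0 enables both by default, but a custom my.cnf can override them. The sketch below assumes a hypothetical helper named check_binlog_vars that reads tab-separated name/value rows, which is what the mysql client prints with -N -B.

```shell
# Hypothetical helper (not part of MySQL or Canal): reads "name<TAB>value"
# rows from stdin and reports whether the binlog settings suit Canal.
check_binlog_vars() {
  awk -F'\t' '
    $1 == "log_bin"       { log_bin = $2 }
    $1 == "binlog_format" { fmt = $2 }
    END {
      if (log_bin != "ON") { print "log_bin is not ON";        bad = 1 }
      if (fmt != "ROW")    { print "binlog_format is not ROW"; bad = 1 }
      if (!bad) print "binlog settings look OK"
      exit bad + 0
    }'
}

# Usage against the server installed above:
#   mysql -u canal -p -N -B \
#     -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin','binlog_format')" \
#     | check_binlog_vars
```

The function exits non-zero when either setting is off, so it can gate a provisioning script.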

Install Zookeeper

Download and extract version 3.6.0, set dataDir=/data/zk/data in conf/zoo.cfg, then start:

mkdir -p /data/zk/data && cd /data/zk
wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.6.0/apache-zookeeper-3.6.0-bin.tar.gz
tar -zxvf apache-zookeeper-3.6.0-bin.tar.gz
cd apache-zookeeper-3.6.0-bin/conf
cp zoo_sample.cfg zoo.cfg
sed -i 's|^dataDir=.*|dataDir=/data/zk/data|' zoo.cfg   # point dataDir at /data/zk/data
sh ../bin/zkServer.sh start

Install Kafka

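Download Kafka 2.4.0 built for Scala 2.13, set log.dirs to /data/kafka/data, and start the broker in daemon mode:

mkdir -p /data/kafka/data && cd /data/kafka
wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.0/kafka_2.13-2.4.0.tgz
tar -zxvf kafka_2.13-2.4.0.tgz
# point log.dirs at /data/kafka/data; edit other settings in config/server.properties if needed
sed -i 's|^log.dirs=.*|log.dirs=/data/kafka/data|' kafka_2.13-2.4.0/config/server.properties
sh kafka_2.13-2.4.0/bin/kafka-server-start.sh -daemon kafka_2.13-2.4.0/config/server.properties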

Install and Configure Canal

Download the stable deployer package and extract:

mkdir -p /data/canal && cd /data/canal
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
tar -zxvf canal.deployer-1.1.4.tar.gz

Key configuration files are conf/canal.properties and conf/example/instance.properties. Important settings:

In canal.properties, enable parallel parsing: uncomment canal.instance.parser.parallelThreadSize = 16.

In canal.properties, set canal.serverMode = kafka to use the Kafka connector.

In canal.properties, define the Kafka broker: canal.mq.servers = 127.0.0.1:9092.

In instance.properties (where the 1.1.4 package keeps these two keys), specify the topic and partition: canal.mq.topic = test, canal.mq.partition = 0.
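
The settings above can also be applied from a script instead of an editor. The following is a minimal sketch built around a hypothetical helper, set_prop, which is not part of Canal. It rewrites an existing or commented-out key in a Java-style properties file and appends the key if it is missing. The file paths in the usage comment assume the 1.1.4 deployer layout extracted under /data/canal.

```shell
# Hypothetical helper (not shipped with Canal): set key=value in a
# .properties file, replacing an existing or commented-out entry.
# Values containing "|", "&" or "\" are not handled by this sketch.
set_prop() {
  file=$1; key=$2; value=$3
  # Escape dots so a key such as canal.mq.topic is matched literally.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[# ]*${pattern}[ ]*=" "$file"; then
    tmp=$(mktemp)
    sed "s|^[# ]*${pattern}[ ]*=.*|${key}=${value}|" "$file" > "$tmp" \
      && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage, run from /data/canal:
#   set_prop conf/canal.properties canal.instance.parser.parallelThreadSize 16
#   set_prop conf/canal.properties canal.serverMode kafka
#   set_prop conf/canal.properties canal.mq.servers 127.0.0.1:9092
#   set_prop conf/example/instance.properties canal.mq.topic test
#   set_prop conf/example/instance.properties canal.mq.partition 0
```

Writing through a temporary file rather than sed -i keeps the helper portable across sed implementations, and copying with cat preserves the original file's ownership and permissions.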

In instance.properties configure the MySQL source:

canal.instance.mysql.slaveId = 654321
canal.instance.master.address = 127.0.0.1:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = QWqw12!@
canal.instance.defaultDatabaseName = test

Start Canal

sh /data/canal/bin/startup.sh
# tail logs
tail -f /data/canal/logs/example/example.log
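
In a provisioning script, tail -f is awkward because it never returns. A hedged alternative is to poll the instance log until a startup marker appears. The helper below, wait_for_log, is hypothetical, and the marker string 'start successful' is an assumption about what the example instance writes on startup; check your own log for the exact wording.

```shell
# Hypothetical helper: poll a log file once per second until a pattern
# appears or the timeout (in seconds, default 30) runs out.
wait_for_log() {
  file=$1; pattern=$2; timeout=${3:-30}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ -f "$file" ] && grep -q "$pattern" "$file"; then
      echo "found: $pattern"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout}s waiting for: $pattern" >&2
  return 1
}

# Usage (the marker text is an assumption, adjust it to your log):
#   wait_for_log /data/canal/logs/example/example.log 'start successful' 60
```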

After the service starts, create a sample table and run DML statements:

USE `test`;
CREATE TABLE `order` (
  id BIGINT PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
  order_id VARCHAR(64) NOT NULL COMMENT 'order ID',
  amount DECIMAL(10,2) NOT NULL DEFAULT 0 COMMENT 'order amount',
  create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  UNIQUE uniq_order_id (order_id)
) COMMENT 'order table';
INSERT INTO `order`(order_id, amount) VALUES ('10086', 999);
UPDATE `order` SET amount = 10087 WHERE order_id = '10086';
DELETE FROM `order` WHERE order_id = '10086';

Consume the generated events from Kafka to verify:

sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic test

The console shows JSON messages for the CREATE DATABASE, CREATE TABLE, INSERT, UPDATE, and DELETE events, confirming that Canal successfully captured and forwarded MySQL binlog changes to Kafka.
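
To check the stream at a glance rather than reading raw JSON, the consumer output can be summarized by event type. This is a minimal sketch that assumes Canal's flat JSON message format, in which each message carries a top-level "type" field such as INSERT or UPDATE. The helper name count_event_types is hypothetical, and it uses plain grep so that no JSON tool needs to be installed.

```shell
# Hypothetical helper: read Canal JSON messages from stdin (one per line)
# and print "<TYPE> <count>" for each event type seen.
count_event_types() {
  grep -o '"type":"[A-Z]*"' | cut -d'"' -f4 | sort | uniq -c | awk '{print $2, $1}'
}

# Usage: read the topic, stop after 5 seconds of silence, then summarize.
#   sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-console-consumer.sh \
#     --bootstrap-server 127.0.0.1:9092 --from-beginning --topic test \
#     --timeout-ms 5000 | count_event_types
```

The pattern requires a quote directly before type, so keys such as mysqlType and sqlType in the same message are not counted.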

Conclusion

The guide demonstrates that deploying Canal is straightforward; most of the effort lies in provisioning the supporting middleware. With a few key configuration tweaks, Canal can reliably capture binlog events for downstream ELT pipelines, and a production‑grade HA setup can be built on top of the stable v1.1.4 release.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.