Sync MySQL to Elasticsearch with Canal: Step‑by‑Step CDC Guide

This tutorial walks you through the fundamentals of MySQL binlog replication, installing and configuring Canal, setting up Elasticsearch, Kibana, and the IK analyzer, and then demonstrates both full and incremental data synchronization from MySQL to Elasticsearch.

Su San Talks Tech

Dec 3, 2023

01 Basics

1.1 Master‑Slave Replication Principle

MySQL master‑slave replication relies on the binlog, which records all changes in binary form on disk.

The master transfers binlog data to the slave, asynchronously by default, meaning the master commits without waiting for the slave to receive or apply the binlog.

Master writes binlog: update/insert/delete statements are recorded.

Master sends binlog: a dump thread streams the binlog to the slave.

Slave writes relay log: an IO thread receives the binlog and writes it to a relay log file.

Slave replays: a SQL thread reads the relay log and applies the changes, achieving consistency.
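The four steps above can be sketched as a toy model. This is only an illustration of the data flow, not how MySQL is implemented: the Master and Slave classes, their method names, and the dict used as "table data" are all hypothetical, and the IO and SQL threads are called by hand instead of running concurrently.

```python
from dataclasses import dataclass, field


def _apply(data, event):
    """Apply one (op, key, value) change event to a dict standing in for a table."""
    op, key, value = event
    if op == "delete":
        data.pop(key, None)
    else:  # "insert" and "update" both set the row
        data[key] = value


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)  # ordered change events

    def execute(self, op, key, value=None):
        # Step 1: apply the change locally, then record it in the binlog.
        _apply(self.data, (op, key, value))
        self.binlog.append((op, key, value))

    def dump(self, from_pos):
        # Step 2: the dump thread streams events starting at the requested position.
        return self.binlog[from_pos:]


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    applied: int = 0  # number of relay-log events already replayed

    def io_thread(self, master):
        # Step 3: fetch new binlog events and append them to the relay log.
        self.relay_log.extend(master.dump(len(self.relay_log)))

    def sql_thread(self):
        # Step 4: replay relay-log events that have not been applied yet.
        for event in self.relay_log[self.applied:]:
            _apply(self.data, event)
        self.applied = len(self.relay_log)


master, slave = Master(), Slave()
master.execute("insert", 1, "draft")
master.execute("update", 1, "published")
master.execute("insert", 2, "temp")
master.execute("delete", 2)

slave.io_thread(master)  # asynchronous in real MySQL; invoked manually here
slave.sql_thread()
print(slave.data == master.data)  # True
```

Canal takes the place of the slave in this picture: it requests the binlog in the same way, but forwards the events downstream instead of replaying them into a local database.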

1.2 Canal Basics

Canal is a popular data‑sync tool that subscribes to MySQL binlog (CDC – Change Data Capture) by simulating a MySQL slave, then pushes the committed changes downstream.

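Canal simulates the MySQL slave protocol, registers itself as a slave, and sends a dump request to the MySQL master.

The MySQL master receives the dump request and starts pushing binlog to Canal, which parses the binary stream into structured change entries (these can be serialized as JSON, for example when delivered through MQ).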

Canal client listens via TCP or MQ and synchronizes data to Elasticsearch.

02 Software Installation

2.1 Java JDK

Official site: https://www.oracle.com/java/technologies/downloads/

Version: 11.0.19

Canal and Elasticsearch both require a JDK, so install it first.

2.2 MySQL

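Enable MySQL binlog in ROW format (create or edit my.cnf on macOS and restart MySQL). Then create a user for Canal. The statements below create the user first and grant privileges separately, because MySQL 8.0 no longer accepts GRANT ... IDENTIFIED BY:

CREATE USER 'canal'@'localhost' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'localhost';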
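The binlog settings themselves are not shown in this section. As a rough sketch, the snippet below embeds the [mysqld] options recommended in Canal's quick-start guide (log-bin, binlog-format=ROW, server_id) and checks them with Python's configparser. The helper name missing_binlog_settings is hypothetical, the check is deliberately simplistic, and the location of my.cnf depends on how MySQL was installed.

```python
import configparser

# Binlog options from Canal's quick-start guide; server_id must be unique
# among the master and everything that replicates from it.
MY_CNF = """
[mysqld]
log-bin=mysql-bin
binlog-format=ROW
server_id=1
"""


def missing_binlog_settings(cnf_text):
    """Return a list of problems that would stop Canal from reading row changes."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    # MySQL treats '-' and '_' in option names as interchangeable.
    opts = {k.replace("-", "_"): v for k, v in parser.items("mysqld")}
    problems = []
    if "log_bin" not in opts:
        problems.append("log-bin is not set")
    if (opts.get("binlog_format") or "").upper() != "ROW":
        problems.append("binlog-format must be ROW")
    if "server_id" not in opts:
        problems.append("server_id is not set")
    return problems


print(missing_binlog_settings(MY_CNF))  # []
```

After restarting MySQL, running SHOW VARIABLES LIKE 'log_bin'; and SHOW VARIABLES LIKE 'binlog_format'; in a MySQL client should confirm that the settings took effect.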

2.3 Canal

Official site: https://github.com/alibaba/canal/releases

Version: v1.1.6

Download both canal.deployer and canal.adapter. The deployer is the Canal server that reads the binlog, while the adapter is the client that writes the changes to downstream targets (e.g., Elasticsearch, HBase).

2.4 Elasticsearch

Install via Homebrew with brew install elasticsearch, then verify the node is running by opening http://localhost:9200/?pretty.

2.5 Kibana

Download and install version 7.14.0, then access http://localhost:5601/app/dev_tools#/console.

2.6 IK Analyzer

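Download the IK analyzer from its official GitHub releases (v7.17.2) and install it as an Elasticsearch plugin to enable Chinese word segmentation. IK releases are built for a specific Elasticsearch version, so the plugin version should match the Elasticsearch version you installed.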

03 Canal Configuration

3.1 canal.deployer Configuration

Edit conf/example/instance.properties to set the MySQL URL, username, and password (default user/password is canal).
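For orientation, the connection settings in instance.properties usually come down to three keys: canal.instance.master.address, canal.instance.dbUsername, and canal.instance.dbPassword. The sketch below assumes a local MySQL on port 3306 and the canal/canal account created earlier. The parse_properties helper is hypothetical and handles only simple key=value lines, which is enough to sanity-check the values before starting the server.

```python
# Assumed values for a local setup; adjust the address to your MySQL instance.
INSTANCE_PROPERTIES = """
# MySQL instance that Canal connects to as a replica
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(INSTANCE_PROPERTIES)
host, _, port = props["canal.instance.master.address"].partition(":")
print(host, int(port), props["canal.instance.dbUsername"])
# 127.0.0.1 3306 canal
```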

3.2 Start canal.deployer

Run startup.sh under the bin directory. On a successful start, the log shows “start successful” and Canal begins listening to MySQL.

3.3 canal.adapter Configuration

Step 1: Comment out all sections in bootstrap.yml to avoid “table not found” errors.

Step 2: Edit application.yml with correct MySQL credentials and Elasticsearch URL (include http:// prefix).

Step 3: Define the table‑to‑index mapping in the es7 folder. Create article.yml to map the MySQL article table to an Elasticsearch index.

dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
  _index: article
  _id: _id
  sql: "SELECT t.id AS _id, t.id, t.user_id, t.article_type, t.title, t.short_title, t.picture, t.summary, t.category_id, t.source, t.source_url, t.offical_stat, t.topping_stat, t.cream_stat, t.status, t.deleted, t.create_time, t.update_time FROM article t"
  commitBatch: 1

Step 4: In Kibana, create the article index with appropriate mappings (e.g., title, summary using ik_max_word analyzer).

PUT /article
{
  "mappings": {
    "properties": {
      "id": {"type": "integer"},
      "user_id": {"type": "integer"},
      "article_type": {"type": "integer"},
      "title": {"type": "text", "analyzer": "ik_max_word"},
      "short_title": {"type": "text", "analyzer": "ik_max_word"},
      "picture": {"type": "text", "analyzer": "ik_max_word"},
      "summary": {"type": "text", "analyzer": "ik_max_word"},
      "category_id": {"type": "integer"},
      "source": {"type": "integer"},
      "source_url": {"type": "text", "analyzer": "ik_max_word"},
      "offical_stat": {"type": "integer"},
      "topping_stat": {"type": "integer"},
      "cream_stat": {"type": "integer"},
      "status": {"type": "integer"},
      "deleted": {"type": "integer"},
      "create_time": {"type": "date"},
      "update_time": {"type": "date"}
    }
  }
}
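Because the SQL in article.yml and the index mapping are maintained separately, they can drift apart. A small consistency check, sketched below, compares the columns selected by the SQL with the properties declared in the mapping. The selected_fields helper is hypothetical and only handles simple select lists like the one in this article (no commas inside function calls). It also assumes that the column aliased as _id serves as the document id rather than as a regular field.

```python
import re

SQL = ("SELECT t.id AS _id, t.id, t.user_id, t.article_type, t.title, t.short_title, "
       "t.picture, t.summary, t.category_id, t.source, t.source_url, t.offical_stat, "
       "t.topping_stat, t.cream_stat, t.status, t.deleted, t.create_time, t.update_time "
       "FROM article t")

# Property names declared in the PUT /article mapping.
MAPPED = {"id", "user_id", "article_type", "title", "short_title", "picture", "summary",
          "category_id", "source", "source_url", "offical_stat", "topping_stat",
          "cream_stat", "status", "deleted", "create_time", "update_time"}


def selected_fields(sql):
    """Return the output column names of a simple SELECT statement."""
    select_list = re.search(r"select\s+(.*?)\s+from\s", sql, re.I | re.S).group(1)
    names = []
    for item in select_list.split(","):
        alias = re.split(r"\s+as\s+", item.strip(), flags=re.I)[-1]
        names.append(alias.split(".")[-1])  # drop the table prefix, e.g. "t."
    return names


fields = [f for f in selected_fields(SQL) if f != "_id"]  # _id is the document id
unmapped = sorted(set(fields) - MAPPED)  # selected columns with no explicit mapping
unfilled = sorted(MAPPED - set(fields))  # mapped properties the SQL never fills
print(unmapped, unfilled)  # [] []
```

Both lists are empty for the configuration in this article. A non-empty first list would mean Elasticsearch falls back to dynamic mapping for those columns, which may not apply the analyzer you intended.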

3.4 Start canal.adapter

Run the adapter; logs show it listening on port 8081 without errors.

04 Data Synchronization Practice

4.1 Full‑Data Sync

After starting the adapter, trigger a full sync with:

curl http://127.0.0.1:8081/etl/es7/article.yml -X POST
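If you would rather trigger the import from a script, the same call can be made with Python's standard library. This is a sketch under the same assumptions as the curl command (adapter on 127.0.0.1:8081, the es7 adapter, and the article.yml task). The function names are hypothetical, and trigger_full_sync is defined but not called because it needs a running adapter.

```python
import urllib.request


def build_etl_request(host="127.0.0.1", port=8081, adapter="es7", task="article.yml"):
    """Build the POST request that asks canal.adapter to run a full import."""
    url = f"http://{host}:{port}/etl/{adapter}/{task}"
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_full_sync(timeout=60):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_etl_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_etl_request()
print(req.get_method(), req.full_url)
# POST http://127.0.0.1:8081/etl/es7/article.yml
```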

The response indicates the number of records imported (e.g., 10).

4.2 Incremental Sync

When MySQL rows are updated, deleted, or inserted, Canal captures the changes and updates Elasticsearch accordingly. Logs and Kibana query results confirm the incremental updates.
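To make the effect of incremental sync concrete, the sketch below applies a few change events to an in-memory dict that stands in for the Elasticsearch index. The event fields (type, table, pkNames, data) follow the shape of Canal's flat JSON messages, where column values arrive as strings. The apply_change function and the sample rows are hypothetical and do not reflect how canal.adapter is implemented internally.

```python
def apply_change(index, message):
    """Apply one Canal-style change message to an index (dict keyed by document id)."""
    pk = message["pkNames"][0]
    for row in message["data"]:
        doc_id = row[pk]
        if message["type"] == "DELETE":
            index.pop(doc_id, None)
        elif message["type"] == "INSERT":
            index[doc_id] = dict(row)
        elif message["type"] == "UPDATE":
            # Merge the new column values into the existing document.
            index.setdefault(doc_id, {}).update(row)
    return index


index = {}
events = [
    {"type": "INSERT", "table": "article", "pkNames": ["id"],
     "data": [{"id": "1", "title": "Hello Canal", "status": "0"}]},
    {"type": "UPDATE", "table": "article", "pkNames": ["id"],
     "data": [{"id": "1", "title": "Hello Canal", "status": "1"}]},
    {"type": "INSERT", "table": "article", "pkNames": ["id"],
     "data": [{"id": "2", "title": "Scratch", "status": "0"}]},
    {"type": "DELETE", "table": "article", "pkNames": ["id"],
     "data": [{"id": "2", "title": "Scratch", "status": "0"}]},
]
for event in events:
    apply_change(index, event)

print(index)
# {'1': {'id': '1', 'title': 'Hello Canal', 'status': '1'}}
```

Events must be applied in binlog order. Replaying the DELETE before the matching INSERT would leave a stale document in the index.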

05 Summary

The article covered the entire pipeline: MySQL binlog replication, Canal installation and configuration, Elasticsearch and Kibana setup, and both full‑ and incremental data synchronization, providing a hands‑on reference for similar projects.
