Using DataX for MySQL Data Synchronization: Full and Incremental Sync Guide
This article explains how to install DataX, configure MySQL readers and writers, and execute both full and incremental data synchronization jobs between two MySQL instances, providing step‑by‑step commands, JSON job templates, and troubleshooting tips for large‑scale data transfers.
Introduction
The author needed to synchronize 50 million rows between two MySQL databases where traditional SQL sync was impractical, and discovered that DataX offers fast, reliable offline data transfer.
What is DataX?
DataX is the open‑source version of Alibaba Cloud DataWorks Data Integration, a tool for offline data synchronization between heterogeneous sources such as MySQL, Oracle, HDFS, Hive, ODPS, HBase, and FTP. It abstracts each source and target connection as a Reader or Writer plugin within a framework‑plus‑plugin architecture.
DataX 3.0 Framework Design
DataX uses a Framework + Plugin model where each data source is implemented as a Reader plugin and each destination as a Writer plugin.
DataX Core Architecture
A Job represents a single synchronization task. The Job creates multiple Task instances, which are grouped into TaskGroups for concurrent execution. The scheduler distributes tasks based on the configured channel count.
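As a rough sketch of that scheduling arithmetic (the five-channels-per-TaskGroup figure is DataX's documented default, `core.container.taskGroup.channel`; treat the numbers as illustrative, not as DataX's actual scheduler code):

```python
import math

# Illustrative only: how DataX-style scheduling packs channels into TaskGroups.
# By default each TaskGroup runs 5 concurrent channels
# (core.container.taskGroup.channel).
CHANNELS_PER_TASKGROUP = 5

def task_groups_needed(channel_count: int) -> int:
    """Number of TaskGroups the scheduler allocates for a given channel count."""
    return math.ceil(channel_count / CHANNELS_PER_TASKGROUP)

print(task_groups_needed(20))  # a job with 20 channels runs in 4 TaskGroups
```

So the job template below, with `"channel": "5"`, would run in a single TaskGroup.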
Installation Prerequisites
JDK 1.8+
Python 2 or 3
Apache Maven 3.x (only for compiling DataX from source)
Installing JDK and DataX
# Install JDK
[root@MySQL-1 ~]# tar zxf jdk-8u181-linux-x64.tar.gz
[root@MySQL-1 ~]# mv jdk1.8.0_181 /usr/local/java
# Set environment variables
[root@MySQL-1 ~]# cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
EOF
[root@MySQL-1 ~]# source /etc/profile
# Install DataX
[root@MySQL-1 ~]# tar zxf datax.tar.gz -C /usr/local/
Using DataX for Data Synchronization
1. Full Synchronization
Create a JSON job file that defines a mysqlreader and a mysqlwriter. Example snippet:
{
"job": {
"content": [{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "123123",
"column": ["*"],
"splitPk": "ID",
"connection": [{
"jdbcUrl": ["jdbc:mysql://192.168.1.1:3306/course-study?useUnicode=true&characterEncoding=utf8"],
"table": ["t_member"]
}]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"username": "root",
"password": "123123",
"column": ["*"],
"connection": [{
"jdbcUrl": "jdbc:mysql://192.168.1.2:3306/course-study?useUnicode=true&characterEncoding=utf8",
"table": ["t_member"]
}],
"preSql": ["truncate t_member"],
"session": ["set session sql_mode='ANSI'"],
"writeMode": "insert"
}
}
}],
"setting": {"speed": {"channel": "5"}}
}
}

Run the job with:
# python /usr/local/datax/bin/datax.py full_sync.json

The output shows the total record count, average speed, and completion time.
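Hand-editing the job JSON is error-prone when you have many tables to sync. A small generator (a hypothetical helper of my own, not part of DataX) can build the same structure as the template above from a few parameters and guarantees the file is valid JSON before you submit it:

```python
import json

def build_full_sync_job(src_url, dst_url, table, user, password, channels=5):
    """Build a DataX mysqlreader -> mysqlwriter full-sync job dict.

    Hypothetical helper; it mirrors the JSON template shown in this article.
    """
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": user, "password": password,
                        "column": ["*"], "splitPk": "ID",
                        "connection": [{"jdbcUrl": [src_url], "table": [table]}],
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": user, "password": password,
                        "column": ["*"],
                        "connection": [{"jdbcUrl": dst_url, "table": [table]}],
                        "preSql": [f"truncate {table}"],
                        "writeMode": "insert",
                    },
                },
            }],
            # DataX expects the channel count as a string in this template.
            "setting": {"speed": {"channel": str(channels)}},
        }
    }

job = build_full_sync_job(
    "jdbc:mysql://192.168.1.1:3306/course-study?useUnicode=true&characterEncoding=utf8",
    "jdbc:mysql://192.168.1.2:3306/course-study?useUnicode=true&characterEncoding=utf8",
    "t_member", "root", "123123",
)
with open("full_sync.json", "w") as f:
    json.dump(job, f, indent=2)
```

The generated `full_sync.json` can then be passed to `datax.py` exactly as above.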
2. Incremental Synchronization
Incremental sync differs only by adding a where clause to the reader configuration. Example:
{
"job": {
"content": [{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "123123",
"column": ["*"],
"splitPk": "ID",
"where": "ID <= 1888",
"connection": [{
"jdbcUrl": ["jdbc:mysql://192.168.1.1:3306/course-study?useUnicode=true&characterEncoding=utf8"],
"table": ["t_member"]
}]
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"username": "root",
"password": "123123",
"column": ["*"],
"connection": [{
"jdbcUrl": "jdbc:mysql://192.168.1.2:3306/course-study?useUnicode=true&characterEncoding=utf8",
"table": ["t_member"]
}],
"preSql": ["truncate t_member"],
"session": ["set session sql_mode='ANSI'"],
"writeMode": "insert"
}
}
}],
"setting": {"speed": {"channel": "5"}}
}
}

Execute the job the same way; the log reports the number of records transferred under the filter. Note that for repeated incremental runs you should adjust the where range to cover only newly added rows and drop the truncate preSql, otherwise each run wipes the target table before loading.
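One way to make incremental runs repeatable (my own sketch, not something DataX provides out of the box) is to persist the last synced ID in a checkpoint file and regenerate the reader's where clause from it on each run:

```python
from pathlib import Path

# Hypothetical checkpoint file; one per synced table.
CHECKPOINT = Path("t_member.last_id")

def incremental_where(new_max_id: int) -> str:
    """Return a where clause covering rows added since the last run,
    then advance the checkpoint to new_max_id."""
    last_id = int(CHECKPOINT.read_text()) if CHECKPOINT.exists() else 0
    CHECKPOINT.write_text(str(new_max_id))
    return f"ID > {last_id} and ID <= {new_max_id}"

# On a fresh checkpoint, the first run covers everything up to 1888,
# and the next run picks up from there.
print(incremental_where(1888))  # "ID > 0 and ID <= 1888"
print(incremental_where(2500))  # "ID > 1888 and ID <= 2500"
```

The returned string is what you would place in the reader's "where" field before each run; the upper bound should come from a `SELECT MAX(ID)` against the source.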
Key Takeaways
DataX replaces point‑to‑point pipelines between many sources and targets with a simple star topology: each system only needs a Reader or Writer plugin connecting to the central framework.
Full sync is suitable for initial loads; incremental sync uses where clauses to transfer only changed data.
Properly configure MySQL privileges and remove hidden plugin files to avoid runtime errors.
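The "hidden plugin files" mentioned above refers to the macOS metadata files (`._*`) that ship inside the plugin directories of some DataX release tarballs and cause plugin-configuration errors at startup. A sketch of the cleanup (the `/usr/local/datax` default follows this article's install path):

```python
import os

def remove_hidden_plugin_files(plugin_dir="/usr/local/datax/plugin"):
    """Delete macOS '._*' metadata files under the DataX plugin tree.

    These stray files make DataX treat them as plugin configs and fail.
    Returns the list of paths removed.
    """
    removed = []
    for root, _dirs, files in os.walk(plugin_dir):
        for name in files:
            if name.startswith("._"):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return removed

# Usage: remove_hidden_plugin_files() before the first job run.
```

This is equivalent to `find /usr/local/datax/plugin -name "._*" -delete` from the shell.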