
Using DataX for MySQL Data Synchronization: Full and Incremental Sync Guide

This article explains how to install DataX, configure MySQL readers and writers, and execute both full and incremental data synchronization jobs between two MySQL instances, providing step‑by‑step commands, JSON job templates, and troubleshooting tips for large‑scale data transfers.


Introduction

The author needed to synchronize 50 million rows between two MySQL databases where traditional SQL sync was impractical, and discovered that DataX offers fast, reliable offline data transfer.

What is DataX?

DataX is the open‑source version of the offline data‑integration engine behind Alibaba Cloud DataWorks. It supports heterogeneous sources such as MySQL, Oracle, HDFS, Hive, ODPS, HBase, and FTP, abstracting each source as a Reader plugin and each target as a Writer plugin within a framework‑plus‑plugin architecture.

DataX 3.0 Framework Design

DataX uses a Framework + Plugin model where each data source is implemented as a Reader plugin and each destination as a Writer plugin.

DataX Core Architecture

A Job represents a single synchronization task. The Job is split into multiple Task instances, which are grouped into TaskGroups for concurrent execution; the scheduler distributes tasks based on the configured channel count.
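The split can be sketched with a little arithmetic. The figure of 5 channels per TaskGroup used below is DataX's documented default (`core.container.taskGroup.channel`) and is an assumption here rather than something stated above:

```shell
# Sketch: how DataX sizes TaskGroups from the configured channel count.
# per_group = 5 mirrors DataX's default core.container.taskGroup.channel (assumed).
channels=20      # the "channel" value from the job's speed setting
per_group=5      # default channels per TaskGroup
groups=$(( (channels + per_group - 1) / per_group ))   # ceiling division
echo "${channels} channels -> ${groups} TaskGroups"
```

The job examples below set channel to 5, so they would run in a single TaskGroup.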

Installation Prerequisites

JDK 1.8+

Python 2 or 3

Apache Maven 3.x (only for compiling DataX from source)

Installing JDK and DataX

# Install JDK
[root@MySQL-1 ~]# tar zxf jdk-8u181-linux-x64.tar.gz
[root@MySQL-1 ~]# mv jdk1.8.0_181 /usr/local/java
# Set environment variables
[root@MySQL-1 ~]# cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/local/java
export PATH=$JAVA_HOME/bin:$PATH
EOF
[root@MySQL-1 ~]# source /etc/profile
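The heading above also covers DataX itself. It ships as a prebuilt release tarball; the download URL below is the official OSS mirror referenced by the DataX README (verify it against the current README before use), and the bundled `job/job.json` is the project's stream-to-stream self-check job:

```
# Install DataX from the official release tarball
[root@MySQL-1 ~]# wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
[root@MySQL-1 ~]# tar zxf datax.tar.gz -C /usr/local/
# Verify the install with the bundled self-check job
[root@MySQL-1 ~]# python /usr/local/datax/bin/datax.py /usr/local/datax/job/job.json
```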

Using DataX for Data Synchronization

1. Full Synchronization

Create a JSON job file that defines a mysqlreader and a mysqlwriter. Example snippet:

{
  "job": {
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "root",
          "password": "123123",
          "column": ["*"],
          "splitPk": "ID",
          "connection": [{
            "jdbcUrl": ["jdbc:mysql://192.168.1.1:3306/course-study?useUnicode=true&characterEncoding=utf8"],
            "table": ["t_member"]
          }]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "username": "root",
          "password": "123123",
          "column": ["*"],
          "connection": [{
            "jdbcUrl": "jdbc:mysql://192.168.1.2:3306/course-study?useUnicode=true&characterEncoding=utf8",
            "table": ["t_member"]
          }],
          "preSql": ["truncate t_member"],
          "session": ["set session sql_mode='ANSI'"],
          "writeMode": "insert"
        }
      }
    }],
    "setting": {"speed": {"channel": "5"}}
  }
}
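Before running the job, make sure the accounts in the JSON hold the privileges DataX needs (the example uses root, which already has them). The grants below are a sketch with an assumed dedicated `datax` account and an assumed MySQL-2 host prompt; note that the preSql truncate requires the DROP privilege on the writer side, since MySQL's TRUNCATE TABLE is governed by DROP:

```
# Reader side (192.168.1.1): DataX only needs to read
[root@MySQL-1 ~]# mysql -uroot -p -e "GRANT SELECT ON \`course-study\`.* TO 'datax'@'%';"
# Writer side (192.168.1.2): inserts plus the preSql TRUNCATE, which requires DROP
[root@MySQL-2 ~]# mysql -uroot -p -e "GRANT SELECT, INSERT, DROP ON \`course-study\`.* TO 'datax'@'%';"
```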

Run the job with:

# python /usr/local/datax/bin/datax.py full_sync.json

The output shows total records, speed, and completion time.

2. Incremental Synchronization

Incremental sync differs only in the reader configuration, which gains a where clause restricting which rows are read. One caveat: the example below keeps the preSql truncate from the full-sync job, so each run reloads the target from scratch; for an append-style incremental run, drop the preSql entry so previously synced rows survive. Example:

{
  "job": {
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "root",
          "password": "123123",
          "column": ["*"],
          "splitPk": "ID",
          "where": "ID <= 1888",
          "connection": [{
            "jdbcUrl": ["jdbc:mysql://192.168.1.1:3306/course-study?useUnicode=true&characterEncoding=utf8"],
            "table": ["t_member"]
          }]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "username": "root",
          "password": "123123",
          "column": ["*"],
          "connection": [{
            "jdbcUrl": "jdbc:mysql://192.168.1.2:3306/course-study?useUnicode=true&characterEncoding=utf8",
            "table": ["t_member"]
          }],
          "preSql": ["truncate t_member"],
          "session": ["set session sql_mode='ANSI'"],
          "writeMode": "insert"
        }
      }
    }],
    "setting": {"speed": {"channel": "5"}}
  }
}

Execute the job similarly; the log reports the number of records transferred based on the filter.
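To run incremental sync repeatedly, the where boundary has to advance on each run rather than stay fixed at a literal like ID <= 1888. A minimal sketch (the checkpoint file path and variable names are assumptions for illustration, not part of DataX) that derives the next filter from a stored high-water mark:

```shell
# Read the last synced ID from a checkpoint file (defaults to 0 on the first run)
last_id=$(cat /tmp/t_member.last_id 2>/dev/null || echo 0)
# Build the value for the reader's "where" parameter
where="ID > ${last_id}"
echo "$where"
# After a successful run, persist the new high-water mark, e.g.:
#   mysql -h192.168.1.1 -uroot -p -N -e "SELECT MAX(ID) FROM \`course-study\`.t_member" > /tmp/t_member.last_id
```

The generated string can then be substituted into the job JSON (for example with a templating step) before invoking datax.py.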

Key Takeaways

DataX replaces an n×n mesh of point‑to‑point sync pipelines with a star topology: each source needs only a Reader and each target only a Writer, with DataX as the hub.

Full sync is suitable for initial loads; incremental sync uses where clauses to transfer only changed data.

Properly configure MySQL privileges on both instances, and delete any hidden files (names beginning with a dot) left in DataX's plugin directories, as they cause plugin-loading errors at runtime.

Tags: backend, MySQL, Data Synchronization, DataX, ETL
Written by Selected Java Interview Questions, a professional Java tech channel sharing common knowledge to help developers fill gaps.