Boost Data Sync Speed 8‑10×: Integrating Alibaba DataX into Spring Boot

This article explains how to replace a slow Kettle‑based ETL process with Alibaba DataX. It covers environment setup, compilation, Maven integration, Java invocation, and performance results that show an 8‑10× speed increase when syncing over a million records.

Java High-Performance Architecture

Mar 29, 2023

Our system stores its base data in EBS. We originally synchronized it with Kettle scripts, run both on a schedule and manually. As the data grew past a million rows, each run slowed to more than ten minutes.

Kettle was simple to use but did not scale, so we switched to Alibaba's open‑source DataX for faster ETL.

Development environment: Windows 10, JDK 1.8, Maven 3.6.3; runtime on Linux 3.10 with Python 2.7.5.

We cloned DataX from GitHub and built it with:

mvn -U clean package assembly:assembly -Dmaven.test.skip=true

The resulting package (~1.32 GB) was uploaded to the server.

Using the datax.py script that ships with DataX, we generated a job template:

python datax.py -r oraclereader -w oraclewriter > oracleetl.json

We then customized the template for our Oracle‑to‑Oracle sync.
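As a rough illustration of what a customized job file might contain, the sketch below assembles an Oracle‑to‑Oracle job description in Java and prints it. The parameter keys follow the template that DataX generates for oraclereader and oraclewriter. The table name, columns, JDBC URLs, credentials, and channel count are hypothetical placeholders, not the values used in our project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: builds a DataX job description as a JSON string.
// Values are inserted as-is (no JSON escaping), so this suits simple identifiers only.
class OracleJobTemplate {

    // Renders a list of strings as a JSON array of quoted values.
    static String jsonArray(List<String> values) {
        return values.stream()
                .map(v -> "\"" + v + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    static String buildJob(String srcUrl, String dstUrl, String user, String password,
                           String table, List<String> columns, int channels) {
        String cols = jsonArray(columns);
        String tables = jsonArray(Arrays.asList(table));
        return String.join("\n",
            "{",
            "  \"job\": {",
            "    \"setting\": { \"speed\": { \"channel\": " + channels + " } },",
            "    \"content\": [{",
            "      \"reader\": {",
            "        \"name\": \"oraclereader\",",
            "        \"parameter\": {",
            "          \"username\": \"" + user + "\",",
            "          \"password\": \"" + password + "\",",
            "          \"column\": " + cols + ",",
            // The reader template takes jdbcUrl as an array.
            "          \"connection\": [{ \"table\": " + tables + ", \"jdbcUrl\": [\"" + srcUrl + "\"] }]",
            "        }",
            "      },",
            "      \"writer\": {",
            "        \"name\": \"oraclewriter\",",
            "        \"parameter\": {",
            "          \"username\": \"" + user + "\",",
            "          \"password\": \"" + password + "\",",
            "          \"column\": " + cols + ",",
            // The writer template takes jdbcUrl as a single string.
            "          \"connection\": [{ \"table\": " + tables + ", \"jdbcUrl\": \"" + dstUrl + "\" }]",
            "        }",
            "      }",
            "    }]",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Hypothetical placeholders throughout.
        System.out.println(buildJob(
            "jdbc:oracle:thin:@source-host:1521:ORCL",
            "jdbc:oracle:thin:@target-host:1521:ORCL",
            "etl_user", "etl_password",
            "ITEM_DATA", Arrays.asList("ID", "NAME"), 3));
    }
}
```

Writing the printed output to a file gives a job description that can be compared against the generated template before filling in real connection details.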

In the Spring Boot project we added the required DataX JARs as system‑scoped dependencies:

<dependency>
    <groupId>com.alibaba.datax</groupId>
    <artifactId>datax-common</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/src/main/webapp/WEB-INF/lib/datax-common-0.0.1-SNAPSHOT.jar</systemPath>
</dependency>
... (other dependencies)

Because DataX artifacts are not in public Maven repositories, we referenced the local JARs directly.

We invoked DataX from Java code:

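import com.alibaba.datax.core.Engine;

// DataX locates its conf/ and plugin/ directories relative to datax.home.
System.setProperty("datax.home", DATAX_HOME);
String[] args = {"-job", DATAX_HOME + "/etl_itemdata.json", "-mode", "standalone", "-jobid", "-1"};
Engine.entry(args); // declared to throw Throwable, so the caller must catch or propagate it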
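The call above can be wrapped so that a failed job does not propagate an unchecked Throwable into the application. The following is a minimal sketch under assumptions: DataxLauncher, buildArgs, and run are hypothetical names, and the engine is looked up reflectively only so that the example compiles without the DataX JARs. With the JARs on the classpath, calling Engine.entry(args) directly inside a try/catch is simpler.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Sketch only: a hypothetical wrapper around the DataX engine call.
class DataxLauncher {

    // Builds the argument array in the same form used in the article.
    static String[] buildArgs(String dataxHome, String jobFileName) {
        String jobPath = dataxHome + "/" + jobFileName;
        return new String[] {"-job", jobPath, "-mode", "standalone", "-jobid", "-1"};
    }

    // Runs one job; returns true if the engine finished without throwing.
    static boolean run(String dataxHome, String jobFileName) {
        System.setProperty("datax.home", dataxHome);
        String[] args = buildArgs(dataxHome, jobFileName);
        try {
            // Reflective lookup keeps this sketch compilable without DataX present.
            Class<?> engine = Class.forName("com.alibaba.datax.core.Engine");
            Method entry = engine.getMethod("entry", String[].class);
            entry.invoke(null, (Object) args);
            return true;
        } catch (InvocationTargetException e) {
            // The engine itself failed; the cause carries the DataX error.
            System.err.println("DataX job failed: " + e.getCause());
            return false;
        } catch (ReflectiveOperationException e) {
            // Engine class or method not found, or not accessible.
            System.err.println("DataX engine not available: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // "/opt/datax" is a hypothetical install path.
        boolean ok = run("/opt/datax", "etl_itemdata.json");
        System.out.println(ok ? "sync finished" : "sync did not complete");
    }
}
```

Returning a boolean keeps the sketch short. A scheduled or manually triggered sync would more likely log the cause and surface it to whoever started the run.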

According to the job log, 1,335,288 records were synchronized in 60 seconds, an average throughput of 2.33 MB/s and 22,254 records/s.

Compared with Kettle, which needed more than ten minutes for the same data, the sync was 8‑10× faster, which met our business requirements.
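The reported figures can be cross‑checked with a little arithmetic. The sketch below recomputes the record rate from the record count and duration, and estimates the speedup. The 600‑second Kettle duration is an assumption: the article only states "more than ten minutes", so 600 seconds is a lower bound, not a measured value.

```java
// Sketch only: re-derives the throughput figures quoted from the job log.
class SyncThroughput {

    // Average records per second, truncated to a whole number.
    static long recordsPerSecond(long records, long seconds) {
        return records / seconds;
    }

    // Ratio of the old run time to the new one.
    static double speedup(double oldSeconds, double newSeconds) {
        return oldSeconds / newSeconds;
    }

    public static void main(String[] args) {
        long rate = recordsPerSecond(1_335_288L, 60L);
        System.out.println(rate + " records/s"); // prints "22254 records/s"

        // Assumption: Kettle's "more than ten minutes" taken as 600 seconds.
        System.out.println(speedup(600.0, 60.0) + "x"); // prints "10.0x"
    }
}
```

The recomputed rate matches the 22,254 records/s in the log, and a 600‑second baseline gives a speedup at the top of the stated 8‑10× range.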
