Backend Development 22 min read

Building a Scalable MySQL Data Migration Service with Microservices and Design Patterns

This article presents a comprehensive guide to designing and implementing a Java‑based MySQL large‑table migration tool using microservice architecture, multiple design patterns, multithreading, custom logging, and flexible migration modes, complete with code snippets, repository links, and deployment considerations.

dbaplus Community

Jun 16, 2024

Building a Scalable MySQL Data Migration Service with Microservices and Design Patterns

Historical Data Migration

The project provides a microservice‑based solution for migrating large MySQL tables, employing design patterns such as Singleton, Bridge, Factory, Template, and Strategy, and core technologies like multithreading and filters. It supports several migration modes, including direct database‑to‑database, database‑to‑file‑to‑database, and dump‑copy.

Project repositories:

1 https://gitee.com/xl-echo/dataMigration</code><code>2 https://github.com/xl-echo/dataMigration

Development Environment

Technical Framework

Requirement Analysis and Structure

Historical data migration involves three core processes: extracting data from the source, inserting it into the target, and deleting the source data. The design emphasizes data safety, process robustness, and extensibility, addressing issues such as data loss and incomplete copies, especially in payment industry scenarios.

Payment industry: Migration must guarantee data integrity; failure to migrate is acceptable, but data loss is not.

Check processes: Additional verification steps ensure source and target data consistency.

Transaction design: Handles system exceptions to prevent data loss.

Memory management: Avoids OOM by not loading all data into memory at once.

Code Flow Design

The system defines a generic migration flow managed by multiple design patterns, enabling flexible extensions and clean code. The overall process diagram is illustrated below.

Physical Architecture Diagram

Database Model Design

The migration relies on a set of tables that define tasks, source/target configurations, column selections, insert specifications, and logging. Key tables include:

transfer_data_task: Manages migration tasks and execution mode.

transfer_database_config: Stores source and target database connection details.

transfer_select_config: Defines source table, columns, conditions, and batch size. Example SQL: select {fields} from {table} where {condition} limit {batch} transfer_insert_config: Defines target table and insert columns. Example SQL: insert into {table} ({fields}) values (...) transfer_log: Records migration logs for traceability.

Log Selection and Design (Link Tracing)

The project uses Log4j for reliable logging and implements a custom link‑tracing mechanism that generates a unique key per execution using either a Snowflake algorithm or UUID. The unique key is stored in the transfer_log table to facilitate end‑to‑end tracing.

Snowflake implementation:

package com.echo.one.utils.uuid;
/**
 * Snowflake algorithm
 */
public class SnowflakeIdWorker {
    private final long twepoch = 1420041600000L;
    private final long workerIdBits = 5L;
    private final long datacenterIdBits = 5L;
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private final long sequenceBits = 12L;
    private final long workerIdShift = sequenceBits;
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);
    private long workerId;
    private long datacenterId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;
    public SnowflakeIdWorker(long workerId, long datacenterId) { /* validation omitted */ }
    public synchronized long nextId() { /* generation logic omitted */ }
    protected long tilNextMillis(long lastTimestamp) { /* wait logic omitted */ }
    protected long timeGen() { return System.currentTimeMillis(); }
}

The Snowflake approach can suffer from ID duplication due to clock rollback or high‑throughput generation on a single machine; the project therefore falls back to UUID when duplication occurs.

Migration Modes

Three migration modes are defined in transfer_data_task.transfer_mode (0, 1, 2). Mode 1 (direct DB‑to‑DB) is recommended.

Mode 0 – DB → File → DB (not recommended): Provides a safe backup step but requires local disk space; testing with 1 million rows consumed ~10 GB.

Mode 1 – DB → DB (recommended): Direct transfer with minimal overhead.

Mode 2 – source.dump copy target.dump (risky): Direct file copy without validation.

Module Structure

dataMigration
├─ bin
│  └─ startup.sh
├─ doc
│  ├─ blog.sql
│  ├─ blogbak.sql
│  └─ community.sql
├─ src/main/java/com.echo.one
│  ├─ common/base/TransferContext.java
│  ├─ common/base/TransferDataContext.java
│  ├─ config/DataMigrationConstant.java
│  ├─ enums/DatabaseDirection.java
│  ├─ exception/DataMigrationException.java
│  ├─ factory/DataMigrationBeanFactory.java
│  ├─ framework/config/RemoveDruidAdConfig.java
│  ├─ framework/filter/WebMvcConfig.java
│  ├─ framework/handler/GlobalExceptionHandler.java
│  ├─ framework/result/Result.java
│  ├─ controller/TransferDatabaseConfigController.java
│  ├─ dao/TransferDatabaseConfigMapper.java
│  ├─ job/DataTaskThread.java
│  ├─ po/TransferDatabaseConfig.java
│  ├─ processer/CheckInsertDataProcessModeOne.java
│  ├─ service/imp/TransferDatabaseConfigServiceImpl.java
│  └─ utils/uuid/SnowflakeIdWorker.java
├─ resources/application.yml
├─ resources/logback-spring.xml
└─ pom.xml

Design Patterns

The project applies several patterns to keep the codebase extensible and maintainable.

Bridge Pattern

Abstracts the migration steps into a ProcessMode class, allowing each mode to provide its own implementation without coupling to the caller.

public abstract class ProcessMode {
    public final void invoke(TransferContext ctx) {
        try { handler(ctx); }
        catch (DataMigrationException e) { throw e; }
        catch (Exception e) { throw new DataMigrationException("system error, msg: " + e.getMessage()); }
    }
    protected abstract void handler(TransferContext ctx);
}

Concrete implementations are retrieved via Spring by name, e.g.:

ProcessMode sourceBean = SpringContextUtils.getBean(
    "process.selectData." + transferContext.getTransferDataTask().getTransferMode(),
    ProcessMode.class);

Singleton Pattern

A thread‑pool utility ExecutorServiceUtil is implemented as a singleton to manage concurrent migration tasks safely.

public class ExecutorServiceUtil {
    private static ThreadPoolExecutor executorService;
    static { executorService = new ThreadPoolExecutor(...); }
    private ExecutorServiceUtil() { if (executorService != null) throw new DataMigrationException(...); }
    public static ExecutorService getInstance() { return executorService; }
}

Prototype‑scoped beans are used for per‑task instances to avoid shared mutable state.

Transaction Management

Transactions are crucial for data consistency. The project currently does not employ distributed transactions; instead, it controls execution order so that if insertData fails, subsequent deleteData is skipped, preventing orphaned source rows.

Conclusion

Each design choice balances flexibility, safety, and performance. The system demonstrates how microservices, design patterns, and careful resource management can together solve the challenges of large‑scale MySQL data migration, while remaining open to future enhancements.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Migration design patterns Java Logging mysql multithreading

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.