Turning a 2‑Month Data Migration into 4 Hours: Backend Performance Secrets

After facing a data migration projected to take two months for 20 million users, the author iteratively redesigned the system, moving from a single-threaded procedural program to a fully decoupled, multithreaded architecture built on queues and batch operations. The final version ran in just four hours while preserving data consistency and recoverability.

Java Backend Technology

Jul 29, 2018

The author initially faced a data migration that was estimated to take two months to process 20 million user records. A series of performance optimizations eventually reduced the runtime to four hours.

1. Project Description

The task involved reading 20 million users from database A, generating a GUID for each, registering the user in database B through an SDK registration interface, and recording the association in a table in database A. The requirements were to use the SDK (no direct JDBC), keep runs recoverable, maintain data consistency, and complete the job within one day.

2. First Version – Procedural (2 months)

This version was a single-threaded, tightly coupled procedural program that read one record, processed it, called the SDK, and wrote the association table, with no counters and no error handling. Because everything ran on one thread, the slowest step in the chain stalled the whole program, and each SDK call opened a new HTTP connection. The estimated execution time was two months.
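To put the numbers in perspective, the following sketch works out the throughput each runtime implies for 20 million records. It assumes "two months" means 60 days; the class and method names are illustrative and not from the original project.

```java
final class MigrationThroughput {
    static final long TOTAL_RECORDS = 20_000_000L;

    /** Records per second needed to finish all records within the given number of hours. */
    static double recordsPerSecond(double hours) {
        return TOTAL_RECORDS / (hours * 3600.0);
    }

    public static void main(String[] args) {
        // Assumption: "two months" is taken as 60 days of continuous running.
        System.out.printf("two months: %.2f records/s%n", recordsPerSecond(60 * 24));
        // The stated requirement: finish within one day.
        System.out.printf("one day:    %.2f records/s%n", recordsPerSecond(24));
        // The runtime reported for the final version.
        System.out.printf("four hours: %.2f records/s%n", recordsPerSecond(4));
    }
}
```

Under that assumption the first version handled roughly 4 records per second, the one-day requirement needs about 231 per second, and the four-hour result corresponds to about 1,389 per second, a 360-fold improvement.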

3. Second Version – Object‑Oriented (21 days)

By introducing a configuration object (BatchStrategy), three workers (Reader, Processor, Writer), and an ErrorHandler, the design became more modular and made it possible to batch both the inserts and the SDK calls. Batching reduced the number of HTTP requests and let the inserts use JDBC batch execution, cutting the runtime from two months to 21 days. The overall speed was still limited by the slowest component.
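The effect of batching can be sketched without a database. The example below is a minimal illustration, not the author's code: `partition` and `writeInBatches` are hypothetical helpers, and the `Consumer` stands in for whatever performs one SDK request or one JDBC `executeBatch()` per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchSketch {
    /** Splits the input into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Hands each batch to the writer and returns how many writer calls were made. */
    static <T> int writeInBatches(List<T> items, int batchSize, Consumer<List<T>> batchWriter) {
        int calls = 0;
        for (List<T> batch : partition(items, batchSize)) {
            batchWriter.accept(batch); // one round trip per batch instead of one per record
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> userIds = new ArrayList<>();
        for (int id = 1; id <= 10; id++) {
            userIds.add(id);
        }
        int calls = writeInBatches(userIds, 4, batch -> System.out.println("writing " + batch));
        System.out.println("writer calls: " + calls);
    }
}
```

With ten records and a batch size of four, the writer is called three times rather than ten. In a real Writer the per-batch call would typically wrap `PreparedStatement.addBatch()` and `executeBatch()`.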

4. Third Version – Full Decoupling (Queue + Multithreading) – 3 days

The introduction of a thread‑safe queue (ConcurrentLinkedQueue) decoupled Reader, Processor and Writer, allowing asynchronous processing and true multithreading. Each stage could work independently, improving fault tolerance and recoverability. However, the Processor remained slower than Writer, and the MySQL LIMIT‑based reads became a bottleneck for large offsets.
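A queue-decoupled pipeline of this kind might look like the sketch below. It is a simplified illustration under several assumptions: one thread per stage, integers standing in for user rows, and completion flags to tell consumers when to stop. None of the names come from the original project.

```java
import java.util.Queue;
import java.util.UUID;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class QueuePipelineSketch {
    /** Runs reader -> processor -> writer on three threads; returns the number of records written. */
    static int run(int userCount) {
        Queue<Integer> readQueue = new ConcurrentLinkedQueue<>();
        Queue<String> writeQueue = new ConcurrentLinkedQueue<>();
        AtomicBoolean readerDone = new AtomicBoolean(false);
        AtomicBoolean processorDone = new AtomicBoolean(false);
        AtomicInteger written = new AtomicInteger();

        Thread reader = new Thread(() -> {
            for (int id = 1; id <= userCount; id++) {
                readQueue.add(id); // stands in for a row read from database A
            }
            readerDone.set(true);
        });

        Thread processor = new Thread(() -> {
            while (true) {
                Integer id = readQueue.poll(); // non-blocking; null when the queue is empty
                if (id != null) {
                    writeQueue.add(id + ":" + UUID.randomUUID()); // attach a GUID
                } else if (readerDone.get() && readQueue.isEmpty()) {
                    break; // upstream finished and nothing is left
                } else {
                    Thread.yield(); // queue momentarily empty, upstream still running
                }
            }
            processorDone.set(true);
        });

        Thread writer = new Thread(() -> {
            while (true) {
                String record = writeQueue.poll();
                if (record != null) {
                    written.incrementAndGet(); // stands in for the SDK call and association insert
                } else if (processorDone.get() && writeQueue.isEmpty()) {
                    break;
                } else {
                    Thread.yield();
                }
            }
        });

        reader.start();
        processor.start();
        writer.start();
        try {
            reader.join();
            processor.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return written.get();
    }

    public static void main(String[] args) {
        System.out.println("records written: " + run(1000));
    }
}
```

Because `ConcurrentLinkedQueue` is unbounded and non-blocking, a fast Reader can run far ahead of a slow Processor. A bounded `BlockingQueue` is a common alternative when back-pressure is wanted, though the source does not say whether the author considered it.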

5. Fourth Version – Highly Abstracted (One‑Click Start) – 4 hours

By defining a common Job interface for Reader, Processor and Writer, and by optimizing the LIMIT query using an indexed phone‑number column, the system achieved both simplicity and speed. The design supports both batch and single‑record inserts, multithreading, and full data recoverability, ultimately completing the migration in four hours.
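One common way to use an indexed column in place of a growing offset is keyset (seek) pagination: each page starts after the last key of the previous page. The sketch below illustrates the idea in memory. The SQL strings, the `user` table name, and the assumption that phone numbers are unique are all hypothetical; the source only says that an indexed phone-number column was used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class KeysetPagingSketch {
    // Offset paging: the server has to walk past `offset` rows before returning any.
    static final String OFFSET_SQL =
            "SELECT id, phone FROM user ORDER BY phone LIMIT ?, ?";
    // Keyset paging: the index seeks straight to the first row after the last key seen.
    static final String KEYSET_SQL =
            "SELECT id, phone FROM user WHERE phone > ? ORDER BY phone LIMIT ?";

    /** In-memory stand-in for KEYSET_SQL: up to pageSize phones strictly after lastPhone. */
    static List<String> nextPage(NavigableSet<String> phones, String lastPhone, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String phone : phones.tailSet(lastPhone, false)) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(phone);
        }
        return page;
    }

    /** Reads every page in key order and returns the total number of rows seen. */
    static int readAll(NavigableSet<String> phones, int pageSize) {
        int total = 0;
        String last = ""; // sorts before every non-empty phone string
        while (true) {
            List<String> page = nextPage(phones, last, pageSize);
            if (page.isEmpty()) {
                break;
            }
            total += page.size();
            last = page.get(page.size() - 1); // the value a restart would resume from
        }
        return total;
    }

    public static void main(String[] args) {
        NavigableSet<String> phones = new TreeSet<>();
        for (int i = 1; i <= 10; i++) {
            phones.add(String.format("138000000%02d", i));
        }
        System.out.println("rows read: " + readAll(phones, 3));
    }
}
```

The last key of each page doubles as a natural checkpoint: persisting it lets an interrupted run resume from where it stopped, which fits the recoverability requirement.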

6. Further Optimization Thoughts

Parallelizing the Reader could further reduce the LIMIT‑induced slowdown.
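One possible way to parallelize the Reader is to give each reader thread its own contiguous key range. The sketch below assumes a numeric, roughly dense `id` column, which the source does not confirm; `split` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

final class ReaderRangeSketch {
    /** Splits the inclusive range [minId, maxId] into at most `readers` contiguous slices {from, to}. */
    static List<long[]> split(long minId, long maxId, int readers) {
        if (readers <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need readers > 0 and minId <= maxId");
        }
        long total = maxId - minId + 1;
        long base = total / readers;   // minimum slice size
        long extra = total % readers;  // the first `extra` slices get one more id
        List<long[]> slices = new ArrayList<>();
        long from = minId;
        for (int i = 0; i < readers; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // more readers than ids
            }
            long to = from + size - 1;
            slices.add(new long[] {from, to});
            from = to + 1;
        }
        return slices;
    }

    public static void main(String[] args) {
        // Each slice could back one reader running "... WHERE id BETWEEN ? AND ?".
        for (long[] slice : split(1, 20_000_000L, 4)) {
            System.out.println(slice[0] + " .. " + slice[1]);
        }
    }
}
```

If the keys are sparse or skewed, equal-width ranges would hold unequal numbers of rows, so a real implementation would probably need to sample the key distribution first.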

Asynchronous logging would eliminate the heavy cost of millions of log writes.
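The idea behind asynchronous logging can be sketched with a queue and a single background thread, so worker threads only pay the cost of an enqueue. This is a hand-rolled illustration with an in-memory list standing in for the log file; in practice a logging framework's asynchronous mode (for example Logback's `AsyncAppender` or Log4j 2 async loggers) would be the usual choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class AsyncLogSketch implements AutoCloseable {
    // Unique sentinel object; compared by identity so no real log line can match it.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Stands in for a log file; only the writer thread touches it while running.
    private final List<String> sink = new ArrayList<>();
    private final Thread writer;

    AsyncLogSketch() {
        writer = new Thread(() -> {
            try {
                while (true) {
                    String line = pending.take(); // blocks until a line is available
                    if (line == STOP) {
                        return;
                    }
                    sink.add(line); // the slow I/O would happen here, off the worker threads
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-log-writer");
        writer.start();
    }

    /** Called by worker threads: enqueue and return immediately. */
    void log(String line) {
        pending.add(line);
    }

    /** Drains everything queued before this call, then stops the writer thread. */
    @Override
    public void close() {
        pending.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Safe to read only after close() has returned. */
    List<String> lines() {
        return sink;
    }

    public static void main(String[] args) {
        AsyncLogSketch log = new AsyncLogSketch();
        for (int i = 1; i <= 3; i++) {
            log.log("migrated user " + i);
        }
        log.close();
        System.out.println(log.lines());
    }
}
```

The trade-off is that lines still in the queue are lost if the process dies before they are written, so the log should not be the only record used for recovery.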
