Databases 8 min read

How We Cut 860,000 Table Migration Time from 19 Hours to Under 10

This article details the DRS team's step‑by‑step investigation and optimization of a massive MySQL migration involving 860,000 tables, addressing timeout, OOM, and performance bottlenecks to reduce total migration time from over 19 hours to under 10 hours.

Huawei Cloud Developer Alliance

Apr 27, 2020

How We Cut 860,000 Table Migration Time from 19 Hours to Under 10

Problem Background

In December 2019 the DRS project team received an online issue: a customer migrating a MySQL database from a competitor to Huawei Cloud experienced severe performance degradation and errors. The source environment contained about 2,000 databases and 860,000 tables, and queries to the source timed out.

Problem Identification

The root cause was a SQL statement that queried information_schema.TRIGGERS for each database. Although the query targeted a single schema, MySQL scanned the metadata files (frm) of all databases, causing the query to run for over five minutes and generating massive I/O load.

SELECT TRIGGER_NAME FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA = '****';

Another query that listed all triggers across databases also suffered from the same “Scanned all databases” issue.

Optimization Approach for Object Listing

We separated object handling by type. For tables we performed per‑database queries (which do not trigger a full‑database scan). For other objects we used a single query that excluded system schemas:

SELECT TRIGGER_SCHEMA, TRIGGER_NAME FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA NOT IN ('mysql','information_schema','sys','performance_schema');

This dramatically reduced the time spent listing objects.

OOM Issue

Heap analysis revealed a large number of DbSqlData objects stored in ehcache, leading to out‑of‑memory errors. With an average of 4 KB per table, 860,000 tables required about 3.44 GB of heap, exceeding the 2 GB perm‑gen allocation.

if [ -z "$REPLICATOR_HEAP_SIZE" ]; then
  REPLICATOR_HEAP_SIZE="-Xms3072m -Xmx3072m -XX:NewRatio=3 -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m"
fi

The solution was to store objects at a finer granularity and control the number of objects loaded into memory, enabling a streaming‑style replay.

Performance Bottlenecks and Re‑engineering

The migration workflow consisted of three serial stages: object list query (1 h), structure export (10 h), and replay (8 h), totaling 19 h. We introduced two key improvements:

Export structure concurrently while querying the object list.

Parallelize structure export using eight threads.

With these changes the combined time for listing and exporting becomes max(A, B/8) ≈ 2 h, and the total migration time drops to 2 h + 8 h = 10 h, meeting the client’s 12‑hour deadline.

Optimization Results

After extensive development and validation, the new framework reduced the total migration time for 860,000 tables to under 10 hours, eliminated full GC events, and satisfied the customer’s requirements.

Further Thoughts

Potential future improvements include parallelizing structure import with data replay and exploring whether structure export and replay can run concurrently.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql database migration OutOfMemory DRS Large Scale Tables

Written by

Huawei Cloud Developer Alliance

The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.