How We Cut 860,000 Table Migration Time from 19 Hours to Under 10
This article details the DRS team's step‑by‑step investigation and optimization of a massive MySQL migration involving 860,000 tables, addressing timeout, OOM, and performance bottlenecks to reduce total migration time from over 19 hours to under 10 hours.
Problem Background
In December 2019 the DRS project team received an online issue: a customer migrating a MySQL database from a competitor to Huawei Cloud experienced severe performance degradation and errors. The source environment contained about 2,000 databases and 860,000 tables, and queries to the source timed out.
Problem Identification
The root cause was a SQL statement that queried information_schema.TRIGGERS for each database. Although the query targeted a single schema, MySQL scanned the metadata files (frm) of all databases, causing the query to run for over five minutes and generating massive I/O load.
SELECT TRIGGER_NAME FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA = '****';Another query that listed all triggers across databases also suffered from the same “Scanned all databases” issue.
Optimization Approach for Object Listing
We separated object handling by type. For tables we performed per‑database queries (which do not trigger a full‑database scan). For other objects we used a single query that excluded system schemas:
SELECT TRIGGER_SCHEMA, TRIGGER_NAME FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA NOT IN ('mysql','information_schema','sys','performance_schema');This dramatically reduced the time spent listing objects.
OOM Issue
Heap analysis revealed a large number of DbSqlData objects stored in ehcache, leading to out‑of‑memory errors. With an average of 4 KB per table, 860,000 tables required about 3.44 GB of heap, exceeding the 2 GB perm‑gen allocation.
if [ -z "$REPLICATOR_HEAP_SIZE" ]; then
REPLICATOR_HEAP_SIZE="-Xms3072m -Xmx3072m -XX:NewRatio=3 -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m"
fiThe solution was to store objects at a finer granularity and control the number of objects loaded into memory, enabling a streaming‑style replay.
Performance Bottlenecks and Re‑engineering
The migration workflow consisted of three serial stages: object list query (1 h), structure export (10 h), and replay (8 h), totaling 19 h. We introduced two key improvements:
Export structure concurrently while querying the object list.
Parallelize structure export using eight threads.
With these changes the combined time for listing and exporting becomes max(A, B/8) ≈ 2 h, and the total migration time drops to 2 h + 8 h = 10 h, meeting the client’s 12‑hour deadline.
Optimization Results
After extensive development and validation, the new framework reduced the total migration time for 860,000 tables to under 10 hours, eliminated full GC events, and satisfied the customer’s requirements.
Further Thoughts
Potential future improvements include parallelizing structure import with data replay and exploring whether structure export and replay can run concurrently.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Huawei Cloud Developer Alliance
The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
