When INSERT INTO SELECT Breaks MySQL: A Cautionary Data‑Migration Story
An engineer’s costly mistake using MySQL’s INSERT INTO SELECT for nightly data migration led to out‑of‑memory crashes, full‑table scans, and payment record loss, prompting a deep dive into locking behavior, transaction isolation, and how proper indexing can safely rescue large‑scale inserts.
Background: A company processes millions of rows daily in a single MySQL payment transaction table and needs to migrate old data to keep the table performant.
Two migration options were proposed:
1. Programmatically fetch data, insert it into a history table, then delete the original rows.
2. Use a single INSERT INTO SELECT statement to let MySQL handle the whole operation.
The first approach caused an out‑of‑memory (OOM) error when all rows were loaded at once; processing in batches reduced I/O but was still heavy, so the team chose the second approach. It passed internal tests and was deployed, but it later caused data loss, and the engineer was dismissed.
What Went Wrong?
Pseudo‑code for the first approach:
// 1. Query data to migrate
List<Object> list = selectData();
// 2. Insert into history table
insertData(list);
// 3. Delete original rows
deleteByIds(ids); // ids: the primary keys of the rows in list
The OOM stemmed from loading the entire dataset into memory.
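The batched variant of this approach keeps only a bounded number of rows in memory at a time. A minimal sketch, assuming two in-memory maps as stand-ins for the payment table and the history table; the class name, method name, and parameters are hypothetical and not from the original job, and a real implementation would issue the query, insert, and delete against MySQL inside a transaction per batch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: the maps below simulate tables keyed by primary key.
class BatchMigrationSketch {

    /**
     * Moves every row with id <= maxId from source to history,
     * at most batchSize rows per pass. Returns the number of batches run.
     */
    static int migrate(TreeMap<Long, String> source, Map<Long, String> history,
                       long maxId, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        while (true) {
            // 1. Query at most batchSize ids to migrate (lowest ids first)
            List<Long> ids = new ArrayList<>();
            for (Long id : source.headMap(maxId, true).keySet()) {
                if (ids.size() == batchSize) {
                    break;
                }
                ids.add(id);
            }
            if (ids.isEmpty()) {
                return batches; // nothing left to migrate
            }
            // 2. Insert the batch into the history table
            for (Long id : ids) {
                history.put(id, source.get(id));
            }
            // 3. Delete the migrated rows from the source table
            for (Long id : ids) {
                source.remove(id);
            }
            batches++;
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, String> source = new TreeMap<>();
        for (long id = 1; id <= 10; id++) {
            source.put(id, "payment-" + id);
        }
        Map<Long, String> history = new HashMap<>();
        int batches = migrate(source, history, 7L, 3);
        System.out.println("batches=" + batches
                + " migrated=" + history.size()
                + " remaining=" + source.keySet());
    }
}
```

Memory use here is bounded by batchSize rather than by the table size, which is the property the all-at-once version lacked. The cost is one round trip per batch, which matches the I/O concern described above.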
The second approach used a nightly INSERT INTO SELECT with a time filter (e.g., dateTime < NOW() - INTERVAL 10 DAY) to move roughly 10k rows at 8 pm.
Initially the job seemed fine, but after deployment, inserts of new payment records began failing at night, leaving financial statements mismatched and payment data missing.
Investigation
The EXPLAIN output of the INSERT INTO SELECT showed a full table scan.
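A full scan on a large table makes the operation slow and keeps its locks held for the whole run. Under MySQL’s default isolation level (REPEATABLE READ), the statement locks the target table (a), while rows from the source table (b) are locked one by one as the scan reaches them. Because the scan covers the whole table, those row locks gradually spread across all of the source table, so concurrent writes of new payment records hit intermittent lock‑wait timeouts and failed during the nightly batch.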
Testing did not reveal the issue because the test environment used a much smaller data set and did not reproduce the heavy‑load scenario.
Solution
To avoid the full table scan, add appropriate indexes on the columns used in the WHERE clause so the SELECT part can use an index. After adjusting the query to use an indexed condition, the job completed without scanning the whole table and the failures disappeared. INSERT INTO SELECT remains usable, but only when the SELECT is indexed and the data volume is controlled.
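One way to keep both conditions (an indexed SELECT and a controlled data volume) is to split the job into statements that each cover a bounded primary-key range. This is a swapped-in technique for illustration; the article itself only states that the WHERE column was indexed. A minimal sketch, assuming hypothetical table names payment and payment_history with identical columns, an auto-increment primary key id, and the dateTime filter from the example above. The code only builds SQL strings and does not connect to a database:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: table, column, and index names are assumptions, not from the original system.
class ChunkedInsertSelectSketch {

    // Index on the filter column so the SELECT part does not need a full table scan.
    static final String CREATE_INDEX =
            "CREATE INDEX idx_payment_datetime ON payment (dateTime)";

    /**
     * Builds one INSERT INTO SELECT per id range of at most chunkSize ids,
     * covering minId..maxId inclusive.
     */
    static List<String> chunkStatements(long minId, long maxId, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> statements = new ArrayList<>();
        for (long lo = minId; lo <= maxId; lo += chunkSize) {
            long hi = Math.min(lo + chunkSize - 1, maxId);
            statements.add(
                    "INSERT INTO payment_history SELECT * FROM payment"
                    + " WHERE id BETWEEN " + lo + " AND " + hi
                    + " AND dateTime < NOW() - INTERVAL 10 DAY");
        }
        return statements;
    }

    public static void main(String[] args) {
        System.out.println(CREATE_INDEX + ";");
        for (String sql : chunkStatements(1, 25, 10)) {
            System.out.println(sql + ";");
        }
    }
}
```

Each generated statement reads a limited slice of the source table, so the locks it takes are released sooner than with one large statement. It is still worth running EXPLAIN on a generated statement in the real schema to confirm that it uses an index rather than a full scan.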
Takeaway
When using INSERT INTO SELECT for large migrations, ensure the SELECT can use an index on its WHERE columns, and understand the locking behavior under the current transaction isolation level to prevent data loss.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
