Why ‘INSERT INTO SELECT’ Can Cost You $10k: A MySQL Pitfall Case Study
A real‑world MySQL case study shows how using INSERT INTO SELECT for large data migrations can trigger full‑table scans, lock contention, and massive data loss, emphasizing the need for proper indexing and careful testing before deployment.
1. Origin of the Issue
The company processes massive transaction volumes in MySQL, with daily increments of about one million rows and no sharding, so data migration was needed to maintain table performance.
A colleague proposed two solutions:
Query data via application code, insert into a history table, then delete from the original table.
Use INSERT INTO SELECT to let the database handle the entire operation.
The first approach ran out of memory (OOM) when it loaded all the data at once, and processing it in batches was too slow. The second approach was therefore chosen, passed testing, and was deployed. It ended with the author being fired.
2. What Actually Happened?
First Solution Pseudocode
// 1. Query data to migrate
List<Object> list = selectData();
// 2. Insert data into history table
insertData(list);
// 3. Delete original rows
deleteByIds(ids); // ids: primary keys of the rows in list
The OOM was obvious: the code loaded the entire dataset into memory.
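For comparison, the first approach can bound its memory use by moving rows in fixed-size batches. The sketch below is a minimal illustration, not the team's code: it uses in-memory maps as hypothetical stand-ins for the original and history tables, and it assumes ids grow with time so that an id cutoff can stand in for the date cutoff. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: in-memory maps stand in for the original and history tables. */
class BatchMigrationSketch {

    /**
     * Moves every row with id < cutoffId from source to history,
     * holding at most batchSize row ids in memory at a time.
     * Returns the number of rows moved.
     */
    static int migrate(TreeMap<Long, String> source, Map<Long, String> history,
                       long cutoffId, int batchSize) {
        int moved = 0;
        while (true) {
            // 1. Query one batch of ids to migrate (lowest ids first).
            List<Long> ids = new ArrayList<>();
            for (Long id : source.headMap(cutoffId, false).keySet()) {
                ids.add(id);
                if (ids.size() == batchSize) {
                    break;
                }
            }
            if (ids.isEmpty()) {
                return moved; // nothing left below the cutoff
            }
            // 2. Insert the batch into the history table.
            for (Long id : ids) {
                history.put(id, source.get(id));
            }
            // 3. Delete the migrated rows from the original table.
            for (Long id : ids) {
                source.remove(id);
            }
            moved += ids.size();
        }
    }
}
```

Against a real database each loop iteration becomes a select, an insert, and a delete, so the number of round trips grows with the row count. That is consistent with the team finding batch processing too slow at this volume.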
Second Solution Details
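The team decided to keep only the last ten days of data in the original table (about 10 million rows, at roughly one million new rows per day) and to move everything older with a single statement:
INSERT INTO SELECT ... WHERE dateTime < (NOW() - INTERVAL 10 DAY)
This avoided pagination and the OOM, and it kept the code simple.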
The migration was scheduled as a nightly task at 8 PM. Although the night load was low, the next day finance discovered mismatched funds and many missing transaction records.
Investigation showed that inserts began failing intermittently after 8 PM, when the migration task started, and traced the failures to that task.
3. Post‑mortem Analysis
Root Cause
The EXPLAIN plan for the INSERT INTO SELECT statement showed a full-table scan.
A full‑table scan on a large table leads to long migration times and lock contention.
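Under the default transaction isolation level (REPEATABLE READ for InnoDB), INSERT INTO a SELECT ... FROM b locks table a directly and locks the rows of b one by one as the scan reaches them. Because the scan covered the whole table, a growing share of b was locked while the statement ran. Concurrent writes that touched locked rows had to wait and some hit the lock wait timeout, while others still succeeded. That is why the failures were intermittent.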
Why Tests Passed
The test environment used realistic data size but did not simulate the massive concurrent inserts that occur in production, so the issue was missed.
4. Solution
Avoid full‑table scans by adding appropriate indexes on the columns used in the WHERE clause, ensuring the SELECT part uses an index.
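A minimal sketch of that check over JDBC follows. It assumes a hypothetical table named orders and a hypothetical index name; only the dateTime column and the date predicate come from the statement above. It runs EXPLAIN on the SELECT part alone and relies on MySQL's traditional EXPLAIN output, where a value of ALL in the type column means a full-table scan.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch only: table and index names are hypothetical. */
class ExplainCheckSketch {

    // Index on the column used in the WHERE clause.
    static final String CREATE_INDEX =
            "CREATE INDEX idx_orders_datetime ON orders (dateTime)";

    // The SELECT part of the migration statement, checked on its own.
    static final String SELECT_PART =
            "SELECT * FROM orders WHERE dateTime < (NOW() - INTERVAL 10 DAY)";

    /** In traditional EXPLAIN output, access type ALL is a full-table scan. */
    static boolean isFullTableScan(String accessType) {
        return "ALL".equalsIgnoreCase(accessType);
    }

    /** Returns true if any row of the plan for selectSql is a full-table scan. */
    static boolean scansWholeTable(Connection conn, String selectSql) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("EXPLAIN " + selectSql)) {
            while (rs.next()) {
                if (isFullTableScan(rs.getString("type"))) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

A migration job could call scansWholeTable(conn, ExplainCheckSketch.SELECT_PART) before running and refuse to proceed when it returns true. The optimizer can still choose a full scan when the predicate matches a large share of the table, so the plan is worth checking against production-sized data and not only after creating the index.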
5. Can INSERT INTO SELECT Still Be Used?
Yes, but only with proper indexing and careful consideration of its impact.
6. Summary
When using INSERT INTO SELECT for large data migrations, always verify that the SELECT query uses indexes to prevent full‑table scans, lock contention, and data loss.
