Why ‘INSERT INTO SELECT’ Can Cost Your Company $10k – A MySQL Post‑mortem
A real‑world MySQL post‑mortem shows how using INSERT INTO SELECT for large‑scale data migration caused full‑table scans, transaction‑level locking, data loss and a costly dismissal, and explains how proper indexing and query design can prevent such disasters.
1. The Origin of the Issue
The company processes millions of rows daily in a single MySQL table without sharding, so they needed a data‑migration strategy to keep performance acceptable.
A colleague proposed two solutions:
1. Query the data in application code, insert it into a history table, then delete the original rows.
2. Use INSERT INTO SELECT and let the database handle the whole operation.
The first approach caused an OOM when all the data was loaded at once, and batching avoided the OOM but was too slow. They therefore chose the second approach, tested it successfully, and deployed it; the incident that followed ultimately got them fired.
2. What Actually Happened – A Review
First solution – Pseudocode
// 1. Query the data to be migrated
List<Object> list = selectData();
// 2. Insert into the history table
insertData(list);
// 3. Delete the original rows (ids collected from the queried rows)
deleteByIds(ids);
The OOM was caused by loading the entire dataset into memory at once.
Second solution – What Went Wrong
To retain only the most recent ten days of data (about 10k rows), they ran a time-based INSERT INTO SELECT ... WHERE dateTime < (ten days ago), which avoided both pagination and the OOM during testing. The job ran nightly at 8 pm.
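The post does not show the exact statement, but the nightly job presumably looked something like the sketch below (the table and column names `pay_order`, `pay_order_history`, and `create_time` are illustrative assumptions, not from the original):

```sql
-- Hypothetical reconstruction of the nightly migration job
INSERT INTO pay_order_history
SELECT * FROM pay_order
WHERE create_time < DATE_SUB(NOW(), INTERVAL 10 DAY);

-- followed by a matching delete on the source table
DELETE FROM pay_order
WHERE create_time < DATE_SUB(NOW(), INTERVAL 10 DAY);
```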
During the next day's reconciliation, many payment records were missing because inserts started failing after the job began. The root cause was traced to the migration task itself.
3. Post‑mortem
Where was the problem?
An EXPLAIN of the statement revealed a full-table scan. On a large table, a full scan makes the migration run for a long time, and under MySQL's default REPEATABLE READ isolation level, INSERT INTO SELECT locks every row it scans in the source table, one by one, effectively locking the whole table for the duration. Concurrent payment inserts into that table therefore hit intermittent lock-wait timeouts and failed.
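The diagnosis can be reproduced by running EXPLAIN on the SELECT part of the statement; the table and column names below are illustrative assumptions:

```sql
-- Illustrative: inspect the plan of the SELECT part of the migration
EXPLAIN SELECT * FROM pay_order
WHERE create_time < DATE_SUB(NOW(), INTERVAL 10 DAY);
-- With no index on create_time, EXPLAIN reports type: ALL (a full-table
-- scan). Under REPEATABLE READ, INSERT ... SELECT takes shared locks on
-- every row it scans, so concurrent inserts into pay_order block and
-- eventually fail with lock-wait timeouts.
```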
When the condition was changed to use an index, the full scan disappeared and the issue was resolved.
Conclusion: The full‑table scan caused the incident.
Why did the test not reveal the problem?
The test environment used real data but did not simulate the massive concurrent inserts that occur in production, so the full‑scan latency and locking behavior were not observed.
4. Solution
Avoid full‑table scans by adding appropriate indexes on the columns used in the WHERE clause so that the SELECT part always uses an index.
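For instance, assuming the filter column is a timestamp named `create_time` (an illustrative name), the fix is a single secondary index:

```sql
-- Illustrative: index the column used in the WHERE clause
ALTER TABLE pay_order ADD INDEX idx_create_time (create_time);
-- EXPLAIN on the SELECT part should then report type: range and
-- key: idx_create_time instead of a full-table scan.
```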
5. Can INSERT INTO SELECT Still Be Used?
Yes, but only with proper indexing and awareness of its locking behavior.
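Beyond indexing, a common additional safeguard (not from the original post) is to move the data in small batches so that each statement holds its locks only briefly; a sketch with the same assumed table and column names:

```sql
-- Illustrative batched variant: repeat each pair until no rows remain.
-- Ordering by the primary key keeps the copied and deleted sets aligned.
INSERT INTO pay_order_history
SELECT * FROM pay_order
WHERE create_time < DATE_SUB(NOW(), INTERVAL 10 DAY)
ORDER BY id LIMIT 10000;

DELETE FROM pay_order
WHERE create_time < DATE_SUB(NOW(), INTERVAL 10 DAY)
ORDER BY id LIMIT 10000;
```

Running each insert/delete pair in its own short transaction, ideally off-peak, keeps the lock footprint small even if an index is missing or ineffective.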
6. Summary
When using INSERT INTO SELECT, ensure indexes exist on the filtering columns to prevent full‑table scans and the associated locking issues that can lead to data loss.