Beware of INSERT INTO SELECT: A MySQL Data Migration Incident and Lessons Learned
This article recounts a costly MySQL data‑migration failure caused by an unchecked INSERT INTO SELECT operation, analyzes why full‑table scans and transaction isolation led to payment‑record loss, and offers practical safeguards such as proper indexing and careful use of the statement.
The author shares a real‑world MySQL incident where using INSERT INTO SELECT for data migration caused a company to lose nearly 100,000 CNY and resulted in the developer’s dismissal.
1. Origin of the Issue
The company processes millions of rows daily in a single MySQL table without sharding, so old data had to be migrated out regularly to keep performance acceptable. A colleague proposed two approaches: load the data in the application and insert it into a history table, or let the database handle everything with INSERT INTO SELECT.
Programmatic load → history table → delete original rows (caused OOM when loading all at once).
Direct INSERT INTO SELECT (chosen after testing).
2. What Actually Happened?
First solution – pseudo code
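// 1. Query the rows to migrate (the whole result set is held in memory)
List<Object> list = selectData();
// 2. Insert them into the history table
insertData(list);
// 3. Delete the original rows (ids: the primary keys of the rows in list)
deleteByIds(ids);
This approach loaded all rows into memory at once, which led to an OOM crash.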
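The OOM comes from holding the whole result set at once, so one way to keep the programmatic approach is to move the rows in bounded batches. The following is a minimal sketch of that idea, not the code used in the incident: the class BatchedMigration, its in-memory lists standing in for the source and history tables, and the assumption that a row can be represented by an id that grows with time are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for the source and history tables; each row is just its id. */
class BatchedMigration {
    final List<Long> source = new ArrayList<>();
    final List<Long> history = new ArrayList<>();
    int largestBatch = 0; // most rows ever held in memory at once

    /**
     * Moves every row whose id is below cutoffId into history,
     * at most batchSize rows per round. Returns the number of rows moved.
     */
    int migrate(long cutoffId, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int moved = 0;
        while (true) {
            // 1. Query one bounded batch (stands in for SELECT ... LIMIT batchSize)
            List<Long> batch = new ArrayList<>();
            for (Long id : source) {
                if (id < cutoffId) {
                    batch.add(id);
                    if (batch.size() == batchSize) {
                        break;
                    }
                }
            }
            if (batch.isEmpty()) {
                return moved; // nothing left to migrate
            }
            largestBatch = Math.max(largestBatch, batch.size());
            // 2. Insert the batch into the history table
            history.addAll(batch);
            // 3. Delete the same rows from the source table
            source.removeAll(batch);
            moved += batch.size();
        }
    }

    public static void main(String[] args) {
        BatchedMigration m = new BatchedMigration();
        for (long id = 1; id <= 25; id++) {
            m.source.add(id);
        }
        int moved = m.migrate(21, 10);
        System.out.println("moved=" + moved
                + " remaining=" + m.source.size()
                + " largestBatch=" + m.largestBatch);
    }
}
```

Memory use is bounded by batchSize rather than by the number of rows to migrate. In a real job each round would presumably run in its own short transaction, which also keeps any locks short-lived.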
Second solution – using INSERT INTO SELECT
The migration was meant to keep only the last ten days of data (about 10 k rows) in the source table, so a task was scheduled to run every night at 20:00 and execute: INSERT INTO target SELECT * FROM source WHERE dateTime < (NOW() - INTERVAL 10 DAY). The statement passed in the test environment, which held only about 10 k rows, but the production table held millions of rows.
After the nightly job ran, payment‑flow records started failing to insert, causing data loss and mismatched financial statements.
3. Post‑mortem
Root Cause
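The INSERT INTO SELECT statement had no usable index for its WHERE clause, so it performed a full‑table scan of the source table. Under the default transaction isolation level (REPEATABLE READ in InnoDB), the statement takes shared locks on the source rows it reads, and it takes them progressively, row by row, as the scan advances. As more of the table became locked, concurrent inserts of payment‑flow records into the source table had to wait behind the migration, and some of them timed out and failed, which explains the intermittent failures.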
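The effect on concurrent writers can be illustrated with a toy model in plain Java. This is only an analogy, not MySQL's locking implementation: a ReentrantReadWriteLock stands in for the locks held by the long-running scan, and a writer thread that gives up after waitMillis stands in for a payment insert hitting a lock wait timeout. The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SharedLockDemo {
    /**
     * Returns true if a writer on another thread obtains the write lock
     * within waitMillis. When scanInProgress is true, the calling thread
     * holds the shared (read) lock for the whole attempt, like a scan
     * that has not finished yet.
     */
    static boolean writerGetsLock(boolean scanInProgress, long waitMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        AtomicBoolean acquired = new AtomicBoolean(false);
        if (scanInProgress) {
            lock.readLock().lock(); // the "migration" holds its shared lock
        }
        try {
            Thread writer = new Thread(() -> {
                try {
                    // The "payment insert" waits a bounded time, then gives up
                    if (lock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                        acquired.set(true);
                        lock.writeLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writer.start();
            writer.join(); // the scan is still running while the writer waits
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writer", e);
        } finally {
            if (scanInProgress) {
                lock.readLock().unlock();
            }
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("writer during scan: " + writerGetsLock(true, 100));
        System.out.println("writer without scan: " + writerGetsLock(false, 100));
    }
}
```

The longer the scan holds its locks, the more writers exceed their wait budget. That matches the pattern in the incident, where the job passed on a small test table and failed on a table with millions of rows.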
Testing missed the issue because the test data volume was far smaller than production.
4. Solution
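Avoid full‑table scans by adding an index on the columns used in the WHERE clause (here, dateTime), so the query reads and locks only the matching range instead of the whole table. Before relying on it, check with EXPLAIN that the SELECT part of the statement actually uses the index.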
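Why an index reduces the number of rows touched can be sketched with a toy cost model. This is a simplification under stated assumptions, not a description of MySQL's executor: a List stands in for an unindexed table, a TreeSet stands in for an index on dateTime, dateTime values are assumed unique, and the names ScanCostModel, fullScan and indexRangeScan are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class ScanCostModel {
    /** Without an index, every row is examined to test the filter. */
    static int fullScan(List<Long> rows, long cutoff, List<Long> matches) {
        int examined = 0;
        for (long dateTime : rows) {
            examined++;
            if (dateTime < cutoff) {
                matches.add(dateTime);
            }
        }
        return examined;
    }

    /** With an ordered index, only the range below the cutoff is visited. */
    static int indexRangeScan(TreeSet<Long> index, long cutoff, List<Long> matches) {
        int examined = 0;
        for (long dateTime : index.headSet(cutoff)) { // strictly less than cutoff
            examined++;
            matches.add(dateTime);
        }
        return examined;
    }

    public static void main(String[] args) {
        List<Long> rows = new ArrayList<>();
        for (long t = 1; t <= 1000; t++) {
            rows.add(t);
        }
        TreeSet<Long> index = new TreeSet<>(rows);

        List<Long> viaScan = new ArrayList<>();
        List<Long> viaIndex = new ArrayList<>();
        System.out.println("full scan examined: " + fullScan(rows, 101, viaScan));
        System.out.println("index range examined: " + indexRangeScan(index, 101, viaIndex));
    }
}
```

Both methods return the same matches, but the range scan touches only the rows that satisfy the filter. The benefit shrinks when the filter matches most of the table, in which case the optimizer may still prefer a full scan, so the index is worth confirming against production-sized data.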
5. Can INSERT INTO SELECT Still Be Used?
Yes, but only with proper indexing and careful consideration of transaction locking.
6. Summary
When using INSERT INTO SELECT for large‑scale data migration, ensure indexes exist on filter columns, understand the locking behavior under the current isolation level, and test with realistic data volumes to prevent costly production incidents.
Source: juejin.cn