Lessons Learned from Misusing INSERT INTO SELECT in MySQL Data Migration
A real‑world MySQL data‑migration case study shows how using INSERT INTO SELECT without proper indexing caused a full table scan, OOM errors, and data loss, and explains how to avoid the pitfall by adding appropriate indexes and understanding transaction locking.
1. Incident cause
The company processes millions of rows daily in a single MySQL table without sharding. To keep the table performant, a data‑migration task was created to move old records to a history table.
2. Proposed solutions
Query the data in application code, insert into the history table, then delete the original rows.
Use INSERT INTO SELECT to let MySQL handle the whole operation.
The first approach caused OOM when loading all data at once; batching reduced I/O but was slow, so the second approach was chosen and deployed.
3. What went wrong – detailed analysis
First solution (pseudocode)
// 1. Query the rows that need to be migrated
List&lt;Object&gt; list = selectData();
// 2. Insert the rows into the history table
insertData(list);
// 3. Delete the migrated rows from the original table
deleteByIds(ids);
The OOM was caused by step 1 loading the entire dataset into memory at once.
Second solution – the problematic INSERT INTO SELECT
The developer added a date filter so that only rows older than ten days were migrated, keeping roughly the last 10k rows in the live table, and executed:
INSERT INTO ... SELECT ... WHERE dateTime &lt; (ten days ago)
This avoided paging queries and seemed to work in testing, so the task was scheduled nightly at 20:00.
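Written out in full, the statement would have looked roughly like the following. The table and column names (orders, orders_history, dateTime) are assumptions for illustration; the original identifiers were not published.

```sql
-- Hypothetical schema: orders is the live table, orders_history the archive.
INSERT INTO orders_history
SELECT * FROM orders
WHERE dateTime < DATE_SUB(NOW(), INTERVAL 10 DAY);

DELETE FROM orders
WHERE dateTime < DATE_SUB(NOW(), INTERVAL 10 DAY);
```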
However, the next day the finance team found missing payment records. The failure started after the nightly job began, with insert operations intermittently failing.
Investigation showed the migration task caused the issue. Stopping the task eliminated the failures.
Examining the EXPLAIN of the INSERT INTO SELECT revealed a full table scan. The massive scan prolonged the migration, leading to lock contention and time‑outs on the payment table.
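Since MySQL 5.6, EXPLAIN also works on INSERT ... SELECT, which is how the scan shows up (table and column names hypothetical, as above):

```sql
EXPLAIN INSERT INTO orders_history
SELECT * FROM orders
WHERE dateTime < DATE_SUB(NOW(), INTERVAL 10 DAY);
-- type: ALL and key: NULL on the orders row of the plan indicate a full
-- table scan; with an index on dateTime, type becomes range and key
-- names the index.
```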
When the WHERE clause was adjusted to use an index, the full scan disappeared and the problem was resolved.
Root cause
Under MySQL's default REPEATABLE READ isolation level, INSERT INTO a SELECT ... FROM b takes ordinary insert locks on the target table a, but it also places shared next-key locks on every row of b that the SELECT scans. Because the statement scanned the entire payment table, effectively all of its rows and gaps were locked for the duration of the migration, so concurrent inserts into that table blocked and timed out.
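You can confirm the isolation level in effect with a one-line query (the variable is named transaction_isolation from MySQL 5.7.20 onward; older servers use tx_isolation):

```sql
SELECT @@transaction_isolation;
-- REPEATABLE-READ is the InnoDB default.
```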
Why testing missed the bug
The test environment used realistic data but did not simulate the high‑volume concurrent inserts that occur in production, so the lock‑contention issue was not reproduced.
4. Solution
Avoid full table scans by adding appropriate indexes on the columns used in the WHERE clause so that the SELECT part of the statement uses an index.
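In this case that means indexing the date column the filter uses, so the statement locks only the rows it actually migrates (column and index names are hypothetical, matching the sketch above):

```sql
-- Adjust names to the real schema.
ALTER TABLE orders ADD INDEX idx_datetime (dateTime);
```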
5. Can INSERT INTO SELECT still be used?
Yes, but only when proper indexes are in place and the impact on locking is understood.
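A common hardening step beyond what the article describes is to migrate in primary-key chunks, so that even an indexed statement holds its locks only briefly. A sketch, again with hypothetical names, where ? is the highest id migrated so far and the loop repeats until no rows are affected:

```sql
INSERT INTO orders_history
SELECT * FROM orders
WHERE dateTime < DATE_SUB(NOW(), INTERVAL 10 DAY)
  AND id > ?
ORDER BY id
LIMIT 1000;
```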
6. Conclusion
When using INSERT INTO SELECT for large-scale data migration, always verify with EXPLAIN that the SELECT is driven by an index. An unindexed scan locks far more of the source table than intended, and under production concurrency that lock contention can fail unrelated writes, as the lost payment records here showed.