Why ‘INSERT INTO SELECT’ Can Crash Your MySQL Migration and How to Fix It

A large-scale MySQL table migration using INSERT INTO SELECT triggered a hidden full-table scan and lock contention, which caused failed inserts and missing payment records. An earlier program-based approach had already run out of memory. Adding the proper index and understanding how transaction isolation affects locking fixed the problem.

Architect

Background and Problem Statement

The company processes millions of transactions a day in a single, unsharded MySQL table. Keeping that table fast required a periodic migration job that moves older rows into a history table and keeps only recent data in place.

Proposed Solutions

Programmatically fetch rows, insert them into a history table, then delete the originals.

Use INSERT INTO SELECT so the database performs the whole operation.

The first approach ran out of memory (OOM) because it loaded all rows at once; batching reduced memory pressure but increased I/O and runtime, so the team chose the second approach. It passed tests in a staging environment and was scheduled to run nightly at 20:00.

First Approach – Pseudo‑code and Failure Reason

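// 1. Query the rows to migrate (the whole result set is loaded into memory at once)
//    PaymentRecord is a hypothetical row type; selectData, insertData and deleteByIds are the job's own helpers.
List<PaymentRecord> rows = selectData();

// 2. Insert them into the history table
insertData(rows);

// 3. Delete the originals, using the primary keys of the rows just copied
List<Long> ids = rows.stream().map(PaymentRecord::getId).collect(Collectors.toList());
deleteByIds(ids);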

The OOM occurred because step 1 loaded the entire result set into memory before anything was inserted or deleted.
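The batching variant mentioned earlier can be sketched as follows. This is a toy, in-memory model, not the team's code: two TreeMaps stand in for the main and history tables, and rows are selected by a hypothetical numeric id cutoff rather than the real job's datetime condition. Each round copies at most batchSize rows and then deletes exactly those ids, so memory use is bounded by the batch size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of batched migration; the TreeMaps stand in for main_table and history_table.
// Filtering on a numeric id cutoff is a simplification (the real job filters on datetime).
public class BatchedMigration {

    // Moves rows with id < cutoffId from source to history, at most batchSize rows per round.
    // Returns the number of non-empty batches executed.
    static int migrate(TreeMap<Long, String> source, TreeMap<Long, String> history,
                       long cutoffId, int batchSize) {
        int batches = 0;
        while (true) {
            // 1. Fetch one page, roughly: SELECT ... WHERE id < cutoff ORDER BY id LIMIT batchSize.
            //    Keys and values are copied so the page does not depend on the map's internal nodes.
            List<Long> ids = new ArrayList<>();
            List<String> values = new ArrayList<>();
            for (Map.Entry<Long, String> e : source.headMap(cutoffId, false).entrySet()) {
                if (ids.size() == batchSize) break;
                ids.add(e.getKey());
                values.add(e.getValue());
            }
            if (ids.isEmpty()) return batches;
            // 2. Insert the page into the history table.
            for (int i = 0; i < ids.size(); i++) history.put(ids.get(i), values.get(i));
            // 3. Delete exactly the ids that were copied.
            for (Long id : ids) source.remove(id);
            batches++;
        }
    }

    public static void main(String[] args) {
        TreeMap<Long, String> mainTable = new TreeMap<>();
        for (long id = 1; id <= 25; id++) mainTable.put(id, "payment-" + id);
        TreeMap<Long, String> history = new TreeMap<>();
        int batches = migrate(mainTable, history, 21, 8);  // migrate ids 1..20
        System.out.println(batches + " batches, history=" + history.size() + ", main=" + mainTable.size());
    }
}
```

Running main prints `3 batches, history=20, main=5`: ids 1 to 20 move in batches of 8, 8 and 4. Because each batch is deleted right after it is copied, the next query always starts at the head of the remaining range, so no offset is needed; the cost, as noted above, is more round trips and a longer total runtime.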

Second Approach – What Actually Happened

The job kept only the last ten days of data (≈10k rows) in the main table and moved everything older with:

INSERT INTO history_table SELECT * FROM main_table WHERE datetime < NOW() - INTERVAL 10 DAY;

In the test environment the query finished quickly, but in production the nightly run caused intermittent insert failures after midnight, resulting in missing payment records.

Root‑Cause Investigation

Disabling the migration task stopped the failures, indicating the job was the trigger. An EXPLAIN of the statement showed a full‑table scan:

[Figure: EXPLAIN result for the original statement]

Scanning the whole of such a large table made the migration run for about an hour. That explains why the problem appeared only during the long nightly production run and not in the quick test run.

When the WHERE clause was rewritten to use an indexed column, the plan switched to an index range scan and the problem disappeared:

[Figure: EXPLAIN result after the fix, showing an index range scan]

Why Full‑Table Scan Causes Failures

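Under MySQL’s default isolation level (REPEATABLE READ), InnoDB does not read the source table of INSERT INTO … SELECT as a lock-free snapshot. It places shared next-key locks on every source row it scans and keeps them until the transaction commits. Rows written to the target table get exclusive record locks, and if the target has an AUTO_INCREMENT column and innodb_autoinc_lock_mode is 0 or 1, a table-level AUTO-INC lock is also held for the whole statement. Locks are taken for every row scanned, not only the rows that match the WHERE clause, so a full scan locks the entire payment-flow table and blocks other sessions’ inserts into it. Those inserts waited until they hit lock wait timeouts, which produced the intermittent failures. Under READ COMMITTED, by contrast, InnoDB reads the source table as a consistent read without taking these locks.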
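To make the difference concrete, here is a toy model, not InnoDB's actual algorithm, of which source rows each access path touches and therefore locks under REPEATABLE READ. The data set (30 days of rows, 1,000 per day) is invented for illustration, and the model ignores gap and next-key locks; the MySQL manual notes that a full-scan locking statement also blocks all inserts by other sessions into the table, which this sketch does not model.

```java
import java.util.Arrays;

// Toy model only: a locking scan locks every row it reads, matching or not.
// Gap/next-key locks and the extra record locked at the end of a range are ignored.
public class ScanLockModel {

    // Returns the day values of the rows the migration's scan would lock.
    // fullScan = true:  no usable index, so every row is read and locked.
    // fullScan = false: an index on the date column lets the scan stop at the cutoff,
    //                   so only rows older than the cutoff are read and locked.
    static long[] lockedRows(long[] rowDays, long cutoffDay, boolean fullScan) {
        return Arrays.stream(rowDays)
                .filter(day -> fullScan || day < cutoffDay)
                .toArray();
    }

    // Counts locked rows that are newer than the cutoff, i.e. recent rows the
    // live payment flow may still be working with while the migration runs.
    static long liveRowsLocked(long[] rowDays, long cutoffDay, boolean fullScan) {
        return Arrays.stream(lockedRows(rowDays, cutoffDay, fullScan))
                .filter(day -> day >= cutoffDay)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical data: 30 days, 1,000 rows per day; day 0 is the oldest, day 29 is today.
        long[] days = new long[30_000];
        for (int i = 0; i < days.length; i++) days[i] = i / 1_000;
        long cutoff = 29 - 10;  // migrate rows older than 10 days
        for (boolean full : new boolean[] {true, false}) {
            System.out.printf("%s: %d rows locked, %d of them live%n",
                    full ? "full scan " : "range scan",
                    lockedRows(days, cutoff, full).length,
                    liveRowsLocked(days, cutoff, full));
        }
    }
}
```

With this data, the full scan locks all 30,000 rows, including the 11,000 recent ones, while the range scan locks the 19,000 old rows and none of the recent ones.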

Why the Test Missed the Issue

The test used a realistic data set but did not simulate the concurrent high‑volume inserts that occur in production, nor the long‑running nature of the nightly job. Consequently, the lock contention and timeout behavior were not reproduced.

Solution and Best Practices

To keep using INSERT INTO SELECT safely:

Add an appropriate index on the columns used in the WHERE clause so the SELECT uses an index range scan instead of a full scan.

Verify the execution plan with EXPLAIN before deploying.

Consider breaking the migration into smaller batches if the table is still large.
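One way to apply the last point while keeping INSERT INTO SELECT is to split the statement by primary-key range. The sketch below only generates the SQL text. It assumes a hypothetical auto-increment primary key `id` on main_table (the article does not describe the schema), a cutoff timestamp computed once before the loop, and that each INSERT/DELETE pair would run in its own short transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split one large INSERT ... SELECT into bounded batches by primary-key range.
// Assumes (hypothetically) main_table has an auto-increment primary key `id`.
// Real code should bind the cutoff through a PreparedStatement rather than format it in.
public class MigrationBatches {

    static List<String> batchStatements(long minId, long maxId, long batchSize, String cutoff) {
        List<String> sql = new ArrayList<>();
        for (long lo = minId; lo <= maxId; lo += batchSize) {
            long hi = Math.min(lo + batchSize - 1, maxId);
            // Each batch touches at most batchSize ids, so its locks are few and short-lived.
            String where = String.format("id BETWEEN %d AND %d AND datetime < '%s'", lo, hi, cutoff);
            sql.add("INSERT INTO history_table SELECT * FROM main_table WHERE " + where + ";");
            sql.add("DELETE FROM main_table WHERE " + where + ";");
        }
        return sql;
    }

    public static void main(String[] args) {
        // Example cutoff; in practice it would be computed once as "now minus 10 days".
        for (String s : batchStatements(1, 25_000, 10_000, "2024-01-01 00:00:00")) {
            System.out.println(s);
        }
    }
}
```

Using one fixed cutoff instead of NOW() in every statement keeps the boundary identical across batches, so a row cannot be copied by one batch's INSERT and then skipped by its DELETE.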

Conclusion

‘INSERT INTO SELECT’ is powerful but must be used with proper indexing; otherwise, full‑table scans can lock the whole table, cause timeouts, and lead to data loss.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
