Case Study: MySQL INSERT INTO SELECT Caused Data Loss Due to Full Table Scan
A MySQL data‑migration task using INSERT INTO SELECT triggered a full‑table scan, leading to lock contention, payment‑record loss, and ultimately the developer's dismissal, highlighting the dangers of unindexed bulk operations and the need for proper indexing and testing.
Background
A colleague needed to migrate a large, unsharded MySQL table that grew by roughly one million rows per day. Two approaches were considered: (1) fetch the data in application code, insert it into a history table, then delete the source rows; or (2) let the database do the whole operation with INSERT INTO SELECT.
What Happened
The first approach loaded all data into memory, causing an OOM error. The second approach seemed to work in testing, so it was deployed as a nightly scheduled job at 20:00.
After deployment, the finance team noticed mismatched payment records; many rows failed to be inserted during the night, resulting in data loss.
Investigation revealed that the nightly migration task was the culprit. When the task was stopped, the issue disappeared.
Review
Problem Analysis
Examining the first approach's pseudo‑code:
// 1. Query the rows to be migrated
List<Data> list = selectData();
// 2. Insert them into the history table
insertData(list);
// 3. Delete the original rows by id
deleteByIds(ids);
The OOM occurred because all rows were loaded into memory at once.
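The OOM in the first approach is avoidable by batching the fetch/insert/delete loop so that only a bounded chunk of rows is ever in memory. Below is a minimal sketch of that idea; it uses SQLite and hypothetical `orders`/`orders_history` tables purely for illustration (the real code was Java against MySQL):

```python
import sqlite3

# Hypothetical schema; the production tables and columns differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE orders_history (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany("INSERT INTO orders (payload) VALUES (?)",
                 [(f"row-{i}",) for i in range(10_000)])
conn.commit()

BATCH_SIZE = 1_000

def migrate_batch(conn):
    """Move at most BATCH_SIZE rows to the history table; return count moved."""
    rows = conn.execute(
        "SELECT id, payload FROM orders ORDER BY id LIMIT ?", (BATCH_SIZE,)
    ).fetchall()
    if not rows:
        return 0
    with conn:  # one short transaction per batch
        conn.executemany(
            "INSERT INTO orders_history (id, payload) VALUES (?, ?)", rows)
        conn.executemany(
            "DELETE FROM orders WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)

moved = 0
while (n := migrate_batch(conn)):
    moved += n
```

Each batch commits independently, so a failure mid-migration loses at most one chunk of progress, and memory use stays bounded by BATCH_SIZE.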
For the second approach, the developer filtered rows older than ten days and executed:
INSERT INTO target SELECT * FROM source WHERE dateTime < (NOW() - INTERVAL 10 DAY);
This avoided paging and seemed to work in the test environment, which used only a subset of real data.
However, at roughly one million rows per day, the production table held on the order of ten million rows in the ten-day window, and the nightly job performed a full table scan over all of them, as EXPLAIN confirmed.
Root Cause
Under MySQL's default isolation level (REPEATABLE READ), INSERT INTO target SELECT ... FROM source locks the source rows one by one as it reads them, in addition to writing the target. Because the SELECT ran as a full-table scan, it progressively locked the entire source table for the duration of the statement; concurrent payment inserts into that table blocked, timed out, and were lost.
Why Testing Missed the Issue
The test environment held only a fraction of the production data and carried none of production's concurrent insert load, so neither the slow full-table scan nor the resulting lock contention ever surfaced.
Solution
Avoid the full-table scan by adding an index on the column used in the WHERE clause (here, dateTime), so the SELECT performs a range scan over the index and locks only the rows it actually reads instead of the whole table.
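The before/after effect of such an index can be seen directly in the query plan. The snippet below illustrates it with SQLite's EXPLAIN QUERY PLAN (MySQL's EXPLAIN output looks different but shows the analogous shift from type: ALL to type: range); the table and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, dateTime TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO orders (dateTime, payload) VALUES (datetime('now'), ?)",
    [(f"row-{i}",) for i in range(1000)])

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE dateTime < datetime('now', '-10 days')"

before = plan(query)  # reports a full scan of orders
conn.execute("CREATE INDEX idx_orders_datetime ON orders(dateTime)")
after = plan(query)   # now reports a search using idx_orders_datetime
```

Running the same check against the real MySQL table before deploying the job would have exposed the full scan immediately.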
Can INSERT INTO SELECT Still Be Used?
Yes, but only with proper indexing and awareness of its locking behavior.
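One way to keep the locking behavior tolerable even with an index is to run the copy in small keyset-paginated batches, so each INSERT INTO ... SELECT statement (and therefore each set of locks) is short-lived. A minimal sketch with SQLite and hypothetical `source`/`target` tables; MySQL accepts the same INSERT ... SELECT ... LIMIT shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany("INSERT INTO source (payload) VALUES (?)",
                 [(f"row-{i}",) for i in range(5000)])
conn.commit()

BATCH = 500
copied = 0
while True:
    with conn:  # each batch commits on its own, releasing locks quickly
        cur = conn.execute("""
            INSERT INTO target (id, payload)
            SELECT id, payload FROM source
            WHERE id > (SELECT COALESCE(MAX(id), 0) FROM target)
            ORDER BY id LIMIT ?""", (BATCH,))
    if cur.rowcount == 0:
        break
    copied += cur.rowcount
```

The subquery on MAX(id) resumes each batch where the previous one stopped, so the loop makes steady progress without a long-running statement holding locks across the whole table.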
Conclusion
When using INSERT INTO SELECT for large data migrations, always verify that the query uses indexes to prevent full‑table scans, and test with production‑scale data and concurrency to avoid hidden lock‑contention bugs.
Architecture Digest
Focused on Java backend development, covering application architecture at top-tier internet companies (high availability, high performance, high stability), big data, machine learning, and other popular fields.