Efficient Methods to Remove Duplicate Rows in MySQL Tables
This article explains how to identify and delete duplicate records in large MySQL tables, discusses why a naïve delete fails, and provides two robust SQL solutions—one that removes all duplicates and another that retains a single row per duplicated key—demonstrating fast execution even on tables with hundreds of thousands of rows.
The author discovered duplicate data in two large MySQL tables (over 960,000 and 300,000 rows) and initially tried a slow Python script that removed one row per second.
Realizing the inefficiency, the article shifts to a pure SQL approach for deduplication.
Goal: Remove rows where the name column is duplicated.
First, identify duplicated names:
SELECT name, COUNT(1)
FROM student
GROUP BY name
HAVING COUNT(1) > 1;The query returns, for example, cat 2 and dog 2 , indicating two rows each for those names.
Attempting a direct delete fails with MySQL error 1093 because the same table is used in the subquery:
1093 - You can't specify target table 'student' for update in FROM clause
The reason is that MySQL does not allow a table to be read and modified in the same statement, which can cause a dead‑lock.
**Solution 1 – Delete all duplicate rows (no rows left):** Use a derived table to isolate the names first.
DELETE FROM student
WHERE NAME IN (
SELECT t.NAME
FROM (SELECT NAME FROM student GROUP BY NAME HAVING COUNT(1) > 1) t
);**Solution 2 – Keep one row per duplicate:** First select the rows to keep (the smallest id per name ), then delete the others.
SELECT *
FROM student
WHERE id NOT IN (
SELECT t.id
FROM (
SELECT MIN(id) AS id FROM student GROUP BY name
) t
);Replace the SELECT with DELETE to perform the removal:
DELETE FROM student
WHERE id NOT IN (
SELECT t.id
FROM (
SELECT MIN(id) AS id FROM student GROUP BY name
) t
);This method deletes only the redundant rows and leaves one representative row for each duplicated name . The author notes that the operation runs very quickly even on a table with over 900,000 rows.
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.