How to Efficiently Remove Duplicate Rows in MySQL Tables
This guide explains step‑by‑step how to identify and delete duplicate records in MySQL tables, covering simple SELECT checks, handling MySQL’s update‑from limitation, and fast deletion techniques that keep one record per duplicate group.
When a database contains tables with duplicate rows, a naïve Python script that loops over each duplicate and deletes it one by one can be extremely slow (about one second per row, leading to hours for tens of thousands of duplicates). This article presents more efficient MySQL‑based methods.
Identify duplicate rows
First, find which name values appear more than once:
SELECT name, COUNT(1)
FROM student
GROUP BY name
HAVING COUNT(1) > 1;
The result shows, for example, cat 2 and dog 2, meaning each of those names appears twice.
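The duplicate check can be tried without a MySQL server. The sketch below is a stand-in, not the article's setup: it assumes Python's built-in sqlite3 module with an in-memory database, a two-column student table (id, name), and made-up sample rows chosen to match the cat/dog example. The GROUP BY ... HAVING query is the one shown above, with an ORDER BY added only to make the output order predictable.

```python
import sqlite3


def find_duplicate_names(conn):
    """Return (name, count) pairs for names that occur more than once."""
    cur = conn.execute(
        "SELECT name, COUNT(1) FROM student "
        "GROUP BY name HAVING COUNT(1) > 1 ORDER BY name"
    )
    return cur.fetchall()


# Assumed schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student (name) VALUES (?)",
    [("cat",), ("dog",), ("cat",), ("bird",), ("dog",)],
)

duplicates = find_duplicate_names(conn)
print(duplicates)  # [('cat', 2), ('dog', 2)]
```

Names that occur only once, such as bird here, are filtered out by the HAVING clause.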
Attempted direct delete and MySQL limitation
A direct DELETE that reads the same table in a subquery fails with error 1093, because MySQL does not allow a statement to modify a table and select from that same table in a subquery:
DELETE FROM student
WHERE name IN (
SELECT name FROM student GROUP BY name HAVING COUNT(1) > 1
);
MySQL reports: "You can't specify target table 'student' for update in FROM clause".
Workaround using a derived table
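Wrap the subquery in an extra SELECT so that MySQL evaluates it as a derived table (here aliased t) before the DELETE runs, then use that derived table in the DELETE. Note that this statement deletes every row whose name is duplicated, including all copies: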
DELETE FROM student
WHERE name IN (
SELECT t.name FROM (
SELECT name FROM student GROUP BY name HAVING COUNT(1) > 1
) t
);
Delete duplicates while keeping one row per group
To retain a single record for each duplicate name, first identify the smallest id per group (the row to keep), then delete rows whose id is not among those:
DELETE FROM student
WHERE id NOT IN (
SELECT t.id FROM (
SELECT MIN(id) AS id FROM student GROUP BY name
) t
);
This statement removes the extra rows in each duplicate group and keeps the row with the smallest id. It reportedly runs quickly even on tables with more than 900,000 rows.
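To check which rows survive, the same DELETE can be run against a small test table. This is a minimal sketch under assumptions: it uses Python's built-in sqlite3 module rather than MySQL, and the helper name delete_duplicates_keep_min_id, the schema, and the sample rows are all hypothetical. SQLite does not have MySQL's error 1093 restriction, so the derived table t is kept only to mirror the statement above.

```python
import sqlite3


def delete_duplicates_keep_min_id(conn):
    """Delete every row whose id is not the smallest id for its name.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        "DELETE FROM student WHERE id NOT IN ("
        " SELECT t.id FROM ("
        "  SELECT MIN(id) AS id FROM student GROUP BY name"
        " ) t"
        ")"
    )
    conn.commit()
    return cur.rowcount


# Assumed schema and sample data: ids 1..5 are assigned in insert order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student (name) VALUES (?)",
    [("cat",), ("dog",), ("cat",), ("bird",), ("dog",)],
)

deleted = delete_duplicates_keep_min_id(conn)
remaining = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(deleted)    # 2
print(remaining)  # [(1, 'cat'), (2, 'dog'), (4, 'bird')]
```

The second cat (id 3) and the second dog (id 5) are removed, while bird, which was never duplicated, is untouched because its only id is also its minimum id.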
Summary of steps
Use GROUP BY … HAVING COUNT(1) > 1 to locate duplicate values.
When deleting, avoid MySQL’s update‑from restriction by nesting the subquery in a derived table.
To keep one instance per duplicate group, delete rows whose id is not the minimum id for that group.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
