Efficient Methods for Removing Duplicate Records in MySQL Tables
This article explains why a naïve Python‑based row‑deletion approach is slow for large MySQL tables and provides step‑by‑step SQL techniques—including identifying duplicate names, handling MySQL’s update‑from‑same‑table limitation, and deleting duplicates while preserving a single record per group—complete with executable code examples.
The author encountered six tables with duplicate data, two of which contain over 960,000 and 300,000 rows respectively, and found a previously used Python script too slow (about one second per row, estimating eight hours for 20,000 duplicates).
After reflecting on the need for deeper problem‑solving, the article proceeds to detail the deduplication steps using pure SQL.
CREATE TABLE `animal` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('1','cat','12');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('2','dog','13');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('3','camel','25');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('4','cat','32');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('5','dog','42');The goal is to remove rows with identical name values.
First, identify duplicate names:
SELECT name, COUNT(1)
FROM student
GROUP BY name
HAVING COUNT(1) > 1;The result shows that cat and dog each appear twice.
Attempting a direct delete fails with MySQL error 1093 because the same table is used in the subquery:
DELETE FROM student
WHERE NAME IN (
SELECT NAME
FROM student
GROUP BY NAME
HAVING COUNT(1) > 1
);MySQL does not allow updating a table while selecting from it. The workaround is to materialize the subquery as a derived table:
DELETE FROM student
WHERE NAME IN (
SELECT t.NAME
FROM (
SELECT NAME
FROM student
GROUP BY NAME
HAVING COUNT(1) > 1
) t
);To keep one record per duplicate group, first select the minimum id for each name (the row to retain), then delete rows whose id is not in that set:
SELECT *
FROM student
WHERE id NOT IN (
SELECT t.id
FROM (
SELECT MIN(id) AS id
FROM student
GROUP BY name
) t
);Finally, replace the SELECT with DELETE to perform the actual removal:
DELETE FROM student
WHERE id NOT IN (
SELECT t.id
FROM (
SELECT MIN(id) AS id
FROM student
GROUP BY name
) t
);This approach runs very quickly even on tables with over 900,000 rows. The article concludes with a celebratory note.
Laravel Tech Community
Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.