How to Remove Duplicate Records in MySQL Tables
This article explains why duplicate rows appeared in production MySQL tables, demonstrates how to identify them with SELECT queries, and provides two SQL solutions—one to delete all duplicates and another to keep a single record per duplicated key—while preserving data integrity.
The overtime work was caused by a production deployment that revealed duplicate data in the online database and a program bug that was quickly fixed. Six tables contained duplicate rows, two of them very large (over 960,000 and 300,000 rows). A previously used Python deduplication script was reused, but its performance was extremely low (about one row per second), estimating eight hours to clean 20,000+ duplicates.
The article then details the deduplication steps.
CREATE TABLE `animal` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('1','cat','12');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('2','dog','13');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('3','camel','25');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('4','cat','32');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('5','dog','42');The goal is to remove rows with the same name value.
First, identify duplicated names:
SELECT name, COUNT(1)
FROM student
GROUP BY name
HAVING COUNT(1) > 1;The result shows that cat and dog each appear twice.
Delete All Duplicate Rows (Leave None)
Attempting a direct delete fails with MySQL error 1093 because the same table is used in the subquery.
DELETE FROM student
WHERE NAME IN (
SELECT NAME
FROM student
GROUP BY NAME
HAVING COUNT(1) > 1
);MySQL returns: 1093 - You can't specify target table 'student' for update in FROM clause . The reason is that MySQL does not allow updating a table while selecting from it in the same statement.
Solution: materialize the subquery as a derived table and then delete:
DELETE FROM student
WHERE NAME IN (
SELECT t.NAME
FROM (
SELECT NAME FROM student GROUP BY NAME HAVING COUNT(1) > 1
) t
);Delete Duplicates While Keeping One Row per Name
First, find the rows to keep (the smallest id for each name ), then delete everything else:
SELECT *
FROM student
WHERE id NOT IN (
SELECT t.id
FROM (
SELECT MIN(id) AS id FROM student GROUP BY `name`
) t
);Replace the SELECT with DELETE to perform the cleanup:
DELETE FROM student
WHERE id NOT IN (
SELECT t.id
FROM (
SELECT MIN(id) AS id FROM student GROUP BY `name`
) t
);Note: The article also includes a promotional line encouraging readers to follow a Java interview public account for additional resources.
Laravel Tech Community
Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.