Databases 5 min read

How to Remove Duplicate Records in MySQL Tables

This article explains why duplicate rows appeared in production MySQL tables, demonstrates how to identify them with SELECT queries, and provides two SQL solutions—one to delete all duplicates and another to keep a single record per duplicated key—while preserving data integrity.

Laravel Tech Community

Sep 10, 2021

How to Remove Duplicate Records in MySQL Tables

The overtime work was caused by a production deployment that revealed duplicate data in the online database and a program bug that was quickly fixed. Six tables contained duplicate rows, two of them very large (over 960,000 and 300,000 rows). A previously used Python deduplication script was reused, but its performance was extremely low (about one row per second), estimating eight hours to clean 20,000+ duplicates.

The article then details the deduplication steps.

CREATE TABLE `animal` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('1','cat','12');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('2','dog','13');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('3','camel','25');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('4','cat','32');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('5','dog','42');

The goal is to remove rows with the same name value.

First, identify duplicated names:

SELECT name, COUNT(1)
FROM student
GROUP BY name
HAVING COUNT(1) > 1;

The result shows that cat and dog each appear twice.

Delete All Duplicate Rows (Leave None)

Attempting a direct delete fails with MySQL error 1093 because the same table is used in the subquery.

DELETE FROM student
WHERE NAME IN (
  SELECT NAME
  FROM student
  GROUP BY NAME
  HAVING COUNT(1) > 1
);

MySQL returns:

1093 - You can't specify target table 'student' for update in FROM clause

. The reason is that MySQL does not allow updating a table while selecting from it in the same statement.

Solution: materialize the subquery as a derived table and then delete:

DELETE FROM student
WHERE NAME IN (
  SELECT t.NAME
  FROM (
    SELECT NAME FROM student GROUP BY NAME HAVING COUNT(1) > 1
  ) t
);

Delete Duplicates While Keeping One Row per Name

First, find the rows to keep (the smallest id for each name), then delete everything else:

SELECT *
FROM student
WHERE id NOT IN (
  SELECT t.id
  FROM (
    SELECT MIN(id) AS id FROM student GROUP BY `name`
  ) t
);

Replace the SELECT with DELETE to perform the cleanup:

DELETE FROM student
WHERE id NOT IN (
  SELECT t.id
  FROM (
    SELECT MIN(id) AS id FROM student GROUP BY `name`
  ) t
);

Note: The article also includes a promotional line encouraging readers to follow a Java interview public account for additional resources.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

sql mysql data deduplication database cleanup duplicate removal

Written by

Laravel Tech Community

Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.