Databases 5 min read

Efficient Methods for Removing Duplicate Records in MySQL Tables

This article explains why a naïve Python‑based row‑deletion approach is slow for large MySQL tables and provides step‑by‑step SQL techniques—including identifying duplicate names, handling MySQL’s update‑from‑same‑table limitation, and deleting duplicates while preserving a single record per group—complete with executable code examples.

Laravel Tech Community
Laravel Tech Community
Laravel Tech Community
Efficient Methods for Removing Duplicate Records in MySQL Tables

The author encountered six tables with duplicate data, two of which contain over 960,000 and 300,000 rows respectively, and found a previously used Python script too slow (about one second per row, estimating eight hours for 20,000 duplicates).

After reflecting on the need for deeper problem‑solving, the article proceeds to detail the deduplication steps using pure SQL.

CREATE TABLE `animal` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('1','cat','12');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('2','dog','13');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('3','camel','25');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('4','cat','32');
INSERT INTO `pilipa_dds`.`student` (`id`,`name`,`age`) VALUES ('5','dog','42');

The goal is to remove rows with identical name values.

First, identify duplicate names:

SELECT name, COUNT(1)
FROM student
GROUP BY name
HAVING COUNT(1) > 1;

The result shows that cat and dog each appear twice.

Attempting a direct delete fails with MySQL error 1093 because the same table is used in the subquery:

DELETE FROM student
WHERE NAME IN (
  SELECT NAME
  FROM student
  GROUP BY NAME
  HAVING COUNT(1) > 1
);

MySQL does not allow updating a table while selecting from it. The workaround is to materialize the subquery as a derived table:

DELETE FROM student
WHERE NAME IN (
  SELECT t.NAME
  FROM (
    SELECT NAME
    FROM student
    GROUP BY NAME
    HAVING COUNT(1) > 1
  ) t
);

To keep one record per duplicate group, first select the minimum id for each name (the row to retain), then delete rows whose id is not in that set:

SELECT *
FROM student
WHERE id NOT IN (
  SELECT t.id
  FROM (
    SELECT MIN(id) AS id
    FROM student
    GROUP BY name
  ) t
);

Finally, replace the SELECT with DELETE to perform the actual removal:

DELETE FROM student
WHERE id NOT IN (
  SELECT t.id
  FROM (
    SELECT MIN(id) AS id
    FROM student
    GROUP BY name
  ) t
);

This approach runs very quickly even on tables with over 900,000 rows. The article concludes with a celebratory note.

SQLMySQLDatabase Optimizationdata cleaningDuplicate Removal
Laravel Tech Community
Written by

Laravel Tech Community

Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.