How to Quickly Remove Duplicate Rows in MySQL Without Locking

This tutorial shows how to identify duplicate records in MySQL, explains why naive row-by-row deletion is slow, and provides efficient DELETE statements, including a derived-table trick, that clean up large tables while keeping one record per duplicate group.

Java Interview Crash Guide

The author worked overtime during a release to fix duplicate data in production MySQL tables.

Six tables contained duplicate rows, and the two largest held more than 960,000 and 300,000 rows. A simple Python script that deleted duplicates one at a time managed about one row per second, which put the estimated run time at roughly eight hours.

Instead of relying on external scripts, the article shows how to use pure MySQL statements to locate and delete duplicates efficiently.

-- Sample table: the names cat and dog each appear twice.
CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

INSERT INTO `student` (`id`,`name`,`age`) VALUES (1,'cat',12);
INSERT INTO `student` (`id`,`name`,`age`) VALUES (2,'dog',13);
INSERT INTO `student` (`id`,`name`,`age`) VALUES (3,'camel',25);
INSERT INTO `student` (`id`,`name`,`age`) VALUES (4,'cat',32);
INSERT INTO `student` (`id`,`name`,`age`) VALUES (5,'dog',42);

Goal: Remove rows that share the same name value.

First, identify duplicate names:

SELECT name, COUNT(1)
FROM student
GROUP BY name
HAVING COUNT(1) > 1;
+------+----------+
| name | COUNT(1) |
+------+----------+
| cat  |        2 |
| dog  |        2 |
+------+----------+
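
Before deleting anything, it can help to see which ids belong to each duplicate group and which one would survive. The query below is a minimal sketch against the sample `student` table; the aliases `cnt`, `keep_id`, and `ids` are illustrative names, not part of the original article.

```sql
-- For each duplicated name, list how many rows share it, the smallest id
-- (the row we would keep), and every id in the group.
SELECT
  name,
  COUNT(*)                  AS cnt,
  MIN(id)                   AS keep_id,
  GROUP_CONCAT(id ORDER BY id) AS ids
FROM student
GROUP BY name
HAVING COUNT(*) > 1;
```

With the five sample rows, this reports cat with ids 1,4 and dog with ids 2,5. Note that GROUP_CONCAT truncates its result at `group_concat_max_len`, so on very large duplicate groups the `ids` column may be cut short.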

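A direct DELETE whose subquery reads from the same table fails with MySQL error 1093 ("You can't specify target table 'student' for update in FROM clause"), because MySQL does not allow a statement to modify a table and select from that same table in a subquery.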
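
For reference, the failing form looks roughly like this when run against the sample `student` table:

```sql
-- Fails: the subquery selects from the same table the DELETE targets.
DELETE FROM student
WHERE name IN (
  SELECT name
  FROM student
  GROUP BY name
  HAVING COUNT(1) > 1
);
-- ERROR 1093 (HY000): You can't specify target table 'student' for update in FROM clause
```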

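Workaround: wrap the duplicate-name query in a derived table. Because the inner query uses GROUP BY, MySQL materializes it as a temporary result, so the DELETE no longer reads student directly. Note that this statement removes every row in each duplicate group, not just the extra copies: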

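-- Deletes ALL rows whose name occurs more than once.
DELETE FROM student
WHERE name IN (
  SELECT name
  FROM (
    -- Derived table t: names that appear in more than one row.
    SELECT name
    FROM student
    GROUP BY name
    HAVING COUNT(1) > 1
  ) t
);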
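
A cautious way to use this pattern is to run the same condition as a SELECT first, so the rows that would be removed can be inspected. This is a sketch against the sample `student` table, not a step from the original article:

```sql
-- Preview: same WHERE clause as the DELETE, but read-only.
SELECT id, name, age
FROM student
WHERE name IN (
  SELECT name
  FROM (
    SELECT name
    FROM student
    GROUP BY name
    HAVING COUNT(1) > 1
  ) t
)
ORDER BY name, id;
```

With the five sample rows, this returns ids 1 and 4 (cat) and ids 2 and 5 (dog), which confirms that the DELETE would leave only the camel row.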

To keep only one row per duplicate name, first select the smallest id for each name and then delete rows whose id is not in that set:

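-- Keeps the row with the smallest id for each name and deletes the rest.
DELETE FROM student
WHERE id NOT IN (
  SELECT t.id
  FROM (
    -- Derived table t: one surviving id per name.
    SELECT MIN(id) AS id
    FROM student
    GROUP BY `name`
  ) t
);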
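
On production data it may be worth running the cleanup inside a transaction and checking the result before committing. The following is a minimal sketch, assuming the table uses InnoDB (as the sample `student` table does) and starts from the five sample rows:

```sql
START TRANSACTION;

-- Keep the smallest id per name, delete every other row.
DELETE FROM student
WHERE id NOT IN (
  SELECT t.id
  FROM (
    SELECT MIN(id) AS id
    FROM student
    GROUP BY name
  ) t
);

-- Check 1: no name should appear more than once (expect an empty result).
SELECT name, COUNT(*) AS cnt
FROM student
GROUP BY name
HAVING COUNT(*) > 1;

-- Check 2: inspect the survivors (ids 1, 2, and 3 for the sample data).
SELECT id, name, age
FROM student
ORDER BY id;

-- COMMIT if both checks look right; otherwise use ROLLBACK instead.
COMMIT;
```

One behavior to be aware of: GROUP BY places all rows with a NULL name in a single group, so this statement also collapses NULL-name rows down to one.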

The author reports that this statement ran very quickly, even on the table with more than 900,000 rows.
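
If a single large DELETE holds row locks for longer than is comfortable, one option is to delete in batches. This is a sketch of that variation rather than the author's method; the batch size of 1000 is an arbitrary assumption, and the statement has to be repeated (by hand or from a script) until it affects no rows.

```sql
-- Delete at most 1000 surplus rows per run (hypothetical batch size).
DELETE FROM student
WHERE id NOT IN (
  SELECT t.id
  FROM (
    SELECT MIN(id) AS id
    FROM student
    GROUP BY name
  ) t
)
ORDER BY id
LIMIT 1000;

-- Rows removed by the DELETE above; stop repeating once this returns 0.
SELECT ROW_COUNT();
```

The trade-off is that every batch recomputes the GROUP BY over the whole table, so total work goes up while each individual statement holds its locks for a shorter time.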

Written by

Java Interview Crash Guide
