How to Find and Delete Duplicate Records in MySQL Efficiently

This article explains how to identify duplicate rows in a MySQL table using GROUP BY and HAVING, shows several SELECT queries that list duplicates, and provides multiple DELETE strategies, including subqueries and multi-column handling, that safely remove excess records while keeping one copy.

Programmer DD

Oct 27, 2020

When building a question bank, duplicate entries can cause the same exam question to appear multiple times, so it is necessary to identify and delete duplicate rows, keeping only one copy.

Single‑field operations

This example uses a table named dept, with the columns deptno (a unique key), dname, and db_source.

Introduction to grouping

SELECT duplicate_column FROM table_name GROUP BY duplicate_column HAVING COUNT(*) > 1

Use GROUP BY to group rows by the duplicate column and HAVING to keep only groups whose count is greater than 1.

GROUP BY <column list>

HAVING <group condition>

This query returns one row for each value of the column (e.g., dname) that appears more than once.
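The grouping query above can be tried on a few sample rows. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the sample rows are made up for illustration, and the SELECT itself is plain SQL that should carry over to MySQL unchanged.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, db_source TEXT)"
)
conn.executemany(
    "INSERT INTO dept (deptno, dname, db_source) VALUES (?, ?, ?)",
    [
        (1, "Sales", "db1"),
        (2, "Sales", "db1"),
        (3, "HR", "db1"),
        (4, "IT", "db2"),
        (5, "IT", "db2"),
        (6, "IT", "db3"),
    ],
)

# One row per dname that occurs more than once, with its number of copies.
duplicate_names = conn.execute(
    """
    SELECT dname, COUNT(*) AS copies
    FROM dept
    GROUP BY dname
    HAVING COUNT(*) > 1
    ORDER BY dname
    """
).fetchall()

print(duplicate_names)  # [('IT', 3), ('Sales', 2)]
```

Selecting COUNT(*) alongside the grouped column is optional, but it shows how many copies each duplicated value has before anything is deleted.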

There is no practical difference between COUNT(*) and COUNT(1); either can be used.

COUNT(*) returns the total number of rows, including rows where the column is NULL, while COUNT(column) counts only rows where the column is NOT NULL (default values are counted).
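The difference between the three COUNT forms shows up as soon as a column contains NULL. The sketch below again assumes Python's sqlite3 module as a stand-in for MySQL, with made-up rows in which two dname values are NULL.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.executemany(
    "INSERT INTO dept (deptno, dname) VALUES (?, ?)",
    [(1, "Sales"), (2, None), (3, "HR"), (4, None)],  # None is stored as NULL
)

# COUNT(*) and COUNT(1) count every row; COUNT(dname) skips NULL dname values.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(dname) FROM dept"
).fetchone()

print(counts)  # (4, 4, 2)
```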

1. Query all duplicate rows

SELECT * FROM table_name WHERE duplicate_column IN (SELECT duplicate_column FROM table_name GROUP BY duplicate_column HAVING COUNT(*) > 1)

2. Delete all duplicate rows

Changing the above SELECT to DELETE directly causes an error.

DELETE FROM dept WHERE dname IN (SELECT dname FROM dept GROUP BY dname HAVING count(1)>1)

Error:

[Err] 1093 - You can't specify target table 'dept' for update in FROM clause

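The error occurs because MySQL does not allow a statement to modify a table while a subquery in the same statement selects from that table. This is a restriction on the statement, not a deadlock.

Solution: wrap the subquery in a derived table (a subquery in the FROM clause with an alias). MySQL can then materialize the derived table as a temporary result set first, and the DELETE runs against that set.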
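The workaround can be sketched as follows: the GROUP BY subquery is wrapped in a derived table aliased t, and the DELETE filters on that. The example assumes Python's sqlite3 module as a stand-in for MySQL with made-up rows. SQLite does not raise error 1093 in the first place, so this only demonstrates the shape and result of the rewritten statement, not the error itself.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.executemany(
    "INSERT INTO dept (deptno, dname) VALUES (?, ?)",
    [(1, "Sales"), (2, "Sales"), (3, "HR"), (4, "IT"), (5, "IT"), (6, "IT")],
)

# The inner GROUP BY query sits inside a derived table (alias t), so the
# DELETE filters on the derived result rather than on dept directly.
conn.execute(
    """
    DELETE FROM dept
    WHERE dname IN (
        SELECT t.dname
        FROM (SELECT dname FROM dept GROUP BY dname HAVING COUNT(1) > 1) AS t
    )
    """
)

# Every row whose dname was duplicated is gone, including the first copies.
remaining = conn.execute(
    "SELECT deptno, dname FROM dept ORDER BY deptno"
).fetchall()

print(remaining)  # [(3, 'HR')]
```

Note that this statement removes all copies of a duplicated value, not just the extras.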

3. Query extra duplicate rows (keep the smallest deptno)

a. First method

SELECT * FROM dept WHERE dname IN (SELECT dname FROM dept GROUP BY dname HAVING COUNT(1)>1) AND deptno NOT IN (SELECT MIN(deptno) FROM dept GROUP BY dname HAVING COUNT(1)>1)

This works but can be slow.

b. Second method

SELECT * FROM dept WHERE deptno NOT IN (SELECT dt.minno FROM (SELECT MIN(deptno) AS minno FROM dept GROUP BY dname) dt)

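c. Third method (recommended in comments)

SELECT * FROM table_name AS ta WHERE ta.unique_key <> (SELECT MAX(tb.unique_key) FROM table_name AS tb WHERE ta.duplicate_column = tb.duplicate_column);

Because this variant compares against MAX, it keeps the row with the largest key in each group; use MIN instead to keep the smallest deptno, as methods a and b do.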
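The three queries in this step can be compared on sample data. This is a sketch that assumes Python's sqlite3 module as a stand-in for MySQL, with made-up rows and a hypothetical helper named deptnos. The third query is applied to dept with deptno as the unique key and dname as the duplicate column.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.executemany(
    "INSERT INTO dept (deptno, dname) VALUES (?, ?)",
    [(1, "Sales"), (2, "Sales"), (3, "HR"), (4, "IT"), (5, "IT"), (6, "IT")],
)


def deptnos(sql):
    """Run a query and return the selected deptno values in ascending order."""
    return sorted(row[0] for row in conn.execute(sql))


# First method: rows in a duplicated group that are not the group's minimum.
method_a = deptnos("""
    SELECT deptno FROM dept
    WHERE dname IN (SELECT dname FROM dept GROUP BY dname HAVING COUNT(1) > 1)
      AND deptno NOT IN (
          SELECT MIN(deptno) FROM dept GROUP BY dname HAVING COUNT(1) > 1
      )
""")

# Second method: every row that is not the minimum deptno of its dname.
method_b = deptnos("""
    SELECT deptno FROM dept
    WHERE deptno NOT IN (
        SELECT dt.minno
        FROM (SELECT MIN(deptno) AS minno FROM dept GROUP BY dname) dt
    )
""")

# Third method as written, with MAX: the survivor is the largest deptno.
method_c_max = deptnos("""
    SELECT ta.deptno FROM dept AS ta
    WHERE ta.deptno <> (
        SELECT MAX(tb.deptno) FROM dept AS tb WHERE ta.dname = tb.dname
    )
""")

print(method_a, method_b, method_c_max)  # [2, 5, 6] [2, 5, 6] [1, 4, 5]
```

The first two methods agree on which rows are extras. The third selects a different set because it treats the largest key, rather than the smallest, as the copy to keep.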

4. Delete extra duplicate rows and keep one

a. First method

DELETE FROM dept WHERE dname IN (SELECT t.dname FROM (SELECT dname FROM dept GROUP BY dname HAVING count(1)>1) t) AND deptno NOT IN (SELECT dt.mindeptno FROM (SELECT min(deptno) AS mindeptno FROM dept GROUP BY dname HAVING count(1)>1) dt);

b. Second method (same as query method b, but DELETE)

DELETE FROM dept WHERE deptno NOT IN (SELECT dt.minno FROM (SELECT MIN(deptno) AS minno FROM dept GROUP BY dname) dt);

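c. Third method (comment-section recommendation)

In MySQL, the correlated-subquery form of this statement fails with the same error 1093, because its subquery reads the table being deleted from. A multi-table DELETE with a self-join expresses the same rule: it removes every row for which another row with the same value and a larger unique key exists, so the row with the largest key is kept.

DELETE ta FROM table_name AS ta JOIN table_name AS tb ON ta.duplicate_column = tb.duplicate_column AND ta.unique_key < tb.unique_key;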
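Before running a delete like these on real data, it can help to check that the number of rows removed matches the number of extras you expect. The sketch below applies the second method inside a transaction and commits only if the count matches. It assumes Python's sqlite3 module as a stand-in for MySQL, with made-up rows; the expected-count check is an illustrative safeguard, not part of the original methods.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.executemany(
    "INSERT INTO dept (deptno, dname) VALUES (?, ?)",
    [(1, "Sales"), (2, "Sales"), (3, "HR"), (4, "IT"), (5, "IT"), (6, "IT")],
)
conn.commit()  # make the sample data permanent before the delete transaction

# Extras = total rows minus distinct names (valid here: no dname is NULL).
expected = conn.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT dname) FROM dept"
).fetchone()[0]

# Second method: delete every row that is not the smallest deptno of its dname.
cur = conn.execute(
    """
    DELETE FROM dept
    WHERE deptno NOT IN (
        SELECT dt.minno
        FROM (SELECT MIN(deptno) AS minno FROM dept GROUP BY dname) dt
    )
    """
)
deleted = cur.rowcount

# Commit only when the delete touched exactly the expected number of rows.
if deleted == expected:
    conn.commit()
else:
    conn.rollback()

remaining = conn.execute(
    "SELECT deptno, dname FROM dept ORDER BY deptno"
).fetchall()

print(deleted, remaining)  # 3 [(1, 'Sales'), (3, 'HR'), (4, 'IT')]
```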

Multiple‑field operations

If you can handle a single column, handling multiple columns is straightforward: add the additional columns to the GROUP BY clause and to the IN comparison.

DELETE FROM dept WHERE (dname, db_source) IN (SELECT t.dname, t.db_source FROM (SELECT dname, db_source FROM dept GROUP BY dname, db_source HAVING count(1)>1) t) AND deptno NOT IN (SELECT dt.mindeptno FROM (SELECT min(deptno) AS mindeptno FROM dept GROUP BY dname, db_source HAVING count(1)>1) dt);
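The same extension works for the shorter second delete method: only the GROUP BY list changes. This is a sketch under the assumption that deptno is unique, using Python's sqlite3 module as a stand-in for MySQL and made-up rows. Here a row counts as a duplicate only when both dname and db_source match.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, db_source TEXT)"
)
conn.executemany(
    "INSERT INTO dept (deptno, dname, db_source) VALUES (?, ?, ?)",
    [
        (1, "Sales", "db1"),
        (2, "Sales", "db1"),  # duplicate of row 1
        (3, "HR", "db1"),
        (4, "IT", "db2"),
        (5, "IT", "db2"),     # duplicate of row 4
        (6, "IT", "db3"),     # same dname, different db_source: not a duplicate
    ],
)

# Keep the smallest deptno of each (dname, db_source) pair and delete the rest.
conn.execute(
    """
    DELETE FROM dept
    WHERE deptno NOT IN (
        SELECT dt.minno
        FROM (
            SELECT MIN(deptno) AS minno
            FROM dept
            GROUP BY dname, db_source
        ) dt
    )
    """
)

remaining = [
    row[0] for row in conn.execute("SELECT deptno FROM dept ORDER BY deptno")
]

print(remaining)  # [1, 3, 4, 6]
```

Row 6 survives because its db_source differs, which is the practical difference from grouping on dname alone.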

Summary

Add indexes on columns that are frequently queried.

Replace * with only the columns you actually need.

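As a rule of thumb, IN works well when the subquery returns a small result set, while EXISTS works well when the inner table is large and the correlated lookup can use an index. Recent MySQL versions can optimize both forms into a semi-join, so confirm the actual plan with EXPLAIN.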
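The three tips can be combined in one small sketch: an index on the duplicate column, an explicit column list instead of *, and the same duplicate lookup written with both IN and EXISTS. It assumes Python's sqlite3 module as a stand-in for MySQL; the index name idx_dept_dname and the sample rows are hypothetical. It shows only that the two forms return the same rows, and says nothing about which is faster on a given MySQL server.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.executemany(
    "INSERT INTO dept (deptno, dname) VALUES (?, ?)",
    [(1, "Sales"), (2, "Sales"), (3, "HR"), (4, "IT"), (5, "IT"), (6, "IT")],
)

# Index the column used for grouping and lookups (the index name is made up).
conn.execute("CREATE INDEX idx_dept_dname ON dept (dname)")

# IN form: list only the needed columns for every row in a duplicated group.
in_rows = conn.execute(
    """
    SELECT deptno, dname FROM dept
    WHERE dname IN (SELECT dname FROM dept GROUP BY dname HAVING COUNT(*) > 1)
    ORDER BY deptno
    """
).fetchall()

# EXISTS form: a row is a duplicate if another row shares its dname.
exists_rows = conn.execute(
    """
    SELECT d.deptno, d.dname FROM dept AS d
    WHERE EXISTS (
        SELECT 1 FROM dept AS o
        WHERE o.dname = d.dname AND o.deptno <> d.deptno
    )
    ORDER BY d.deptno
    """
).fetchall()

print(in_rows == exists_rows)  # True
```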
