
How to Find and Remove Duplicate Rows in MySQL

This article shows step by step how to identify duplicate rows in a MySQL table, whether the duplication is defined on a single column, on a combination of columns, or on either of two columns. It covers GROUP BY with HAVING, UNION, sub‑queries, joins against derived tables, and a temporary-table DELETE that removes the extra rows.

MaGe Linux Operations

This article explains how to locate duplicate rows in a MySQL database, a common problem for beginners.

Defining a duplicate row

Usually a duplicate row is one where a specific column (or set of columns) has the same value as another row. The article uses a sample table test(id INT NOT NULL PRIMARY KEY, day DATE NOT NULL) with three rows, two of which share the same day value.
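
The sample table can be recreated roughly as follows. The column definitions come from the description above; the three dates are illustrative placeholders, chosen only so that ids 1 and 2 share the same day.

```sql
-- Sample table as described in the article
CREATE TABLE test (
    id  INT  NOT NULL PRIMARY KEY,
    day DATE NOT NULL
);

-- Hypothetical rows: ids 1 and 2 are duplicates on day, id 3 is unique
INSERT INTO test (id, day) VALUES
    (1, '2006-10-08'),
    (2, '2006-10-08'),
    (3, '2006-10-09');
```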

Finding duplicates in a single column

Use GROUP BY to group rows that share the same column value and HAVING COUNT(*) > 1 to keep only groups larger than one:

SELECT day, COUNT(*)
FROM test
GROUP BY day
HAVING COUNT(*) > 1;

The HAVING clause filters after grouping, unlike WHERE, which filters before grouping.
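
The two clauses can be combined in one statement. The sketch below assumes the test table described earlier; the cutoff date is a hypothetical value used only to show where each filter applies.

```sql
SELECT day, COUNT(*) AS cnt
FROM test
WHERE day >= '2006-01-01'   -- hypothetical cutoff: rows are discarded before grouping
GROUP BY day
HAVING COUNT(*) > 1;        -- groups are discarded after COUNT(*) is computed
```

An aggregate such as COUNT(*) cannot appear in WHERE, because the groups do not exist yet when WHERE is evaluated.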

Removing duplicate rows (keep the smallest id)

Create a temporary table that stores the minimum id for each duplicated day:

CREATE TEMPORARY TABLE to_delete (
    day DATE NOT NULL,
    min_id INT NOT NULL
);
INSERT INTO to_delete (day, min_id)
SELECT day, MIN(id)
FROM test
GROUP BY day
HAVING COUNT(*) > 1;
SELECT * FROM to_delete;

Then delete rows whose day appears in to_delete but whose id is not the minimum:

DELETE FROM test
WHERE EXISTS (
    SELECT * FROM to_delete
    WHERE to_delete.day = test.day
      AND to_delete.min_id <> test.id
);
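
Because a DELETE cannot be undone outside a transaction, it can help to preview the affected rows first. The sketch below reuses the same EXISTS predicate in a SELECT, intended to be run before the DELETE, and repeats the duplicate check afterwards. It assumes the to_delete table from the previous step still exists in the current session.

```sql
-- Run BEFORE the DELETE: lists exactly the rows the DELETE would remove
SELECT test.*
FROM test
WHERE EXISTS (
    SELECT * FROM to_delete
    WHERE to_delete.day = test.day
      AND to_delete.min_id <> test.id
);

-- Run AFTER the DELETE: an empty result means no duplicated day remains
SELECT day, COUNT(*)
FROM test
GROUP BY day
HAVING COUNT(*) > 1;

-- Optional cleanup; the temporary table is otherwise dropped when the session ends
DROP TEMPORARY TABLE to_delete;
```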

Finding duplicates on two columns (either column)

The article presents a table a_b_c(a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b INT, c INT) with data that contains duplicate values in column b and in column c, but no duplicate {b,c} pairs.
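
A minimal way to reproduce such a table is sketched below. The column definitions follow the description above; the four rows are hypothetical, chosen so that b = 1 and c = 1 each occur twice while every {b,c} pair is unique.

```sql
CREATE TABLE a_b_c (
    a INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    b INT,
    c INT
);

-- Hypothetical data: column a is filled in by AUTO_INCREMENT
INSERT INTO a_b_c (b, c) VALUES
    (1, 1),   -- shares b with the next row and c with the third row
    (1, 2),
    (2, 1),
    (3, 4);   -- no duplicate in either column
```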

A simple GROUP BY b, c does not meet the requirement, because it only finds rows where the {b,c} pair repeats.

Incorrect attempts such as:

SELECT b, c, COUNT(*)
FROM a_b_c
GROUP BY b, c
HAVING COUNT(DISTINCT b) > 1 OR COUNT(DISTINCT c) > 1;

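return no rows, because within each {b,c} group both COUNT(DISTINCT b) and COUNT(DISTINCT c) are at most 1. Moving the comparison inside COUNT(), as in COUNT(DISTINCT b > 1), fails in the opposite direction: the boolean expression is evaluated inside COUNT(), which yields 1 for every group with a non-NULL value, so every group is returned.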

Correct approaches

Method 1 – UNION of two single‑column queries (shows which column is duplicated):

SELECT b AS value, COUNT(*) AS cnt, 'b' AS what_col
FROM a_b_c
GROUP BY b
HAVING COUNT(*) > 1
UNION
SELECT c AS value, COUNT(*) AS cnt, 'c' AS what_col
FROM a_b_c
GROUP BY c
HAVING COUNT(*) > 1;
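
If it is also useful to see which rows carry each duplicated value, one possible variation adds GROUP_CONCAT over the primary key a. This is a sketch rather than the article's own query; the ids column alias is an assumption introduced here.

```sql
SELECT b AS value, COUNT(*) AS cnt,
       GROUP_CONCAT(a ORDER BY a) AS ids,   -- comma-separated primary keys in the group
       'b' AS what_col
FROM a_b_c
GROUP BY b
HAVING COUNT(*) > 1
UNION ALL   -- what_col already keeps the two halves distinct
SELECT c AS value, COUNT(*) AS cnt,
       GROUP_CONCAT(a ORDER BY a) AS ids,
       'c' AS what_col
FROM a_b_c
GROUP BY c
HAVING COUNT(*) > 1;
```

GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so very large groups may show an incomplete list.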

Method 2 – Nested sub‑queries to return the full rows that belong to a duplicated column:

SELECT a, b, c
FROM a_b_c
WHERE b IN (SELECT b FROM a_b_c GROUP BY b HAVING COUNT(*) > 1)
   OR c IN (SELECT c FROM a_b_c GROUP BY c HAVING COUNT(*) > 1);

Method 3 – Join with derived tables (often more efficient for large data sets):

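SELECT a_b_c.a, a_b_c.b, a_b_c.c
FROM a_b_c
-- dup_b: values of b that occur more than once
LEFT OUTER JOIN (
    SELECT b FROM a_b_c GROUP BY b HAVING COUNT(*) > 1
) AS dup_b ON a_b_c.b = dup_b.b
-- dup_c: values of c that occur more than once
LEFT OUTER JOIN (
    SELECT c FROM a_b_c GROUP BY c HAVING COUNT(*) > 1
) AS dup_c ON a_b_c.c = dup_c.c
-- keep rows that matched at least one of the two derived tables
WHERE dup_b.b IS NOT NULL OR dup_c.c IS NOT NULL;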

Methods 2 and 3 list every row that has a duplicate value in column b or column c, while Method 1 lists the duplicated values themselves together with the column each one came from. The article notes that the UNION approach is the simplest, while the join and sub‑query approaches are usually more performant.
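
Which form is actually faster can depend on the MySQL version, the available indexes, and the data volume, so it is worth checking rather than assuming. A minimal sketch: prefix each candidate query with EXPLAIN and compare the reported plans. Method 2 is shown here; the same prefix applies to the others.

```sql
-- Shows the execution plan instead of running the query
EXPLAIN
SELECT a, b, c
FROM a_b_c
WHERE b IN (SELECT b FROM a_b_c GROUP BY b HAVING COUNT(*) > 1)
   OR c IN (SELECT c FROM a_b_c GROUP BY c HAVING COUNT(*) > 1);
```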

Illustration

Sorting by column b shows how identical c values are split across different groups, which explains why COUNT(DISTINCT c) cannot be used after grouping by b.
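
A minimal sketch of that effect, using four hypothetical {b,c} rows inlined as a derived table (the article's own data is not reproduced here):

```sql
SELECT b, COUNT(DISTINCT c) AS distinct_c
FROM (
    SELECT 1 AS b, 1 AS c
    UNION ALL SELECT 1, 2
    UNION ALL SELECT 2, 1
    UNION ALL SELECT 3, 4
) AS sample
GROUP BY b
ORDER BY b;
-- b = 1 -> distinct_c = 2
-- b = 2 -> distinct_c = 1
-- b = 3 -> distinct_c = 1
```

The duplicated value c = 1 falls into the groups b = 1 and b = 2, so no single group ever sees both occurrences. COUNT(DISTINCT c) > 1 would instead flag b = 1, which only means that group contains two different c values.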

The article concludes that several viable solutions exist, and the choice depends on readability and performance requirements.
