Databases 11 min read

How to Find and Remove Duplicate Rows in SQL

This guide explains how to identify duplicate rows in a MySQL table using GROUP BY and HAVING, how to delete the extra rows while keeping the earliest entry, and how to handle duplicate detection across multiple columns with correct query patterns and common pitfalls.

Liangxu Linux
Liangxu Linux
Liangxu Linux
How to Find and Remove Duplicate Rows in SQL

Finding duplicate rows

First define a duplicate row as having the same value in a specific column. The example creates a test table with id and day columns, inserts sample data, and shows the raw data:

create table test(
  id int not null primary key,
  day date not null
);

insert into test(id, day) values (1, '2006-10-08');
insert into test(id, day) values (2, '2006-10-08');
insert into test(id, day) values (3, '2006-10-09');

select * from test;

Rows with the same day value are considered duplicates. Use GROUP BY to group by the column and COUNT(*) to see group sizes:

select day, count(*)
from test
group by day;

To keep only groups that appear more than once, add a HAVING clause:

select day, count(*)
from test
group by day
having count(*) > 1;

The HAVING clause filters after grouping, unlike WHERE which filters before grouping.

Deleting duplicate rows

Assume we want to keep the row with the smallest id in each duplicate group and delete the others. A temporary table can store the day and the minimum id to keep:

create temporary table to_delete(
  day date not null,
  min_id int not null
);

insert into to_delete(day, min_id)
select day, MIN(id)
from test
group by day
having count(*) > 1;

select * from to_delete;

Then delete rows from the original table that match the day but have a different id:

delete from test
where exists (
  select *
  from to_delete
  where to_delete.day = test.day
    and to_delete.min_id <> test.id
);

Finding duplicate rows on multiple columns

When you need to detect duplicates based on a combination of columns (e.g., b and c), simple GROUP BY b, c may not work if you also need to know which individual column has duplicates. Several approaches are presented:

Union two separate queries that find duplicates on each column and label the source column.

Use a nested sub‑query to filter rows where either column appears in a duplicate group.

Join the original table with two derived tables that list duplicate values for each column.

Example using UNION:

select b as value, count(*) as cnt, 'b' as what_col
from a_b_c
group by b
having count(*) > 1
union
select c as value, count(*) as cnt, 'c' as what_col
from a_b_c
group by c
having count(*) > 1;

Example using nested sub‑queries:

select a, b, c
from a_b_c
where b in (select b from a_b_c group by b having count(*) > 1)
   or c in (select c from a_b_c group by c having count(*) > 1);

Example using left joins on derived tables:

select a, a_b_c.b, a_b_c.c
from a_b_c
left outer join (
  select b from a_b_c group by b having count(*) > 1
) as b on a_b_c.b = b.b
left outer join (
  select c from a_b_c group by c having count(*) > 1
) as c on a_b_c.c = c.c
where b.b is not null or c.c is not null;

Common pitfalls

Placing the > 1 condition inside COUNT() (e.g., count(distinct b) > 1) does not work because the comparison is evaluated before aggregation, resulting in every row being returned. The correct form is to place the comparison after COUNT() in the HAVING clause.

Summary of techniques

The most straightforward method is to use GROUP BY with HAVING count(*) > 1 for each column of interest and combine results with UNION. For more complex scenarios or performance‑critical queries, nested sub‑queries or join‑based approaches can be employed.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLmysqlduplicate rowsDELETEGROUP BYHAVING
Liangxu Linux
Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.