How to Find and Remove Duplicate Rows in SQL
This guide explains how to identify duplicate rows in a MySQL table using GROUP BY and HAVING, how to delete the extra rows while keeping the earliest entry, and how to handle duplicate detection across multiple columns with correct query patterns and common pitfalls.
Finding duplicate rows
First define a duplicate row as having the same value in a specific column. The example creates a test table with id and day columns, inserts sample data, and shows the raw data:
create table test(
id int not null primary key,
day date not null
);
insert into test(id, day) values (1, '2006-10-08');
insert into test(id, day) values (2, '2006-10-08');
insert into test(id, day) values (3, '2006-10-09');
select * from test;Rows with the same day value are considered duplicates. Use GROUP BY to group by the column and COUNT(*) to see group sizes:
select day, count(*)
from test
group by day;To keep only groups that appear more than once, add a HAVING clause:
select day, count(*)
from test
group by day
having count(*) > 1;The HAVING clause filters after grouping, unlike WHERE which filters before grouping.
Deleting duplicate rows
Assume we want to keep the row with the smallest id in each duplicate group and delete the others. A temporary table can store the day and the minimum id to keep:
create temporary table to_delete(
day date not null,
min_id int not null
);
insert into to_delete(day, min_id)
select day, MIN(id)
from test
group by day
having count(*) > 1;
select * from to_delete;Then delete rows from the original table that match the day but have a different id:
delete from test
where exists (
select *
from to_delete
where to_delete.day = test.day
and to_delete.min_id <> test.id
);Finding duplicate rows on multiple columns
When you need to detect duplicates based on a combination of columns (e.g., b and c), simple GROUP BY b, c may not work if you also need to know which individual column has duplicates. Several approaches are presented:
Union two separate queries that find duplicates on each column and label the source column.
Use a nested sub‑query to filter rows where either column appears in a duplicate group.
Join the original table with two derived tables that list duplicate values for each column.
Example using UNION:
select b as value, count(*) as cnt, 'b' as what_col
from a_b_c
group by b
having count(*) > 1
union
select c as value, count(*) as cnt, 'c' as what_col
from a_b_c
group by c
having count(*) > 1;Example using nested sub‑queries:
select a, b, c
from a_b_c
where b in (select b from a_b_c group by b having count(*) > 1)
or c in (select c from a_b_c group by c having count(*) > 1);Example using left joins on derived tables:
select a, a_b_c.b, a_b_c.c
from a_b_c
left outer join (
select b from a_b_c group by b having count(*) > 1
) as b on a_b_c.b = b.b
left outer join (
select c from a_b_c group by c having count(*) > 1
) as c on a_b_c.c = c.c
where b.b is not null or c.c is not null;Common pitfalls
Placing the > 1 condition inside COUNT() (e.g., count(distinct b) > 1) does not work because the comparison is evaluated before aggregation, resulting in every row being returned. The correct form is to place the comparison after COUNT() in the HAVING clause.
Summary of techniques
The most straightforward method is to use GROUP BY with HAVING count(*) > 1 for each column of interest and combine results with UNION. For more complex scenarios or performance‑critical queries, nested sub‑queries or join‑based approaches can be employed.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
