How to Find and Remove Duplicate Rows in SQL Tables

This guide explains how to define duplicate rows, use GROUP BY and HAVING to locate them, and apply temporary tables and DELETE statements to remove duplicates while preserving the row with the smallest ID.

Liangxu Linux

First, define what counts as a duplicate: here, two or more rows that share the same value in a given column, most often a single column such as day. The article creates a sample table named test with an id primary key and a day column, inserts sample data, and shows the initial content.

Finding Duplicate Rows

Use GROUP BY on the target column and count the rows in each group. Rows whose group size is greater than 1 are duplicates:

select day, count(*) from test GROUP BY day HAVING count(*) > 1;

This returns the day values that appear more than once.
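The sample table and the query above can be tried end to end. The following is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database; the six inserted rows are hypothetical, since the sample data is not listed here, and other database engines may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: the article's actual rows are not shown.
conn.executescript("""
    create table test (id integer primary key, day date not null);
    insert into test (day) values
        ('2006-10-08'), ('2006-10-08'), ('2006-10-09'),
        ('2006-10-09'), ('2006-10-09'), ('2006-10-10');
""")

# Same query as in the text, with an ORDER BY so the output is deterministic.
dupes = conn.execute("""
    select day, count(*) from test
    group by day
    having count(*) > 1
    order by day
""").fetchall()

for day, cnt in dupes:
    print(day, cnt)
```

With this data, only the two day values that occur more than once are reported; the day that appears a single time is filtered out by HAVING.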

Deleting Duplicate Rows

To keep only the first (smallest id) row in each duplicate group, create a temporary table that stores each day and the minimum id:

create temporary table to_delete (day date not null, min_id int not null);
insert into to_delete(day, min_id)
  select day, MIN(id) from test group by day having count(*) > 1;
select * from to_delete;

Then delete rows that match the day but have a different id:

delete from test
where exists (
  select * from to_delete
  where to_delete.day = test.day and to_delete.min_id <> test.id
);
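The two steps above can be run together as a quick check. This is a sketch under the same assumptions as before (Python's sqlite3 module, an in-memory database, hypothetical sample rows), not a production cleanup script; on a real table it would be wise to run it inside a transaction and inspect to_delete first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Hypothetical sample data.
    create table test (id integer primary key, day date not null);
    insert into test (day) values
        ('2006-10-08'), ('2006-10-08'), ('2006-10-09'),
        ('2006-10-09'), ('2006-10-09'), ('2006-10-10');

    -- Step 1: record the smallest id for every duplicated day.
    create temporary table to_delete (day date not null, min_id int not null);
    insert into to_delete (day, min_id)
        select day, min(id) from test group by day having count(*) > 1;

    -- Step 2: delete every row of a duplicated day except the one to keep.
    delete from test
    where exists (
        select * from to_delete
        where to_delete.day = test.day and to_delete.min_id <> test.id
    );
""")

remaining = conn.execute("select id, day from test order by id").fetchall()
print(remaining)
```

Rows whose day was never duplicated have no entry in to_delete, so the EXISTS test is false for them and they are left untouched.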

Finding Duplicates Across Multiple Columns

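When you need to detect duplicates on either column b or column c (or both), a simple GROUP BY b, c is insufficient because it groups by the pair of values, so it only finds rows where both columns match, not rows where just one of them does. The examples use a table a_b_c with columns a, b, and c. Several correct approaches are presented: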

Union two separate queries, one for each column, and add a marker column to indicate which column has duplicates:

select b as value, count(*) as cnt, 'b' as what_col from a_b_c group by b having count(*) > 1
union
select c as value, count(*) as cnt, 'c' as what_col from a_b_c group by c having count(*) > 1;

Use nested subqueries to filter rows whose b or c appears in a duplicated group:

select a, b, c from a_b_c
where b in (select b from a_b_c group by b having count(*) > 1)
   or c in (select c from a_b_c group by c having count(*) > 1);

Join the table with two derived tables that list duplicated b and duplicated c values, then keep rows where either join succeeds:

select a, a_b_c.b, a_b_c.c
from a_b_c
left outer join (select b from a_b_c group by b having count(*) > 1) as b on a_b_c.b = b.b
left outer join (select c from a_b_c group by c having count(*) > 1) as c on a_b_c.c = c.c
where b.b is not null or c.c is not null;

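These methods differ in what they return as well as in performance: the UNION query lists the duplicated values with their counts, while the subquery and join queries return the full rows. The UNION approach is usually the simplest to read, and the join method can be more flexible for complex queries; which one runs fastest depends on the database engine, the indexes, and the data, so check the execution plan before choosing.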
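To compare the three approaches side by side, here is a small sketch using Python's sqlite3 module. The contents of a_b_c are hypothetical, chosen so that b = 1 repeats in two rows and c = 3 repeats in two other rows; the ORDER BY clauses and the sorted() call are added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: b = 1 repeats in rows 1 and 2, c = 3 repeats in rows 3 and 4.
conn.executescript("""
    create table a_b_c (a integer primary key, b int, c int);
    insert into a_b_c (a, b, c) values
        (1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 3), (5, 4, 4), (6, 5, 5);
""")

# Approach 1: UNION of one query per column, tagged with the column name.
union_rows = sorted(conn.execute("""
    select b as value, count(*) as cnt, 'b' as what_col
    from a_b_c group by b having count(*) > 1
    union
    select c as value, count(*) as cnt, 'c' as what_col
    from a_b_c group by c having count(*) > 1
""").fetchall())

# Approach 2: nested subqueries returning the full rows.
subquery_rows = conn.execute("""
    select a, b, c from a_b_c
    where b in (select b from a_b_c group by b having count(*) > 1)
       or c in (select c from a_b_c group by c having count(*) > 1)
    order by a
""").fetchall()

# Approach 3: left joins against derived tables of duplicated values.
join_rows = conn.execute("""
    select a, a_b_c.b, a_b_c.c
    from a_b_c
    left outer join (select b from a_b_c group by b having count(*) > 1) as b
        on a_b_c.b = b.b
    left outer join (select c from a_b_c group by c having count(*) > 1) as c
        on a_b_c.c = c.c
    where b.b is not null or c.c is not null
    order by a
""").fetchall()

print(union_rows)
print(subquery_rows)
print(join_rows)
```

The first query reports which values are duplicated and in which column; the other two return the same set of offending rows, which is the form you would need before deleting or reviewing them.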

Common Pitfalls

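Do not place the condition > 1 inside the COUNT() function, as in count(distinct b > 1). That expression counts the distinct results of the comparison b > 1 rather than the rows in the group, so on engines that treat a comparison as 0 or 1 it is nonzero for any group with a non-NULL b, and the HAVING clause matches every group instead of only the duplicated ones. The correct form is HAVING count(*) > 1; use HAVING count(distinct b) > 1 only when you want groups that contain more than one distinct value of b.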
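The difference is easy to see on a small table. The sketch below assumes Python's sqlite3 module and hypothetical data; SQLite evaluates b > 1 to 0 or 1, whereas engines with a strict boolean type may reject the misplaced form outright instead of returning wrong results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: only b = 1 occurs more than once.
conn.executescript("""
    create table a_b_c (a integer primary key, b int, c int);
    insert into a_b_c (a, b, c) values
        (1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 3), (5, 4, 4), (6, 5, 5);
""")

# Misplaced condition: counts distinct results of (b > 1), which is 1 per group here.
wrong = [row[0] for row in conn.execute("""
    select b from a_b_c group by b having count(distinct b > 1) order by b
""")]

# Correct condition: compares the group's row count with 1.
right = [row[0] for row in conn.execute("""
    select b from a_b_c group by b having count(*) > 1 order by b
""")]

print(wrong)
print(right)
```

The misplaced version returns every value of b, while the corrected version returns only the value that is actually duplicated.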

The article also warns that a WHERE clause will not work for duplicate detection: WHERE filters individual rows before they are grouped, so it cannot test a group's row count. The count(*) > 1 condition must go in HAVING, which is applied after grouping.
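As an illustration, the sketch below (Python's sqlite3 module, hypothetical rows) tries the aggregate condition in WHERE and then in HAVING. SQLite refuses the first query; the exact error text varies by engine, so the code only records that an error occurred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data.
conn.executescript("""
    create table test (id integer primary key, day date not null);
    insert into test (day) values ('2006-10-08'), ('2006-10-08'), ('2006-10-09');
""")

# An aggregate in WHERE is rejected, because WHERE runs before grouping.
try:
    conn.execute("select day, count(*) from test where count(*) > 1 group by day")
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# The same condition in HAVING is evaluated once per group.
having_rows = conn.execute("""
    select day, count(*) from test group by day having count(*) > 1 order by day
""").fetchall()

print(where_error)
print(having_rows)
```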

Summary

Detecting duplicate rows in SQL involves grouping by the relevant column(s) and using HAVING count(*) > 1. To delete duplicates while retaining the earliest row, identify the minimum primary key per group via a temporary table and delete the others. For multi‑column duplicate detection, either union separate column queries, use nested subqueries, or join derived tables, choosing the method that best fits performance and readability requirements.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
