Databases 4 min read

Three Methods for SQL Deduplication: DISTINCT, GROUP BY, and ROW_NUMBER

This article explains three common SQL deduplication techniques—using DISTINCT, GROUP BY, and the ROW_NUMBER window function—illustrating their syntax, performance considerations, and example queries for counting unique task identifiers in relational databases.

Laravel Tech Community
Laravel Tech Community
Laravel Tech Community
Three Methods for SQL Deduplication: DISTINCT, GROUP BY, and ROW_NUMBER

SQL deduplication is a common scenario in data analysis, especially when calculating unique visitors (UV) from tables that contain duplicate rows.

In MySQL, the typical approaches are SELECT DISTINCT and GROUP BY . For databases that support window functions (e.g., Hive, Oracle), the ROW_NUMBER() function can also be used.

DISTINCT

-- List all unique task_id values
-- select distinct task_id
-- from Task;

-- Count distinct task_id
select count(distinct task_id) as task_num
from Task;

The DISTINCT clause removes duplicate rows across all selected columns; it is often less efficient and is mainly used together with COUNT to obtain the number of unique values.

GROUP BY

-- List unique task_id values (including NULL)
select task_id
from Task
group by task_id;

-- Count task_id after grouping
select count(task_id) as task_num
from (
    select task_id
    from Task
    group by task_id
) tmp;

GROUP BY groups rows by the specified column(s) and can be combined with aggregation functions; it returns one row per distinct group.

ROW_NUMBER (Window Function)

select count(case when rn = 1 then task_id else null end) as task_num
from (
    select task_id,
           row_number() over (partition by task_id order by start_time) as rn
    from Task
) tmp;

The ROW_NUMBER() window function assigns a sequential number within each partition; keeping only rows where the number equals 1 effectively deduplicates the data.

Additional examples using a test table illustrate the differences between DISTINCT and GROUP BY when selecting single or multiple columns.

SQLDeduplicationdatabaseswindow functionsGROUP BYDISTINCTROW_NUMBER
Laravel Tech Community
Written by

Laravel Tech Community

Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.