When to Use SELECT DISTINCT vs GROUP BY in MySQL: Performance & Use Cases
This article compares MySQL’s SELECT DISTINCT and GROUP BY clauses, explaining their syntax, functional differences, performance implications, and ideal scenarios through detailed examples, index usage analysis, and a feature comparison table, helping developers choose the right approach for deduplication or aggregation tasks.
Database queries are part of everyday development work, and removing duplicate records and producing grouped summaries are among the most common requirements.
MySQL provides two typical ways to manage and retrieve unique values: SELECT DISTINCT and GROUP BY. Although their output may appear similar, their purposes, performance characteristics, and usage scenarios differ.
SELECT DISTINCT
DISTINCT is a keyword used for deduplication. The SELECT DISTINCT statement removes duplicate rows from the result set and returns only unique rows, making it a simple and efficient way to obtain distinct data.
Basic Syntax
SELECT DISTINCT column1, column2
FROM table_name;
Parameter Description
column1, column2 : columns to retrieve.
table_name : name of the table to query.
Feature Description
DISTINCT can deduplicate based on a single column or multiple columns; only rows with identical values in all selected columns are considered duplicates.
DISTINCT treats all NULL values as equal to one another, so multiple NULL entries collapse into a single NULL in the output.
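Both rules can be checked with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the assumption is that these DISTINCT semantics (all selected columns must match, NULLs compare as equal) behave the same way in both engines. The visits table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: the
# DISTINCT semantics shown here match MySQL's).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, city TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO visits (name, city) VALUES (?, ?)",
    [
        ("John Doe", "New York"),
        ("John Doe", "New York"),  # identical in both columns: a duplicate
        ("John Doe", "Boston"),    # same name, different city: not a duplicate
        ("Jane Smith", None),
        ("Jane Smith", None),      # NULLs compare as equal for DISTINCT
    ],
)

# Single-column DISTINCT: duplicates are judged on name alone.
names = conn.execute("SELECT DISTINCT name FROM visits ORDER BY name").fetchall()

# Multi-column DISTINCT: a row is a duplicate only if every selected column matches.
pairs = conn.execute(
    "SELECT DISTINCT name, city FROM visits ORDER BY name, city"
).fetchall()

print(names)  # [('Jane Smith',), ('John Doe',)]
print(pairs)  # [('Jane Smith', None), ('John Doe', 'Boston'), ('John Doe', 'New York')]
```

The ORDER BY clauses are there only to make the printed order predictable; DISTINCT itself does not promise any order.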
GROUP BY
GROUP BY is a clause used for grouping rows, usually combined with aggregate functions to summarize each group. It partitions rows into groups based on the values of the specified columns.
Basic Syntax
SELECT column1, aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column1, column2, ...;
Parameter Description
column1, column2 : columns used for grouping.
aggregate_function(column_name) : aggregate function applied to each group, e.g., COUNT, SUM, AVG.
table_name : name of the table to query.
condition : optional filter applied before grouping.
GROUP BY column1, column2... : defines the grouping fields; rows with the same values are placed in the same group.
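The order of operations described above (WHERE filters rows first, then GROUP BY forms groups, then each aggregate is computed per group) can be seen in a minimal sketch. It again uses Python's sqlite3 module as a stand-in for MySQL, loads the same four orders rows used in the sample data, and applies an illustrative price threshold of 150 that is not part of the original article.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "product TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1, "Phone", 100.00), (2, 2, "Laptop", 500.00),
     (3, 1, "Tablet", 200.00), (4, 2, "Watch", 150.00)],
)

# WHERE removes the 100.00 Phone order before grouping; GROUP BY then forms
# one group per customer_id, and COUNT/SUM/AVG run over each group separately.
rows = conn.execute(
    """
    SELECT customer_id, COUNT(*), SUM(price), AVG(price)
    FROM orders
    WHERE price >= 150
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

print(rows)  # [(1, 1, 200.0, 200.0), (2, 2, 650.0, 325.0)]
```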
Sample Table Structure and Data
Two tables are defined for illustration: customers and orders, with sample data inserted.
Creating Tables
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
city VARCHAR(255) NOT NULL
);
INSERT INTO customers (customer_id, name, city) VALUES
(1, 'John Doe', 'New York'),
(2, 'Jane Smith', 'London'),
(3, 'Mike Brown', 'Paris'),
(4, 'Jane Smith', 'London'); -- same name and city as customer 2, but a new customer_id so the primary key accepts it
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT NOT NULL,
product VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
INSERT INTO orders (order_id, customer_id, product, price) VALUES
(1, 1, 'Phone', 100.00),
(2, 2, 'Laptop', 500.00),
(3, 1, 'Tablet', 200.00),
(4, 2, 'Watch', 150.00);
SELECT DISTINCT vs GROUP BY Comparison
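One note on the sample data before the comparisons: because customer_id is declared as the PRIMARY KEY, a row that repeats an existing customer_id is rejected, so a duplicate customer for the DISTINCT examples has to repeat name and city under a new key. The sketch below demonstrates this with Python's sqlite3 module as a stand-in for MySQL; in SQLite the failure surfaces as sqlite3.IntegrityError, while MySQL reports a duplicate-key error (error 1062) instead.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (2, 'Jane Smith', 'London')")

try:
    # Re-using customer_id 2 violates the primary key constraint.
    conn.execute("INSERT INTO customers VALUES (2, 'Jane Smith', 'London')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The same name and city under a new key is accepted.
conn.execute("INSERT INTO customers VALUES (4, 'Jane Smith', 'London')")
count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(rejected, count)  # True 2
```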
Example 1: Retrieve Unique Customer Cities
Goal: Get the distinct list of cities from the customers table.
SELECT DISTINCT city
FROM customers;
Output:
city
--------
New York
London
Paris
Example 2: Count Orders per City
Goal: Count the number of orders for each city, requiring grouping.
SELECT c.city, COUNT(*) AS order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.city
ORDER BY c.city; -- MySQL 8.0+ does not sort GROUP BY results implicitly
Output:
city      order_count
--------  -----------
London    2
New York  2
Performance Analysis
Both DISTINCT and GROUP BY involve grouping operations at the engine level, and in some cases they can be interchangeable. However, their performance may differ based on indexing and sorting behavior.
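When an index exists on the column being deduplicated (for example, an index such as int1_index), both queries can use the same execution plan: EXPLAIN shows Using index for group-by in the Extra column for each, meaning MySQL performs a loose index scan instead of reading every row.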
Before MySQL 8.0, GROUP BY sorted its result implicitly, which could add a filesort step (Using filesort in EXPLAIN) and extra overhead; appending ORDER BY NULL suppressed that sort. In those versions, DISTINCT often outperformed an equivalent GROUP BY. MySQL 8.0 removed the implicit sort, which narrows the performance gap, most noticeably on large tables without a usable index.
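The claim that the two clauses are often interchangeable for pure deduplication can be checked on the customers sample data. This sketch uses Python's sqlite3 module as a stand-in for MySQL and stores the duplicate Jane Smith row under customer_id 4 (an assumption so the primary key accepts it). It compares the results as sets, because neither query guarantees row order without ORDER BY.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "John Doe", "New York"), (2, "Jane Smith", "London"),
     (3, "Mike Brown", "Paris"), (4, "Jane Smith", "London")],
)

distinct_rows = conn.execute("SELECT DISTINCT city FROM customers").fetchall()
grouped_rows = conn.execute("SELECT city FROM customers GROUP BY city").fetchall()

# Same set of rows either way; only the plan (and, pre-8.0 MySQL, sorting) differs.
print(sorted(distinct_rows))                  # [('London',), ('New York',), ('Paris',)]
print(set(distinct_rows) == set(grouped_rows))  # True
```

Whether the two plans are also identical depends on the engine, version, and available indexes; in MySQL, running EXPLAIN on both statements is the way to confirm it.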
Feature Comparison
Purpose : DISTINCT – deduplication; GROUP BY – grouping and aggregation.
Supports Aggregate Functions : No for DISTINCT; Yes for GROUP BY.
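Sorting Behavior : DISTINCT does not sort (add ORDER BY if order matters); GROUP BY sorted implicitly before MySQL 8.0 and no longer does from 8.0 onward.
Performance : Before MySQL 8.0, DISTINCT was often faster when no index was available, because GROUP BY added an implicit sort; from 8.0 onward the two are often comparable, and with a suitable index both can use the same plan.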
Syntax Complexity : Simple for DISTINCT; more complex for GROUP BY.
Applicable Scenarios
Use SELECT DISTINCT when you only need to remove duplicates and return unique values; suitable for simple queries.
Use GROUP BY when you need to group rows by specific columns and perform aggregation (e.g., COUNT, SUM, AVG) or apply HAVING filters; ideal for complex business logic.
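To illustrate the aggregation-plus-HAVING case, the sketch below reuses Example 2's join, adds a SUM per city, and keeps only cities whose total order value exceeds a hypothetical threshold of 400. It uses Python's sqlite3 module as a stand-in for MySQL; the SQL itself is standard and should run unchanged in MySQL.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL, loaded with the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         product TEXT, price REAL);
    INSERT INTO customers VALUES (1, 'John Doe', 'New York'),
                                 (2, 'Jane Smith', 'London'),
                                 (3, 'Mike Brown', 'Paris');
    INSERT INTO orders VALUES (1, 1, 'Phone', 100.0), (2, 2, 'Laptop', 500.0),
                              (3, 1, 'Tablet', 200.0), (4, 2, 'Watch', 150.0);
    """
)

base = """
    SELECT c.city, COUNT(*) AS order_count, SUM(o.price) AS total
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.city
"""

# Without HAVING: one row per city that has orders (Paris has none, so the
# inner join drops it).
all_cities = conn.execute(base + " ORDER BY c.city").fetchall()

# HAVING filters groups after aggregation, which WHERE cannot do.
big_cities = conn.execute(
    base + " HAVING SUM(o.price) > 400 ORDER BY c.city"
).fetchall()

print(all_cities)  # [('London', 2, 650.0), ('New York', 2, 300.0)]
print(big_cities)  # [('London', 2, 650.0)]
```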
Conclusion
SELECT DISTINCT and GROUP BY serve different SQL query needs. DISTINCT excels at simple deduplication, while GROUP BY shines at grouping and aggregating data. Since MySQL 8.0 the performance gap has narrowed, but GROUP BY remains the more flexible choice for complex scenarios. Choose the clause that best fits your specific use case.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.