When to Use SELECT DISTINCT vs GROUP BY in MySQL: Performance & Use Cases

This article compares MySQL’s SELECT DISTINCT and GROUP BY clauses, explaining their syntax, functional differences, performance implications, and ideal scenarios through detailed examples, index usage analysis, and a feature comparison table, helping developers choose the right approach for deduplication or aggregation tasks.

Sohu Tech Products

Removing duplicate records and summarizing rows by group are two of the most common requirements in everyday database work.

MySQL provides two typical ways to manage and retrieve unique values: SELECT DISTINCT and GROUP BY. Although their output may appear similar, their purposes, performance characteristics, and usage scenarios differ.

SELECT DISTINCT

DISTINCT is a keyword used for deduplication. The SELECT DISTINCT statement removes duplicate rows from the result set and returns only unique values, making it a simple and efficient way to obtain distinct data.

Basic Syntax

SELECT DISTINCT column1, column2
FROM table_name;

Parameter Description

column1, column2 : columns to retrieve.

table_name : name of the table to query.

Feature Description

DISTINCT can deduplicate based on a single column or multiple columns; only rows with identical values across all selected columns are considered duplicates.

For the purposes of DISTINCT, all NULL values are treated as equal to one another, so multiple NULL entries collapse into a single NULL in the output.
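
The two rules above can be checked with a small sketch. The contacts table and its rows are hypothetical and exist only for this illustration; a temporary table is used so nothing persists after the session.

```sql
-- Hypothetical scratch table (not part of the article's sample schema).
CREATE TEMPORARY TABLE contacts (
  name VARCHAR(50),
  city VARCHAR(50)   -- nullable on purpose, to show NULL handling
);

INSERT INTO contacts (name, city) VALUES
  ('Ann', 'Oslo'),
  ('Ann', 'Oslo'),   -- exact duplicate of the row above
  ('Ann', 'Rome'),   -- same name, different city
  ('Bob', NULL),
  ('Bob', NULL);     -- duplicate, including the NULL

-- Multi-column DISTINCT: a row is a duplicate only if every selected column matches.
-- Returns 3 rows: (Ann, Oslo), (Ann, Rome), (Bob, NULL).
SELECT DISTINCT name, city
FROM contacts;

-- Single-column DISTINCT: the two NULLs collapse into one.
-- Returns 3 rows: Oslo, Rome, NULL.
SELECT DISTINCT city
FROM contacts;
```

Row order in both results is not guaranteed unless an ORDER BY is added.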

GROUP BY

GROUP BY is a clause used for grouping rows, usually combined with aggregate functions to summarize grouped data. It partitions rows into groups based on the values of specified columns.

Basic Syntax

SELECT column1, aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column1, column2, ...;

Parameter Description

column1, column2 : columns used for grouping.

aggregate_function(column_name) : aggregate function applied to each group, e.g., COUNT, SUM, AVG.

table_name : name of the table to query.

condition : optional filter applied before grouping.

GROUP BY column1, column2... : defines the grouping fields; rows with the same values are placed in the same group.
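
As a concrete instance of this template, the sketch below runs against the orders table defined in the next section, assuming its sample rows load as listed. The price threshold of 150 is an arbitrary value chosen for illustration.

```sql
-- WHERE filters individual rows first; GROUP BY then groups what is left.
SELECT customer_id,
       COUNT(*)   AS order_count,   -- rows per group
       SUM(price) AS total_price    -- sum of price per group
FROM orders
WHERE price >= 150                  -- drops the 100.00 Phone order before grouping
GROUP BY customer_id;

-- With the sample data:
--   customer_id 1 -> order_count 1, total_price 200.00
--   customer_id 2 -> order_count 2, total_price 650.00
```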

Sample Table Structure and Data

Two tables are defined for illustration: customers and orders, with sample data inserted.

Creating Tables

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  city VARCHAR(255) NOT NULL
);

INSERT INTO customers (customer_id, name, city) VALUES
  (1, 'John Doe', 'New York'),
  (2, 'Jane Smith', 'London'),
  (3, 'Mike Brown', 'Paris'),
  (4, 'Jane Smith', 'London'); -- repeated name and city; a new customer_id is required because the primary key must be unique

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  product VARCHAR(255) NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

INSERT INTO orders (order_id, customer_id, product, price) VALUES
  (1, 1, 'Phone', 100.00),
  (2, 2, 'Laptop', 500.00),
  (3, 1, 'Tablet', 200.00),
  (4, 2, 'Watch', 150.00);

SELECT DISTINCT vs GROUP BY Comparison

Example 1: Retrieve Unique Customer Cities

Goal: Get the distinct list of cities from the customers table.

SELECT DISTINCT city
FROM customers;

Output:

city
-----
New York
London
Paris

Example 2: Count Orders per City

Goal: Count the number of orders for each city, requiring grouping.

SELECT city, COUNT(*) AS order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY city;

Output:

city         order_count
------------------------
London       2
New York     2

Performance Analysis

Both DISTINCT and GROUP BY involve grouping operations at the engine level, and in some cases they can be interchangeable. However, their performance may differ based on indexing and sorting behavior.
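
To make the interchangeable case concrete, here is a minimal sketch that rewrites Example 1 with GROUP BY. It assumes the customers table from the sample schema; both statements return the same set of cities.

```sql
-- Deduplication written with DISTINCT (Example 1).
SELECT DISTINCT city
FROM customers;

-- The same result written with GROUP BY and no aggregate function.
SELECT city
FROM customers
GROUP BY city;

-- The two forms diverge once a per-group value is needed:
-- only the GROUP BY form can carry an aggregate for each city.
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;
```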

When an index exists on the column being deduplicated (e.g., int1_index), both queries can use the same execution plan: EXPLAIN reports Using index for group-by in the Extra column for each of them.
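
One way to check this on your own data is to compare the two plans directly. The sketch below uses the sample customers table; the index name idx_customers_city is hypothetical. The optimizer chooses the plan based on table size and statistics, so on a table as small as the sample one the Extra column may show a plain Using index rather than Using index for group-by.

```sql
-- Hypothetical index on the column being deduplicated.
CREATE INDEX idx_customers_city ON customers (city);

-- Compare the plans; look at the key and Extra columns of each result.
EXPLAIN SELECT DISTINCT city FROM customers;

EXPLAIN SELECT city FROM customers GROUP BY city;
```

If both statements report the same key and the same Extra value, they are being executed the same way and should perform alike.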

Before MySQL 8.0, GROUP BY implicitly sorted its result by the grouping columns, which could add a filesort operation and extra overhead; in those versions DISTINCT often outperformed GROUP BY. MySQL 8.0 removed the implicit sort, which narrows the performance gap, especially on large datasets without indexes.
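
The version difference matters when writing queries. The sketch below, which assumes the sample customers table, shows the two idioms that follow from it: ORDER BY NULL to suppress the implicit sort on older versions, and an explicit ORDER BY when sorted output is needed on 8.0 and later.

```sql
-- MySQL 5.7 and earlier: ORDER BY NULL tells the server the result
-- does not need to be sorted, avoiding the implicit GROUP BY sort.
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
ORDER BY NULL;

-- MySQL 8.0 and later: GROUP BY no longer sorts, so request
-- an order explicitly whenever the output must be sorted.
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
ORDER BY city;
```

Both statements are valid on either version; only the sorting work they trigger differs.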

Feature Comparison

Feature                        SELECT DISTINCT                  GROUP BY
------------------------------------------------------------------------------------------------------
Purpose                        Deduplication                    Grouping and aggregation
Per-group aggregate functions  No                               Yes (COUNT, SUM, AVG, ...)
Sorting behavior               No implicit sort                 Implicit sort before MySQL 8.0; none from 8.0 onward
Performance without an index   Often faster before MySQL 8.0    May add a filesort before MySQL 8.0
Syntax complexity              Simple                           More complex

Applicable Scenarios

Use SELECT DISTINCT when you only need to remove duplicates and return unique values; suitable for simple queries.

Use GROUP BY when you need to group rows by specific columns and perform aggregation (e.g., COUNT, SUM, AVG) or apply HAVING filters; ideal for complex business logic.
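
A HAVING filter is a case that DISTINCT cannot express. The sketch below assumes the sample orders table with its rows loaded as listed; the threshold of 400 is an arbitrary value chosen for illustration.

```sql
-- HAVING filters whole groups after aggregation,
-- whereas WHERE filters individual rows before grouping.
SELECT customer_id,
       SUM(price) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(price) > 400;

-- With the sample data:
--   customer 1 spent 100.00 + 200.00 = 300.00 and is filtered out
--   customer 2 spent 500.00 + 150.00 = 650.00 and is returned
```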

Conclusion

SELECT DISTINCT and GROUP BY are powerful tools for different SQL query needs. DISTINCT excels at simple deduplication, while GROUP BY shines in grouping and aggregating data. After MySQL 8.0, the performance gap narrows, but GROUP BY offers greater flexibility for complex scenarios. Choose the clause that best fits your specific use case.
