
DISTINCT vs GROUP BY in MySQL: Performance, Index Usage, and Implicit Sorting

This article explains the functional differences between DISTINCT and GROUP BY in MySQL, analyzes their performance with and without indexes, describes how implicit sorting affects GROUP BY before MySQL 8.0, and provides practical recommendations for choosing the appropriate clause.

Selected Java Interview Questions

The article compares DISTINCT and GROUP BY in MySQL, focusing on their performance characteristics, index utilization, and the impact of implicit sorting.

Conclusion (full conclusion appears at the end of the original text):

When the semantics are identical and an index exists, both GROUP BY and DISTINCT can use the index and have comparable efficiency.

When the semantics are identical but no index is available, DISTINCT is generally faster on versions before MySQL 8.0, because GROUP BY sorts its result implicitly there and that extra sorting step (filesort) slows execution.

The article then walks through the basic usage of DISTINCT with single‑column and multi‑column examples:

SELECT DISTINCT columns FROM table_name WHERE where_conditions;

Example output:

mysql> select distinct age from student;
+------+
| age  |
+------+
| 10   |
| 12   |
| 11   |
| NULL |
+------+
4 rows in set (0.01 sec)

For multi‑column distinct:

SELECT DISTINCT column1, column2 FROM table_name WHERE where_conditions;

Example output:

mysql> select distinct sex, age from student;
+--------+------+
| sex    | age  |
+--------+------+
| male   | 10   |
| female | 12   |
| male   | 11   |
| male   | NULL |
| female | 11   |
+--------+------+
5 rows in set (0.02 sec)
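
The two DISTINCT results above can be reproduced with a small table. This is a minimal sketch: the article does not show the schema of `student`, so the column types, the `id` and `name` columns, and the sample rows are assumptions chosen only to match the row counts shown.

```sql
-- Hypothetical schema: the article only shows the sex and age columns.
CREATE TABLE student (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(32) NOT NULL,
    sex  VARCHAR(8)  NOT NULL,
    age  INT NULL            -- NULL is allowed so that a NULL group appears
);

-- Sample rows (invented). The last three repeat an existing (sex, age) pair.
INSERT INTO student (name, sex, age) VALUES
    ('s1', 'male',   10),
    ('s2', 'female', 12),
    ('s3', 'male',   11),
    ('s4', 'male',   NULL),
    ('s5', 'female', 11),
    ('s6', 'male',   10),
    ('s7', 'female', 12),
    ('s8', 'male',   NULL);

SELECT DISTINCT age FROM student;       -- 4 rows: 10, 11, 12 and NULL
SELECT DISTINCT sex, age FROM student;  -- 5 rows: one per (sex, age) pair
```

DISTINCT treats NULL values as equal to each other, so the two rows with a NULL age collapse into one. Without an ORDER BY, the order of the returned rows is not guaranteed.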

The GROUP BY syntax is then presented, showing that it can achieve the same deduplication effect while also supporting aggregation and more complex processing:

SELECT columns FROM table_name WHERE where_conditions GROUP BY columns;

Single‑column example:

mysql> select age from student group by age;
+------+
| age  |
+------+
| 10   |
| 12   |
| 11   |
| NULL |
+------+
4 rows in set (0.02 sec)

Multi‑column example:

mysql> select sex, age from student group by sex, age;
+--------+------+
| sex    | age  |
+--------+------+
| male   | 10   |
| female | 12   |
| male   | 11   |
| male   | NULL |
| female | 11   |
+--------+------+
5 rows in set (0.03 sec)

Both DISTINCT and GROUP BY are implemented via a grouping operation, so both can be resolved from an index (a loose index scan or a covering index scan) when a suitable one exists. The article shows EXPLAIN output confirming that MySQL reports "Using index for group-by", the marker of a loose index scan, for both statements.
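
One way to check this on your own data is to add an index and compare the two plans. This is a sketch only: the index name `idx_age` is an assumption, and the optimizer's choice depends on table size and statistics, so the exact `Extra` text may differ.

```sql
-- Hypothetical index on the grouping column.
CREATE INDEX idx_age ON student (age);

-- Compare the Extra column of both plans.
EXPLAIN SELECT DISTINCT age FROM student;
EXPLAIN SELECT age FROM student GROUP BY age;
```

If a loose index scan is chosen, `Extra` reads `Using index for group-by` for both statements. On very small tables the optimizer may prefer a plain covering index scan, which shows up as `Using index`.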

Before MySQL 8.0, GROUP BY implicitly sorted its result by the grouping columns. When no suitable index existed, this typically meant a temporary table plus a filesort, which could make GROUP BY noticeably slower than the equivalent DISTINCT. The article cites MySQL documentation that explains this behavior.
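
On versions before 8.0, the MySQL manual describes `ORDER BY NULL` as a way to skip this implicit sort when the order of the groups does not matter. A sketch, assuming MySQL 5.7 and no index on `age`:

```sql
-- MySQL 5.7 and earlier: the result is implicitly sorted by age.
EXPLAIN SELECT age FROM student GROUP BY age;

-- ORDER BY NULL asks the server not to sort the groups.
EXPLAIN SELECT age FROM student GROUP BY age ORDER BY NULL;
```

In the first plan, `Extra` would typically include `Using temporary; Using filesort`. In the second, the `Using filesort` part should be gone.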

MySQL 8.0 removed the implicit sorting, so the performance gap between DISTINCT and GROUP BY disappears when no index is present; both become nearly equivalent.
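
A consequence is that code relying on GROUP BY to return sorted rows needs an explicit ORDER BY on MySQL 8.0. A minimal sketch:

```sql
-- MySQL 8.0: group order is unspecified unless ORDER BY is given.
SELECT age FROM student GROUP BY age ORDER BY age;
```

With an ascending sort, MySQL places the NULL group first.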

Finally, the article recommends using GROUP BY when you need clearer semantics, the ability to apply HAVING filters, or aggregate functions, while acknowledging that DISTINCT remains a concise way to obtain unique rows when no further processing is required.
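
The difference in expressive power can be sketched with the `student` table from the earlier examples. The alias `cnt` and the threshold are illustrative choices, not taken from the article.

```sql
-- Count students per (sex, age) pair and keep only pairs that occur more than once.
SELECT sex, age, COUNT(*) AS cnt
FROM student
GROUP BY sex, age
HAVING COUNT(*) > 1;

-- When only the unique pairs are needed, DISTINCT says the same thing more briefly.
SELECT DISTINCT sex, age FROM student;
```

The first query has no direct DISTINCT equivalent, because DISTINCT cannot attach an aggregate or a HAVING filter to each group.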

Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
