When to Use DISTINCT vs GROUP BY in MySQL: Performance Insights
This article compares MySQL's DISTINCT and GROUP BY clauses. It details when they deliver identical performance on indexed columns, why DISTINCT can be faster without an index on versions before MySQL 8.0, and how DISTINCT handles NULLs and multi‑column deduplication, and it provides practical syntax examples and optimization guidance.
Conclusion
When the semantics are the same and an index exists, GROUP BY and DISTINCT can both use the index and have identical efficiency. When no index is available, DISTINCT is generally faster on versions before MySQL 8.0, because GROUP BY sorts its result implicitly in those versions and that sort can trigger a filesort.
DISTINCT Usage
The basic syntax is:
SELECT DISTINCT columns FROM table_name WHERE where_conditions;
Example:
SELECT DISTINCT age FROM student;
When DISTINCT is applied to a column containing NULLs, MySQL treats all NULLs as equal and keeps a single NULL in the result.
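To make the NULL behavior concrete, here is a minimal sketch. The table `student_null_demo`, its columns, and its rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical demo table; names and data are illustrative only.
CREATE TABLE student_null_demo (
    id   INT PRIMARY KEY,
    name VARCHAR(20),
    age  INT NULL
);

INSERT INTO student_null_demo (id, name, age) VALUES
    (1, 'a', 18),
    (2, 'b', 18),
    (3, 'c', NULL),
    (4, 'd', NULL),
    (5, 'e', 20);

-- Five rows go in; DISTINCT collapses them to three values:
-- 18, 20, and a single NULL (row order is not guaranteed without ORDER BY).
SELECT DISTINCT age FROM student_null_demo;
```

The two NULL rows are merged even though `NULL = NULL` does not evaluate to true in an ordinary comparison; deduplication treats them as the same value.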
With multiple columns, two rows count as duplicates only when every listed column has the same value in both rows:
SELECT DISTINCT column1, column2 FROM table_name WHERE where_conditions;
Example:
SELECT DISTINCT sex, age FROM student;
GROUP BY Usage
Single‑column grouping syntax:
SELECT columns FROM table_name WHERE where_conditions GROUP BY columns;
Example:
SELECT age FROM student GROUP BY age;
Multi‑column grouping uses the same pattern with multiple columns listed after GROUP BY:
SELECT columns FROM table_name WHERE where_conditions GROUP BY column1, column2;
Example:
SELECT sex, age FROM student GROUP BY sex, age;
Unlike DISTINCT, GROUP BY can be followed by HAVING or aggregate functions for more complex data processing.
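The following sketch shows where the two clauses overlap and where GROUP BY goes further. The table `student_group_demo` and its data are assumptions made up for this example.

```sql
-- Hypothetical demo table; names and data are illustrative only.
CREATE TABLE student_group_demo (
    id  INT PRIMARY KEY,
    sex CHAR(1),
    age INT
);

INSERT INTO student_group_demo (id, sex, age) VALUES
    (1, 'M', 18),
    (2, 'M', 18),
    (3, 'F', 18),
    (4, 'F', 20),
    (5, 'M', 18);

-- These two queries return the same three (sex, age) pairs:
-- (M, 18), (F, 18), (F, 20), in no guaranteed order.
SELECT DISTINCT sex, age FROM student_group_demo;
SELECT sex, age FROM student_group_demo GROUP BY sex, age;

-- GROUP BY can also count the rows in each group and filter on that count.
-- With the data above, only (M, 18) has more than one row (cnt = 3).
SELECT sex, age, COUNT(*) AS cnt
FROM student_group_demo
GROUP BY sex, age
HAVING COUNT(*) > 1;
```

For pure deduplication the first two statements are interchangeable. The third has no direct DISTINCT equivalent, because DISTINCT cannot attach a per-group aggregate to each unique row.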
Principles Behind DISTINCT and GROUP BY
Both statements are based on a grouping operation and can be executed using index scans (range, loose index scan, or index‑only scan). In most cases, DISTINCT can be viewed as a special form of GROUP BY, and the same index‑optimization techniques apply to both.
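One way to check this on your own data is to compare the execution plans. The sketch below assumes a hypothetical table `student_idx_demo` with a secondary index `idx_age`; the plan you actually see depends on the MySQL version, the data, and the table statistics.

```sql
-- Hypothetical demo table with an index on the deduplicated column.
CREATE TABLE student_idx_demo (
    id  INT PRIMARY KEY,
    age INT,
    KEY idx_age (age)
);

-- Compare the two plans. With idx_age in place, both statements can be
-- resolved from the index. Check the "key" and "Extra" columns: values
-- such as "Using index" or "Using index for group-by" indicate that the
-- index is doing the work, but the exact output varies by version and data.
EXPLAIN SELECT DISTINCT age FROM student_idx_demo;
EXPLAIN SELECT age FROM student_idx_demo GROUP BY age;
```

If both plans name the same key and show similar Extra values, the two forms should perform about the same for that query.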
Before MySQL 8.0, GROUP BY performed an implicit sort when no index could satisfy the ordering, which often caused a filesort and reduced performance. MySQL 8.0 removed this implicit sorting, so the performance gap between GROUP BY and DISTINCT disappears when no index is present.
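The version difference can be observed with EXPLAIN on an unindexed column. This is a sketch under stated assumptions: `student_sort_demo` is a hypothetical table, and the Extra values mentioned in the comments are typical rather than guaranteed.

```sql
-- Hypothetical demo table; age is deliberately left without an index.
CREATE TABLE student_sort_demo (
    id  INT PRIMARY KEY,
    age INT
);

-- Before MySQL 8.0, the Extra column for this plan often shows
-- "Using temporary; Using filesort" because of the implicit sort.
-- On MySQL 8.0 and later, the filesort part is typically absent.
EXPLAIN SELECT age FROM student_sort_demo GROUP BY age;

-- Before MySQL 8.0, adding ORDER BY NULL was the usual way to suppress
-- the implicit sort. It is still accepted in 8.0 but no longer needed.
EXPLAIN SELECT age FROM student_sort_demo GROUP BY age ORDER BY NULL;

-- On MySQL 8.0 and later, ask for an order explicitly if you rely on one.
SELECT age FROM student_sort_demo GROUP BY age ORDER BY age;
```

Queries written for older versions that relied on GROUP BY returning sorted rows should gain an explicit ORDER BY when moved to MySQL 8.0.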
Why Prefer GROUP BY
GROUP BY expresses intent more clearly for aggregation and grouping tasks.
It allows more complex processing, such as HAVING filters and aggregate functions.
With a suitable index it performs as well as DISTINCT, and from MySQL 8.0 onward it no longer incurs an implicit sort when no index is available.
In summary, use GROUP BY for clearer semantics and advanced data manipulation, and use DISTINCT for simple deduplication when you only need unique rows.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
