Why Does MySQL GROUP BY Slow Down and How to Fix It
This article explains why a MySQL GROUP BY query can degrade from seconds to minutes as data grows, shows how to locate the bottlenecks with EXPLAIN and profiling tools, and provides practical indexing, query rewriting, temporary‑table tuning, batch aggregation, and distributed‑database strategies to restore performance.
Introduction
Many developers have seen a GROUP BY query that once returned in seconds start taking minutes or even hours as the table grows, causing slow page loads and DBA frustration.
Why does GROUP BY become slower?
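A GROUP BY query has two stages: reading the rows and grouping them. Without a suitable index, the read stage becomes a full table scan. When the grouped data outgrows the in-memory limit, MySQL spills its internal temporary table to disk, which makes the grouping stage much slower.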
How to locate performance problems
1. Use EXPLAIN
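EXPLAIN reports the access type (type), the index chosen (key), the estimated number of rows examined (rows), and Extra notes. For GROUP BY, watch Extra for Using temporary and Using filesort, which indicate that MySQL builds a temporary table or performs an additional sort.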
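When many queries need reviewing, the same check can be automated. The following is a minimal sketch, assuming the key and Extra values of one EXPLAIN row have already been read (for example over JDBC); the class name ExplainCheck and the warning labels are illustrative, not part of MySQL or of the original article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags GROUP BY warning signs in one EXPLAIN row.
final class ExplainCheck {
    static List<String> warnings(String key, String extra) {
        List<String> found = new ArrayList<>();
        // A NULL key means MySQL did not pick any index for this table.
        if (key == null || key.isEmpty()) {
            found.add("no index used");
        }
        String e = extra == null ? "" : extra;
        if (e.contains("Using temporary")) {
            found.add("temporary table");
        }
        if (e.contains("Using filesort")) {
            found.add("filesort");
        }
        return found;
    }
}
```

The query that follows is the kind of statement whose plan would be passed to such a check.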
EXPLAIN
SELECT department, COUNT(*) as emp_count
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department;
2. Performance monitoring tools
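Enable session profiling and view the time spent in each execution step. SHOW PROFILE and SHOW PROFILES are deprecated in favor of the Performance Schema, but they remain handy for a quick check where they are available.
-- Enable profiling for the current session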
SET PROFILING = 1;
SELECT department, COUNT(*) as emp_count
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department;
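-- List the profiled statements and their Query_ID values
SHOW PROFILES;
-- Show per-stage timing; replace 1 with the Query_ID reported above
SHOW PROFILE FOR QUERY 1;
Common causes and solutions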
1. Missing suitable indexes
Problem: No composite index on GROUP BY and WHERE columns.
Solution: Create composite indexes.
CREATE INDEX idx_department_hire_date ON employees(department, hire_date);
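CREATE INDEX idx_department_hire_date_covering ON employees(department, hire_date, salary);
Place WHERE columns that are compared with equality on the left side of the index.
Then GROUP BY columns.
Finally range-filtered columns and SELECT columns, so that the index covers the query. In the example above hire_date is a range condition, which is why department comes first: MySQL can then read the groups in index order instead of building a temporary table.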
2. Temporary tables and filesort
When data is large, MySQL may create disk‑based temporary tables, drastically slowing the query.
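Solution 1: Increase tmp_table_size and max_heap_table_size. For in-memory temporary tables that use the MEMORY engine, the effective limit is the smaller of the two values, so raise them together.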
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
-- SET GLOBAL applies to connections opened afterwards, not to existing sessions
SET GLOBAL tmp_table_size = 256*1024*1024; -- 256MB
SET GLOBAL max_heap_table_size = 256*1024*1024; -- 256MB
Solution 2: Optimize the query to select only needed columns.
SELECT department, COUNT(*) as emp_count
FROM employees
WHERE hire_date > '2023-01-01'
GROUP BY department;
3. Large data volume
Even with indexes, billions of rows can be slow.
Method 1: Batch aggregation.
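The idea can be sketched without a database: walk the primary-key range in fixed-size slices, aggregate each slice, and add the partial counts together. This is a minimal sketch under stated assumptions, not the article's implementation; the class BatchMerge, the fetchBatch callback, and the assumption of a numeric id column are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Sketch only: BatchMerge and its parameters are illustrative names.
final class BatchMerge {
    // fetchBatch stands in for a query such as:
    //   SELECT department, COUNT(*) FROM employees
    //   WHERE id >= ? AND id < ? GROUP BY department
    static Map<String, Integer> groupInBatches(long minId, long maxId, long batchSize,
            BiFunction<Long, Long, Map<String, Integer>> fetchBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, Integer> totals = new HashMap<>();
        for (long start = minId; start <= maxId; start += batchSize) {
            long end = Math.min(start + batchSize, maxId + 1); // exclusive upper bound
            Map<String, Integer> partial = fetchBatch.apply(start, end);
            // Counts are additive, so partial results can be summed per group.
            partial.forEach((group, count) -> totals.merge(group, count, Integer::sum));
        }
        return totals;
    }
}
```

Slicing on an indexed key keeps each query's temporary table small. The merge step works as written only for additive aggregates such as COUNT and SUM.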
public Map<String, Integer> batchGroupBy(String tableName, String groupColumn, String condition, int batchSize) throws SQLException {
// implementation that queries in batches and merges results
}
Method 2: Asynchronous processing with cache.
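The cache-then-compute pattern can be illustrated with plain java.util.concurrent classes. The sketch below assumes an in-process map in place of an external cache and uses CompletableFuture.supplyAsync rather than Spring's @Async; CachedAggregation, runQuery, and invalidate are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch only: an in-process map stands in for an external cache.
final class CachedAggregation {
    private final ConcurrentHashMap<String, CompletableFuture<Map<String, Integer>>> cache =
            new ConcurrentHashMap<>();

    // runQuery is a placeholder for the code that executes the GROUP BY statement.
    CompletableFuture<Map<String, Integer>> get(String cacheKey,
            Supplier<Map<String, Integer>> runQuery) {
        // computeIfAbsent starts at most one query per key; later callers share its future.
        return cache.computeIfAbsent(cacheKey, k -> CompletableFuture.supplyAsync(runQuery));
    }

    // Call when the underlying data changes or the entry is considered stale.
    void invalidate(String cacheKey) {
        cache.remove(cacheKey);
    }
}
```

Caching the future rather than the finished result means concurrent requests for the same key wait on one query instead of each starting their own. A production version would also need expiry and would have to evict futures that completed with an error.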
@Async("taskExecutor")
public CompletableFuture<Map<String, Integer>> executeGroupByAsync(String sql, String cacheKey) {
// check cache, execute query, store result
}
4. Complex GROUP BY
Complex queries with sub‑queries or many columns perform poorly.
Solution 1: Use derived tables.
SELECT t.department, t.avg_salary, t.emp_count
FROM (
SELECT department, AVG(salary) as avg_salary, COUNT(*) as emp_count
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department
) t
WHERE t.avg_salary > 5000;
Solution 2: WITH ROLLUP for multi‑dimensional grouping.
SELECT department, job_title, COUNT(*) as emp_count
FROM employees
GROUP BY department, job_title WITH ROLLUP;
5. Distributed environment
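In sharding scenarios, GROUP BY must run on each shard and the partial results must be merged. Counts and sums can simply be added across shards, but an average has to be recomputed from the per-shard sums and counts.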
Method 1: Middleware to execute across shards.
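The scatter-and-merge step can be sketched independently of any particular middleware. In the sketch below, each Supplier stands in for running the same GROUP BY against one shard; ShardMerge and mergeCounts are illustrative names, and the default common pool is assumed to be an acceptable executor.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Sketch only: each Supplier stands in for running the same GROUP BY on one shard.
final class ShardMerge {
    static Map<String, Integer> mergeCounts(List<Supplier<Map<String, Integer>>> shardQueries) {
        // Start every shard query before waiting on any of them.
        List<CompletableFuture<Map<String, Integer>>> futures = shardQueries.stream()
                .map(query -> CompletableFuture.supplyAsync(query))
                .collect(Collectors.toList());
        Map<String, Integer> merged = new HashMap<>();
        for (CompletableFuture<Map<String, Integer>> future : futures) {
            // join() rethrows a shard failure wrapped in CompletionException.
            future.join().forEach((group, count) -> merged.merge(group, count, Integer::sum));
        }
        return merged;
    }
}
```

Collecting the futures into a list first matters: joining inside the stream pipeline would run the shards one after another.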
public Map<String, Integer> executeAcrossShards(String logicSql, List<DataSource> shards) {
// run logicSql on each shard concurrently and merge maps
}
Method 2: Use Elasticsearch for heavy aggregations.
SearchRequest searchRequest = new SearchRequest("employees");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_department")
.field("department.keyword")
.subAggregation(AggregationBuilders.avg("avg_salary").field("salary"));
sourceBuilder.aggregation(aggregation);
searchRequest.source(sourceBuilder);
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
// process response
Practical case
In an e‑commerce system, a GROUP BY query on orders became slow.
Original query:
SELECT DATE(create_time) as order_date, product_category,
COUNT(*) as order_count, SUM(amount) as total_amount
FROM orders
WHERE create_time >= '2023-01-01' AND status = 'COMPLETED'
GROUP BY DATE(create_time), product_category;
Solutions:
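Create a composite index on (create_time, status, product_category, amount) so the query can be answered from the index alone. Because status is an equality filter and create_time a range filter, putting status first may let MySQL read fewer index entries; compare both column orders with EXPLAIN.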
Use a pre‑aggregated daily stats table updated by a scheduled job.
Query the pre‑aggregated table for fast results.
CREATE INDEX idx_orders_stats ON orders(create_time, status, product_category, amount);
CREATE TABLE orders_daily_stats (
stat_date DATE NOT NULL,
product_category VARCHAR(50) NOT NULL,
order_count INT NOT NULL,
total_amount DECIMAL(15,2) NOT NULL,
PRIMARY KEY (stat_date, product_category)
);
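-- Scheduled job: recompute the rows for yesterday and today
INSERT INTO orders_daily_stats (stat_date, product_category, order_count, total_amount)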
SELECT DATE(create_time), product_category, COUNT(*), SUM(amount)
FROM orders
WHERE create_time >= CURDATE() - INTERVAL 1 DAY AND status = 'COMPLETED'
GROUP BY DATE(create_time), product_category
ON DUPLICATE KEY UPDATE
order_count = VALUES(order_count),
total_amount = VALUES(total_amount);
-- Fast query
SELECT stat_date, product_category, order_count, total_amount
FROM orders_daily_stats
WHERE stat_date >= '2023-01-01';
Conclusion
Key take‑aways: create proper composite indexes, simplify queries to avoid SELECT *, tune temporary‑table memory, apply batch processing or pre‑aggregation for massive data, consider sharding or search‑engine solutions for distributed workloads, and upgrade architecture with read‑write splitting or dedicated analytics stores.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
