Databases 15 min read

How MySQL Uses Indexes for GROUP BY: Compact vs Loose Scans Explained

This article explains how MySQL can implement GROUP BY using either a compact index scan that reads rows sequentially or a loose index scan that jumps to the needed rows, compares their costs, and shows when each method is optimal, including a hybrid sequential‑loose scan.

Su San Talks Tech

Apr 14, 2022

How MySQL Uses Indexes for GROUP BY: Compact vs Loose Scans Explained

1. Introduction

When analyzing a SELECT with GROUP BY using EXPLAIN, the Extra column may show Using temporary; Using filesort, indicating that MySQL creates a temporary table for the GROUP BY. Using an index to satisfy the grouping can avoid this overhead.

2. Compact Index Scan

If the GROUP BY columns are part of an index and satisfy the left‑most prefix rule, the server can read the index sequentially without a temporary table. This is called a compact index scan . It processes every row that matches the WHERE clause and applies the aggregation functions.

Example SQL:

CREATE TABLE `t_group_by` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `i1` int(10) unsigned DEFAULT '0',
  `c1` char(11) DEFAULT '',
  `e1` enum('北京','上海','广州','深圳','天津','杭州','成都','重庆','苏州','南京','洽尔滨','沈阳','长春','厦门','福州','南昌','泉州','德清','长沙','武汉') DEFAULT '北京',
  `d1` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_e1_i1` (`e1`,`i1`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Example using AVG():

select e1, avg(i1) as t
from t_group_by
where d1 > 5452415
group by e1;

During lexical and syntactic analysis, avg(i1) is parsed into an Item_sum_avg object with three key attributes: sum (stores the group sum), count (stores the group count), and args (the argument list, where args[0] is the i1 field).

In the execution phase, the server reads each record, checks the WHERE condition, and when the group prefix (the GROUP BY column value) changes, it finalizes the previous group (computes sum/count) and starts a new group, resetting sum and count.

bool Item_sum_avg::add() {
  // group sum
  if (Item_sum_sum::add())
    return TRUE;
  // group count (only when value is not NULL)
  if (!aggr->arg_is_null(true))
    count++;
  return FALSE;
}

The same logic applies to COUNT() and SUM(), differing only in which attributes they maintain.

3. Loose Index Scan

A loose index scan jumps over rows: for each group it reads the first (or last) matching row to obtain the group prefix, then directly reads the needed value (e.g., MIN or MAX) without scanning all rows in the group. This reduces the number of rows read and appears in EXPLAIN as Using index for group-by.

Example using MIN():

select e1, min(i1)
from t_group_by
group by e1;

During analysis, min(i1) becomes an Item_sum_min object with value (stores the minimum) and args (the argument list).

Execution consists of two steps for each group: (1) read the group prefix (the e1 value) by fetching the first row of the group; (2) using that prefix, read the row that contains the minimum i1 value.

4. Choosing Between the Two Scans

Loose scans are beneficial only when certain conditions are met:

Single‑table query (no joins).

GROUP BY columns satisfy the left‑most index rule.

All aggregation functions use the same column, which must also be part of the index, and the combination of GROUP BY columns plus the aggregation column must satisfy the left‑most rule.

Supported aggregation functions are either 1‑2 of MIN() / MAX() or 1‑3 of COUNT(DISTINCT), SUM(DISTINCT), AVG(DISTINCT).

The index must be a full‑column index (no prefix indexes).

If these conditions hold, MySQL estimates the cost of both scans and chooses the cheaper one. When a loose scan is more expensive, MySQL falls back to a compact scan for MIN() / MAX(), while for distinct aggregations it may use a hybrid approach called “sequential loose index scan”, which reads rows sequentially like a compact scan but retains the built‑in deduplication of a loose scan. This appears in EXPLAIN as Using index for group-by (scanning).

5. Summary

The article covered two MySQL index‑based implementations of GROUP BY: compact index scan (sequential read) and loose index scan (jumping read). It detailed the internal classes ( Item_sum_avg, Item_sum_min, etc.), showed code snippets for aggregation handling, explained the conditions under which a loose scan can be used, compared their costs, and introduced the sequential loose scan variant that combines the advantages of both methods.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance mysql GROUP BY loose index scan compact index scan

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.