How MySQL Chooses the Cheapest Index for COUNT(*) and When It Gets It Wrong
This article examines whether SELECT COUNT(*) causes a full‑table scan, explains how MySQL’s cost‑based optimizer selects an index (using IO and CPU costs), demonstrates with a 100k‑row table how an auxiliary index is chosen, and shows a case where the optimizer’s estimates lead to a slower plan.
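Many wonder whether SELECT COUNT(*) without a WHERE clause forces a full‑table scan. It does not have to: from MySQL 5.6 on, the optimizer can answer such a query by scanning the cheapest auxiliary (secondary) index, which holds only the indexed columns plus the primary key and is therefore usually much smaller than the full table data.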
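To get a rough feel for which index is cheapest to scan, the per-index page counts can be compared. The query below is a minimal sketch, assuming InnoDB persistent statistics are enabled (the default in MySQL 5.7) and a table named person like the one built later in this article. It only illustrates relative index sizes; it is not the optimizer's own calculation.

```sql
-- Sketch: pages occupied by each index of `person`, smallest first.
-- Assumes the current database contains the table and that the
-- session may read mysql.innodb_index_stats.
SELECT index_name,
       stat_value AS pages          -- 'size' = number of pages in the index
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'person'
  AND stat_name = 'size'
ORDER BY stat_value;
```

The PRIMARY row represents the clustered index, that is, the table data itself, so it is normally the largest entry.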
SQL Index Cost Calculation
The optimizer evaluates two main costs:
IO cost: reading a data page from disk, 1 per page by default. MySQL reads whole pages rather than individual rows, following the principle of locality.
CPU cost: evaluating rows once they are in memory, 0.2 per row by default.
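These defaults can be inspected on a running server. The following is a small sketch, assuming MySQL 5.7 or later (where the cost model tables exist) and an account allowed to read the mysql schema.

```sql
-- Server-level constants; row_evaluate_cost is the per-row CPU cost.
SELECT cost_name, cost_value
FROM mysql.server_cost;

-- Engine-level constants; io_block_read_cost is the per-page IO cost.
SELECT engine_name, cost_name, cost_value
FROM mysql.engine_cost;

-- A NULL cost_value means the built-in default is in effect.
```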
Example Demonstration
We create a table person (MySQL 5.7.18) with a primary key and two secondary indexes: name_score and create_time.
CREATE TABLE `person` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`score` int(11) NOT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `name_score` (`name`(191),`score`),
KEY `create_time` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
We insert 100,000 rows via a stored procedure:
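-- DELIMITER is a mysql command-line client directive; other clients may not need it.
DELIMITER //
CREATE PROCEDURE insert_person()
BEGIN
DECLARE c_id INT DEFAULT 1;
-- One row per iteration; create_time steps back one second per row.
WHILE c_id <= 100000 DO
INSERT INTO person VALUES (c_id, CONCAT('name',c_id), c_id+100,
DATE_SUB(NOW(), INTERVAL c_id SECOND));
SET c_id = c_id + 1;
END WHILE;
END //
DELIMITER ;
CALL insert_person();
Running EXPLAIN SELECT COUNT(*) FROM person shows MySQL uses the create_time auxiliary index: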
EXPLAIN SELECT COUNT(*) FROM person;
When we query with conditions that could use either index, MySQL chooses a full‑table scan:
SELECT * FROM person WHERE name > 'name84059' AND create_time > '2020-05-23 14:39:18';
Even when the select list is narrowed to the indexed column create_time, the plan is still a full scan:
SELECT create_time FROM person WHERE name > 'name84059' AND create_time > '2020-05-23 14:39:18';
Yet when an index is forced, the query runs about twice as fast as the full scan (2 ms vs 4 ms), which suggests the optimizer’s cost estimate was off.
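The comparison can be reproduced with an index hint. The sketch below is an illustration only: the text does not say which index was forced, so the choice of create_time here is an assumption, and timings will differ from machine to machine.

```sql
-- Plan chosen by the optimizer (a full scan in the article's test).
SELECT create_time
FROM person
WHERE name > 'name84059'
  AND create_time > '2020-05-23 14:39:18';

-- Same query with an index hint.
-- Assumption: create_time is the forced index; swap in name_score to compare.
SELECT create_time
FROM person FORCE INDEX (create_time)
WHERE name > 'name84059'
  AND create_time > '2020-05-23 14:39:18';
```

Prefixing either statement with EXPLAIN shows whether the hint changed the access type away from a full scan.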
We compute the full‑scan cost manually:
Rows ≈ 100,264 → CPU cost = 100,264 × 0.2 = 20,052.8
Data length 5,783,552 bytes → pages = 5,783,552 / 16 KB ≈ 353 → IO cost = 353
Total cost ≈ 20,406
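The same arithmetic can be run in SQL. The first statement uses the numbers quoted above; the second is a sketch that pulls the row estimate and data length from information_schema, assuming that is where such figures come from (the text does not say) and using the default constants of 0.2 per row and 1 per page.

```sql
-- Arithmetic from the text: CPU cost, IO cost (16 KB = 16384 bytes), total.
SELECT 100264 * 0.2                   AS cpu_cost,
       5783552 / 16384                AS io_cost,
       100264 * 0.2 + 5783552 / 16384 AS total_cost;

-- Same estimate from the table's current statistics.
-- table_rows is an estimate for InnoDB, so the result drifts between runs.
SELECT table_rows,
       data_length,
       data_length / @@innodb_page_size                    AS pages,
       table_rows * 0.2 + data_length / @@innodb_page_size AS est_full_scan_cost
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'person';
```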
Using optimizer_trace we see the estimated costs:
{
"index": "name_score",
"rows": 25372,
"cost": 30447
}
{
"index": "create_time",
"rows": 50132,
"cost": 60159
}
{
"access_type": "scan",
"rows_to_scan": 100264,
"cost": 20406,
"chosen": true
}
The optimizer correctly picks the lowest estimated cost (full scan), but actual runtime shows the forced index is faster, highlighting inaccuracies in statistics or cost modeling.
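A trace like this can be captured per session. The statements below are a minimal sketch using the standard optimizer_trace session variable; the traced query is the one from this article, and the fragments shown above are short excerpts of a much longer JSON document.

```sql
-- Turn tracing on for this session only.
SET optimizer_trace = 'enabled=on';

-- Run the statement to be analyzed.
SELECT create_time
FROM person
WHERE name > 'name84059'
  AND create_time > '2020-05-23 14:39:18';

-- Read the trace of the most recent statement, then turn tracing off.
SELECT trace FROM information_schema.optimizer_trace;
SET optimizer_trace = 'enabled=off';
```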
Conclusion
The optimizer’s plan is not always optimal; inaccurate row statistics or cost formulas can lead to sub‑optimal choices. In production, use EXPLAIN and optimizer_trace to verify and tune queries, especially when multiple indexes are available.
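As a starting point for that kind of check, the sketch below refreshes the table statistics and then prints the plan with its cost estimate. It assumes MySQL 5.7 or later, where the JSON format of EXPLAIN includes cost figures; whether fresh statistics change the chosen plan depends on the data.

```sql
-- Recompute index statistics so row estimates are as current as possible.
ANALYZE TABLE person;

-- Show the chosen plan together with the optimizer's cost estimate.
EXPLAIN FORMAT=JSON
SELECT create_time
FROM person
WHERE name > 'name84059'
  AND create_time > '2020-05-23 14:39:18';
```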
dbaplus Community
