Why MySQL’s Index Choice Can Miss the Best Plan – A Deep Cost Analysis
This article examines how MySQL estimates execution costs for queries, explains why COUNT(*) can be optimal, demonstrates index selection with real‑world examples, and shows how optimizer_trace can reveal mismatches between estimated and actual performance.
Introduction
Using the following SQL statements often leads to slow queries:
<code>SELECT COUNT(*) FROM SomeTable
SELECT COUNT(1) FROM SomeTable</code>Although many claim that a full‑table scan is inevitable, MySQL actually optimizes COUNT(*) without a
WHEREclause by choosing the cheapest auxiliary index, which can be the most performant option in versions 5.6 and later.
How MySQL Calculates Index Execution Cost
When multiple indexes exist, MySQL selects the one with the lowest estimated cost, which consists of two components:
IO cost – reading data pages from disk; each page read costs 1 unit.
CPU cost – evaluating rows and sorting; default cost per row is 0.2.
Example Setup
We create a table with a primary key and two secondary indexes (
name_scoreand
create_time) and insert 100,000 rows via a stored procedure.
<code>CREATE TABLE `person` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`score` int(11) NOT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `name_score` (`name`(191),`score`),
KEY `create_time` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE PROCEDURE insert_person()
BEGIN
DECLARE c_id integer DEFAULT 1;
WHILE c_id<=100000 DO
INSERT INTO person VALUES (c_id, CONCAT('name',c_id), c_id+100, DATE_SUB(NOW(), INTERVAL c_id SECOND));
SET c_id=c_id+1;
END WHILE;
END</code>After populating the table, we run
EXPLAIN SELECT COUNT(*) FROM personand observe that MySQL uses the
create_timeauxiliary index, confirming the optimizer’s cost‑based choice.
When MySQL Chooses a Full Table Scan
Running a query with both
NAMEand
create_timeconditions results in a full table scan, even though both secondary indexes could be used. Forcing the
create_timeindex reduces execution time from 4 ms to 2 ms, showing that the optimizer’s cost estimate was inaccurate.
<code>-- Full table scan time: 4.0 ms
SELECT create_time FROM person WHERE NAME > 'name84059' AND create_time > '2020-05-23 14:39:18';
-- Using covering index time: 2.0 ms
SELECT create_time FROM person FORCE INDEX(create_time) WHERE NAME > 'name84059' AND create_time > '2020-05-23 14:39:18';</code>The cost of a full scan is derived from the estimated row count (CPU cost) and the number of data pages (IO cost). Using
SHOW TABLE STATUS LIKE 'person'we find 100,264 rows (CPU cost ≈ 20,052.8) and 353 pages (IO cost = 353), giving a total cost of 20,406.
Using optimizer_trace to Verify Costs
Enabling
optimizer_traceshows detailed cost estimates for each access path. The
name_scoreindex has a cost of 30,447, the
create_timeindex 60,159, while the full scan remains at 20,406, confirming why the optimizer selects the scan.
<code>{
"index": "name_score",
"rows": 25372,
"cost": 30447
}
{
"index": "create_time",
"rows": 50132,
"cost": 60159
}
{"chosen": true, "cost": 20406}</code>Conclusion
The optimizer’s cost model may not always match actual execution time, especially when statistics are inaccurate or when the estimated cost of using an index exceeds that of a full scan. Using
EXPLAINand
optimizer_tracehelps identify such mismatches and guide manual index hints or statistics updates.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.