Why MySQL Chooses Full Table Scan Over Index: Uncovering Cost Constant Secrets
This article examines a puzzling MySQL case where the optimizer selects a full‑table scan despite suitable indexes, explains the role of cost constants, execution‑plan analysis, and demonstrates how to force the optimizer to use the optimal index for faster queries.
Preface: A Confusing Choice
Even when a suitable index exists, explain may show MySQL opting for a full‑table scan because the optimizer, acting as an invisible conductor, makes decisions based on a series of cost constants.
We will dive deep into MySQL cost constants and reveal the hidden mechanics behind query optimization.
1 A Puzzling SQL Phenomenon
1.1 Table Structure
CREATE TABLE `mapping_filter_record` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`source_type` int(11) NOT NULL COMMENT '来源类型',
`source_id` varchar(64) NOT NULL COMMENT '来源方id',
-- ... other columns omitted
PRIMARY KEY (`id`),
KEY `idx_source_type` (`source_type`,`update_time`) USING BTREE,
KEY `idx_source_id` (`source_id`,`source_type`,`state`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=290240042300201321 DEFAULT CHARSET=utf8mb4 COMMENT='商品发布拦截记录表';1.2 Long‑Running SQL ( >10 s )
select *
from dbzz_ypofflinemart.mapping_filter_record
WHERE (source_type = 9401003 and source_id = '1814613774586351713')
order by id asc
LIMIT 1;1.3 Analyze the Execution Plan
Table data must meet certain conditions for the following situation to occur.
explain
select *
from dbzz_ypofflinemart.mapping_filter_record
WHERE (source_type = 9401003 and source_id = '1814613774586351713')
order by id asc
LIMIT 1;Execution plan result:
The confusing part is that the optimizer uses the PRIMARY (clustered) index instead of the expected idx_source_id secondary index.
1.4 Advanced Use of EXPLAIN
explaincan output four formats: traditional, JSON, tree, and visual.
The traditional format only shows the result, while the JSON format provides the most detailed information, including execution cost.
Adding format=json lets us analyze the cost details.
1.5 Analyze Execution Plans
Comparing the two SQL statements shows identical query_cost (3865.20) but different sorting behavior. The optimizer prefers PRIMARY because it can read rows in index order without an extra filesort.
Command | query_cost | using_filesort
--------|------------|----------------
Command1| 3865.20 | false
Command2| 3865.20 | trueThus the optimizer selects the clustered index.
2 Query SQL Execution Flow
2.1 Query Optimizer
The optimizer workflow can be simplified into four steps:
Parse the SQL and understand the query intent.
Generate multiple possible execution plans.
Calculate the cost of each plan based on cost constants.
Select the plan with the lowest estimated cost.
2.2 Execution Cost
SQL total cost = CPU cost + I/O cost.
CPU cost : time spent reading rows, evaluating predicates, sorting results, etc.
I/O cost : time required to load data or index pages from disk into memory.
2.3 MySQL 5.7 Default Cost Constants
Cost constants are fixed values that the optimizer uses to estimate resource consumption for different operations.
Server‑layer cost constants:
Storage‑engine layer cost constants:
3 Execution Cost Analysis
3.1 Table Statistics
Show table status to obtain statistics used for cost calculation:
show table status like 'mapping_filter_record'; Rows Avg_row_length Data_length Index_length Data_free
1615460 9396 15180234752 552239104 41943043.2 Command 2 (Specified Index) Cost Analysis
The query uses the non‑clustered index idx_source_id to locate the primary key, then performs a primary‑key lookup.
Cost calculation:
Non‑clustered index CPU cost = 1 × 0.2 (row_evaluate_cost)
Non‑clustered index I/O cost = 1 × 1 (io_block_read_cost)
Table‑lookup CPU cost = 3221 × 0.2
Table‑lookup I/O cost = 3221 × 1
Total cost ≈ 3865.2, matching the plan’s query_cost of 3865.20.
3.3 Command 1 (No Index) Cost Analysis
Using the primary key (clustered) index leads to a full‑table scan whose estimated cost appears the same, but actual execution is far slower.
Command | Index Used | Scanned Rows | Execution Time
--------|------------|--------------|----------------
select * ... limit 1 | PRIMARY | 501 | 19.4 s
select * ... limit 2 | PRIMARY | 1003 | 20.2 s
select * ... limit 6 | PRIMARY | 3009 | 20.24 s
select * ... limit 7 | idx_source_id | 3221 | 0.026 sThe optimizer assumes uniform data distribution, estimating that scanning 501 rows will find the first matching row (1,615,460 total rows ÷ 3,221 matching rows ≈ 501).
MySQL assumes uniform distribution, so it estimates 501 rows to locate one match, making a full‑table scan appear cheaper.
When LIMIT is present, the optimizer may prefer a full scan if the estimated scanned rows are fewer than those required by the secondary index.
These observations highlight that the optimizer bases decisions on cost calculations rather than intuition.
4 Optimization
Although the optimizer chose the primary index under the uniform‑distribution assumption, the actual data resides near the end of the table, causing millions of rows to be scanned and high load.
One effective remedy is to rewrite the query as a subquery, forcing the optimizer to use the efficient secondary index.
SELECT *
FROM mapping_filter_record
WHERE id = (
SELECT id
FROM mapping_filter_record
WHERE source_type = 9401003 AND source_id = '1814613774586351713'
ORDER BY id ASC
LIMIT 1
);5 Summary
MySQL optimizer makes decisions based on cost calculations, not intuition.
Cost constants are the core metric for evaluating execution plans.
Accurate statistics directly influence the optimizer’s choice.
Understanding the cost model is essential for SQL performance tuning.
By grasping how the optimizer works, we can design better indexes and write queries that achieve superior database performance.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
