Databases 15 min read

Why MySQL Chose the Wrong Index and How to Fix It

A production MySQL query that should have returned instantly took 44 seconds because the optimizer selected the primary key index instead of a suitable composite index, and the article explains the root cause, the optimizer's decision process, and multiple practical solutions to prevent such slow‑query incidents.

dbaplus Community
dbaplus Community
dbaplus Community
Why MySQL Chose the Wrong Index and How to Fix It

Fault Description

On 2023‑07‑24 at 11:00 a production MySQL instance generated a massive number of slow‑query alerts. The offending statement (simplified) was:

SELECT * FROM sample_table WHERE city_id = 565 AND type = 13 ORDER BY id DESC LIMIT 0,1;

Each execution took about 44 seconds even though the query looks trivial. The table contains roughly 83 million rows.

Root‑Cause Investigation

Existing indexes on sample_table: idx_1 (`city_id`,`type`,`rank`) idx_log_dt_city_id_rank (`log_dt`,`city_id`,`rank`) idx_city_id_type (`city_id`,`type`)

Both idx_1 and idx_city_id_type can satisfy the city_id and type predicates. Running EXPLAIN on the query produced the plan shown below (primary‑key index was chosen):

The optimizer selected the primary‑key index because the query uses ORDER BY id DESC and LIMIT 1. It estimated that scanning the primary key (already ordered) would be cheaper than using a composite index followed by a sort. In reality the predicate returned no rows, so MySQL traversed the entire primary‑key range (≈80 million rows) before concluding the result set was empty, leading to the 44‑second execution time.

MySQL Index‑Selection Principles

The optimizer evaluates several factors when picking an index:

Estimated rows to scan ( rows)

Whether a temporary table is needed

Whether an additional sort is required

Row estimates come from statistics gathered by sampling data pages. In InnoDB the sampling parameters N and M depend on innodb_stats_persistent (default N=20, M=10 when ON; N=8, M=16 when OFF). Because sampling is approximate, the rows estimate can be far off (e.g., estimated 1 833 rows vs. actual 80 million).

Impact of LIMIT

Increasing the LIMIT value raises the estimated rows. For example: LIMIT 0,10 → estimated rows ≈ 18 211 LIMIT 0,100 → optimizer switches to the composite index, accepting the extra sorting cost.

This behaviour demonstrates that LIMIT directly influences the optimizer’s cost model.

Solutions

Force the desired index Use FORCE INDEX to make the query use the composite index:

SELECT * FROM sample_table FORCE INDEX(idx_city_id_type)
WHERE city_id = 565 AND type = 13
ORDER BY id DESC LIMIT 0,1;

This reduces execution time to milliseconds but couples the SQL to a specific index name.

Increase LIMIT Using a larger limit (e.g., LIMIT 0,1000 ) forces the optimizer to choose the composite index because the estimated row count grows:

SELECT * FROM sample_table
WHERE city_id = 565 AND type = 13
ORDER BY id DESC LIMIT 0,1000;

While effective, it may cause unnecessary work if the result set is small.

Add a covering composite index that includes the ORDER BY column Create an index on (city_id, type, id) so the optimizer can satisfy both the filter and the ordering without a separate sort:

ALTER TABLE sample_table
ADD INDEX idx_city_type_id (city_id, type, id);

Be aware that building such an index on an 80 million‑row table can be time‑consuming and will increase storage overhead.

Rewrite as a subquery First fetch candidate rows using the composite index, then apply LIMIT 1 on the outer query:

SELECT * FROM (
  SELECT * FROM sample_table
  WHERE city_id = 565 AND type = 13
  ORDER BY id DESC
  LIMIT 1000
) AS sub
LIMIT 1;

This approach avoids the optimizer’s primary‑key bias but may be less maintainable and is sometimes discouraged by DBAs.

Each method has trade‑offs: forced indexes are brittle; larger limits may cause extra I/O; additional indexes increase maintenance cost; subqueries add complexity.

Practical Recommendations

Inspect EXPLAIN output ( select_type, key, rows, Extra) to understand the optimizer’s reasoning.

Keep statistics up‑to‑date. Run ANALYZE TABLE sample_table; after significant data changes to refresh index cardinality.

Design indexes that cover both filter predicates and ORDER BY columns whenever possible.

Use FORCE INDEX or adjust LIMIT as quick mitigations, but prefer schema changes for long‑term stability.

Summary

The incident shows that MySQL’s optimizer makes decisions based on estimated row counts, index availability, and ordering requirements. When statistics are inaccurate—especially on very large tables—an apparently optimal plan (using the primary key) can become disastrous.

Key take‑aways:

Understand how EXPLAIN fields reflect the optimizer’s cost model.

Refresh statistics with ANALYZE TABLE and be aware of sampling limitations (parameters N and M).

Prefer composite indexes that include ORDER BY columns to avoid extra sorting.

Use FORCE INDEX or query rewrites for immediate relief, but plan schema improvements for sustainable performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlDatabase Optimizationexplainslow-queryFORCE INDEXIndex Selection
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.