Databases 16 min read

Mastering MySQL B-Tree Indexes: How They Work and How to Optimize Queries

This article explains the inner workings of MySQL B‑Tree indexes, how the optimizer chooses the cheapest execution plan, demonstrates practical examples of index matching, cost calculation, and EXPLAIN analysis, and provides step‑by‑step troubleshooting techniques for real‑world performance issues.

ITPUB
ITPUB
ITPUB
Mastering MySQL B-Tree Indexes: How They Work and How to Optimize Queries

Background

MySQL is widely used for data persistence, and the InnoDB engine with B‑Tree indexes is the most common. Query performance often hinges on proper index usage, especially for SELECT statements.

How Indexes Work

During query planning, MySQL estimates the cost of using each available index versus a full table scan. It selects the plan with the lowest estimated cost. The simplified pseudo‑code for this decision is:

min_cost = INIT_VALUE
min_cost_index = NONE
for index in all_indexes:
    if index matches WHERE_CLAUSE:
        cur_cost = COST(index)
        if cur_cost < min_cost:
            min_cost = cur_cost
            min_cost_index = index

INIT_VALUE is the cost when no index is used. all_indexes are the indexes on the table, and WHERE_CLAUSE is the SQL WHERE part.

Index Matching Rules

Left‑most prefix rule : columns are matched from left to right; if a column cannot be used, the remaining columns are ignored.

Supported operators : =, >, >=, <, <=, BETWEEN, ISNULL, or LIKE with a pattern that does not start with a wildcard.

AND groups : each AND group is evaluated as a separate expression.

No type conversion or functions : expressions that cast types or apply functions prevent index usage.

Index Cost

The cost of an index is primarily derived from the estimated number of rows to scan (the rows value). These statistics are stored in information_schema and refreshed by the storage engine. The EXPLAIN statement reveals the estimated rows and other cost metrics.

Index Instance Analysis

Example table t1 with a composite index idx_t1_bcd(b,c,d):

CREATE TABLE t1 (
  a INT PRIMARY KEY,
  b INT,
  c INT,
  d INT,
  e VARCHAR(20)
);
CREATE INDEX idx_t1_bcd ON t1(b, c, d);
INSERT INTO t1 VALUES (4,3,1,1,'d');
INSERT INTO t1 VALUES (1,1,1,1,'a');
INSERT INTO t1 VALUES (8,8,8,8,'h');
INSERT INTO t1 VALUES (2,2,2,2,'b');
INSERT INTO t1 VALUES (5,2,3,5,'e');
INSERT INTO t1 VALUES (3,3,2,2,'c');
INSERT INTO t1 VALUES (7,4,5,5,'g');
INSERT INTO t1 VALUES (6,6,4,4,'f');

The index stores columns b, c, and d in sorted order. A sample query:

SELECT * FROM t1 WHERE b >= 2 AND b < 8 AND c > 1 AND d != 4 AND e != 'a';

Here the WHERE clause uses columns b,c,d,e. The index can filter on b,c,d, while e becomes a table filter.

Three Filter Types

Index Key : determines the continuous range (first key & last key) used by the index.

Index Filter : additional conditions on indexed columns that are not part of the range.

Table Filter : conditions on columns not covered by the index.

Applying the rules to the example query yields:

Index First Key: (b >= 2, c > 1) Index Last Key: (b < 8) Index Filter: c > 1 AND d != 4 Table Filter:

e != 'a'

Diagnosing Index Issues with EXPLAIN

EXPLAIN provides fields such as type, possible_keys, key, key_len, and rows. Example:

SELECT id FROM r_ibeacon_biz_device_d
WHERE ftime >= 20151126 AND ftime <= 20151126 AND biz_id = 11602
LIMIT 50;

The output shows type = range, indicating a range scan on index IDX_BID_FTIME. The key_len value (e.g., 18 bytes) confirms that all indexed columns are used.

Real‑World Problem

A spike in a reporting API was traced to a query:

SELECT d.id FROM r_ibeacon_biz_page_d d
WHERE d.ftime >= 20151126 AND d.ftime <= 20151126
  AND d.biz_id = 11023 AND d.page_id = 778495
LIMIT 0,20;

The index IDX_BID_PID_FTIME (biz_id, page_id, ftime) existed, but EXPLAIN reported type = ref and a small key_len = 9, meaning only biz_id was used. The cause was a type mismatch: page_id is VARCHAR but the query compared it to an integer. Changing the condition to page_id = '11023' and altering the column to BIGINT made the optimizer use the full index (type = range, key_len = 621) and eliminated the spike.

Common Questions

When does a range scan use an index? See the earlier explanation of index cost and key selection.

Does ORDER BY use an index? Only if the index already provides rows in the required order; otherwise a separate sort is needed.

Why is ORDER BY slow? Use an index that covers both the WHERE clause and the ORDER BY columns to avoid sorting.

Why is LIMIT with large offset slow? Prefer filtering more rows with additional WHERE conditions instead of large offsets.

Why does a query not use an index or remain slow? Check for type mismatches (int vs. varchar), differing character sets, or functions that prevent index usage.

Images illustrating index structures and EXPLAIN output:

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlSQL OptimizationexplainDatabase PerformanceB-Tree IndexIndex Cost
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.