Databases 19 min read

Master MySQL Indexes: Why B+Tree Outperforms Other Structures

This article explains how proper index creation boosts MySQL query performance, detailing the mechanics of indexes, why B+Tree is chosen over binary and balanced trees, the storage differences between MyISAM and InnoDB, and practical guidelines for designing effective single‑column, composite, and covering indexes.

Code Ape Tech Column

Dec 23, 2020

Master MySQL Indexes: Why B+Tree Outperforms Other Structures

Table Definition Used in Examples

create table user (
    id   bigint not null comment 'id' primary key,
    name varchar(200) null comment 'name',
    age  bigint null comment 'age',
    gender int null comment 'gender',
    key (name)
);

What Is an Index and How It Works

An index is a separate data structure that stores the values of one or more columns together with pointers to the corresponding rows. Without an index, a query such as select * from user where id = 40 requires a full‑table scan. With an index, MySQL can perform a binary‑search‑like lookup on the index pages, locate the row pointer, and fetch the row directly.

Why MySQL Uses B+Tree for Indexes

Binary search trees provide O(log₂ n) lookup, but a plain binary tree can degenerate into a linear chain when rows are inserted in order, causing performance similar to a full scan. Balanced binary trees (e.g., AVL, Red‑Black) avoid the chain problem but require rotation operations and store only a few keys per node, which wastes the 16 KB page size MySQL uses for each node, leading to many I/O operations.

Multi‑way balanced trees (B‑Tree and its variant B+Tree) store dozens or hundreds of keys in a single page, dramatically reducing tree height. MySQL adopts B+Tree because:

Internal nodes contain only keys, while leaf nodes store the actual row data (or primary‑key pointers), maximizing the number of keys per page.

Leaf nodes are linked sequentially, enabling efficient range scans.

Every lookup traverses the same number of levels, giving stable I/O cost.

MySQL B+Tree Implementation Details

MySQL stores indexes differently in the MyISAM and InnoDB storage engines.

MyISAM

.frm – table definition file.

.MYD – data file containing all rows.

.MYI – index file.

To locate a row, MyISAM reads the .MYI file to obtain the disk address of the row, then fetches the row from .MYD.

InnoDB

.frm – table definition file.

.ibd – combined data and index file; the primary‑key index is clustered, meaning rows are stored in the leaf nodes of the primary‑key B+Tree.

Secondary (non‑primary) indexes store the primary‑key value in their leaf nodes. A query that uses a secondary index therefore performs two lookups: first the secondary index to obtain the primary‑key, then the clustered primary‑key index to fetch the full row.

Principles for Creating Effective Indexes

Column Selectivity (Discreteness)

Calculate count(distinct column) / count(*). Values close to 1 indicate high selectivity and are good candidates for indexing because they narrow the result set to few rows.

Left‑Most Prefix Rule

For a composite index (col1, col2, col3), MySQL can use the index only if the query predicates start with the left‑most column(s). Skipping a left‑most column makes the index unusable.

Minimal Space Principle

Smaller key sizes allow more keys per B+Tree page, reducing the number of I/O operations. Prefer narrow data types (e.g., INT instead of BIGINT) and avoid indexing very long VARCHAR columns when possible.

Composite (Multi‑Column) Indexes

A composite index stores a concatenated key of several columns. The left‑most prefix rule still applies.

Choosing Columns for a Composite Index

Columns that appear frequently in WHERE clauses (especially as the left‑most column).

Columns with high selectivity.

Columns with small storage width.

Covering Indexes

If a query can be satisfied using only the columns stored in the index, the index is a covering index and the engine never needs to read the data pages.

Example: select id from users where name = ?; Because the name index contains both name and the primary‑key id, MySQL can return the result directly from the index without accessing the primary‑key leaf pages.

If the query also requests age ( select id, age from users where name = ?), the covering index cannot be used; MySQL must perform the secondary‑index lookup followed by a primary‑key lookup to fetch age.

Key Takeaways

Keep indexed column length as short as possible while meeting business requirements.

Avoid excessive or redundant indexes; they waste disk space and slow INSERT/UPDATE/DELETE operations.

LIKE patterns that start with a wildcard (e.g., %a) cannot use indexes; prefix patterns (e.g., a%) may be usable if the column has sufficient selectivity. IN can use indexes; NOT IN cannot.

Prefer explicit column lists over SELECT * to enable covering indexes and reduce network traffic.

Applying functions to indexed columns (e.g., WHERE YEAR(date) = 2023) disables index usage because the function must be evaluated for every row.

Composite indexes work only when the query filters on the left‑most column(s). A range condition on a column prevents the use of subsequent columns in the index.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL Indexing MySQL database optimization B+Tree

Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.