Databases 12 min read

Master MySQL Indexes: From Basics to B+Tree Optimization

This article explains what MySQL indexes are, how they work, their types—including primary, ordinary, composite, and full‑text indexes—covers B‑Tree and B+Tree structures, page organization, clustering versus non‑clustering indexes, and practical considerations for index design and performance.

Architect's Must-Have
Architect's Must-Have
Architect's Must-Have
Master MySQL Indexes: From Basics to B+Tree Optimization

1. What Is an Index

Official definition: a data structure that improves MySQL query efficiency by allowing fast retrieval of rows. It is analogous to a book's table of contents.

1.1 How Indexes Work

For a query SELECT * FROM user WHERE id = 40 without an index, MySQL must perform a full‑table scan to find the matching row.

With an index, MySQL can locate the row quickly by performing a binary search on the index and then fetching the row using the pointer.

Index advantages:
1. Greatly speeds up data queries.
Index disadvantages:
1. Maintaining indexes consumes database resources.
2. Indexes occupy disk space.
3. Insert, update, delete operations are slower because indexes must be maintained.

Despite the drawbacks, the speed gain for queries—especially on large tables—makes indexes essential.

2. Types of Indexes

1. Primary key index – automatically created for a primary key; InnoDB uses it as a clustered index.
2. Ordinary index – built on regular columns without restrictions, used to accelerate queries.
3. Composite index – built on multiple columns; none of the columns may be NULL.
4. Full‑text index (MySQL 5.7 and earlier, provided by MyISAM) – built on large text columns to search for keywords.

3. B+Tree

Example SQL execution:

The result is automatically sorted, which may seem surprising.

MySQL stores rows in pages linked by pointers, similar to a linked list.

Each inserted row is placed in order so that MySQL can quickly locate it via a pointer (P), forming a linked‑list structure.

MySQL uses a page directory where each leaf page’s first row id serves as an entry; default page size is 16 KB.

To find a row with id = 2, the page directory points to the first page (ids 1‑4). The engine reads that page once, locates id 2 via the P pointer, and returns the row with a single I/O operation.

Estimating storage: with an 8‑byte id, 20‑byte name, and an 8‑byte pointer, each row is ~36 bytes. A 16 KB page holds about 455 rows; the page directory can reference up to ~1.9 billion rows.

3. Comparing B‑Tree

In a B‑Tree, every node stores data, which limits how many rows fit in a 16 KB page and increases tree depth, leading to more I/O. B+Tree stores data only in leaf nodes, reducing depth (usually ≤ 3) and I/O.

InnoDB implements its index structure with B+Tree.

Key Differences Between B+Tree and B‑Tree

1. B+Tree leaf nodes store only key values.
2. Every B+Tree leaf node has a pointer to the next leaf.
3. B+Tree stores all data in leaf nodes; B‑Tree stores data in both internal and leaf nodes.
InnoDB page size is 16 KB; primary keys are typically INT (4 bytes) or BIGINT (8 bytes).

MySQL recommends auto‑increment IDs to avoid page‑split issues.

Inserting records in primary‑key order prevents page splits. Inserting a key that falls between existing keys forces a split, creating a new page and updating linked‑list pointers, which hurts insert performance.

3. Clustered and Non‑Clustered Indexes

Clustered index: data and index are stored together; leaf nodes contain the full row.

Non‑clustered (auxiliary) index: index leaf nodes store only the primary‑key value, requiring a second lookup to fetch the row.

In daily work, the indexes we add are usually auxiliary indexes that first locate the primary key and then retrieve the row (a “covering” query).

Clustered indexes are not a separate type but a storage method; each InnoDB table can have only one clustered index.

Advantages of clustered indexes:

Faster data access because index and data share the same B+Tree.

Efficient range and ordered queries on the primary key.

Disadvantages:

Insert speed depends on insertion order; sequential primary‑key inserts are fastest.

Updating the primary key is costly because rows must be moved.

Secondary index lookups require two steps: find the primary key, then fetch the row.

Auxiliary Index (Non‑Clustered Index)

Auxiliary indexes built on top of the clustered index always need a second lookup; their leaf nodes store the primary‑key value rather than the physical row location.

In short, the indexes we usually define are auxiliary indexes: a query first uses the auxiliary index to find the primary key, then uses the primary key to locate the actual row.

—Feel free to add any missing details.

PerformanceIndexingdatabaseMySQLB+Tree
Architect's Must-Have
Written by

Architect's Must-Have

Professional architects sharing high‑quality architecture insights. Covers high‑availability, high‑performance, high‑stability designs, big data, machine learning, Java, system, distributed and AI architectures, plus internet‑driven architectural adjustments and large‑scale practice. Open to idea‑driven, sharing architects for exchange and learning.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.