
Master MySQL Indexes: From Basics to B+Tree Optimization

This article explains MySQL indexes—how they speed up queries, their types, the inner workings of B‑Tree and B+Tree structures, page storage mechanics, and the trade‑offs between clustered and secondary indexes, providing practical insights for database optimization.

Architect's Must-Have

1. What Is an Index

Official definition: a data structure that improves MySQL query efficiency by allowing fast retrieval of rows, similar to a book's table of contents.

1.1 How an Index Works

For a query such as SELECT * FROM user WHERE id = 40, MySQL must scan the entire table when id has no index. With an index, it descends the sorted index structure, much like a binary search, and locates the row in a handful of comparisons.
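The gap between scanning and searching can be sketched in Python. This is a simplified illustration under stated assumptions, not MySQL's implementation: a sorted list stands in for the index, and `ids`, `full_scan`, and `index_search` are hypothetical names.

```python
from bisect import bisect_left

# Hypothetical data: 1,000 rows with ids 1..1000, kept in sorted order.
ids = list(range(1, 1001))

def full_scan(ids, target):
    """Check rows one by one; return (position, rows examined)."""
    for examined, value in enumerate(ids, start=1):
        if value == target:
            return examined - 1, examined
    return -1, len(ids)

def index_search(ids, target):
    """Binary search over the sorted ids; return the position or -1."""
    pos = bisect_left(ids, target)
    return pos if pos < len(ids) and ids[pos] == target else -1

print(full_scan(ids, 40))     # (39, 40): found at position 39 after examining 40 rows
print(index_search(ids, 40))  # 39: same position, found in about log2(1000) steps
```

The scan cost grows with the position of the row, while the search cost grows only with the logarithm of the table size.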

Index advantages:
1. Greatly speeds up data queries.
Index disadvantages:
1. Consumes database resources for maintenance.
2. Takes up disk space.
3. Slows down INSERT/UPDATE/DELETE because the index must be updated.

Despite the drawbacks, the performance gain in large datasets makes indexes essential.

2. Types of Indexes

1. Primary key index – automatically created for the primary key; in InnoDB it is the clustered index.
2. Normal index – built on ordinary columns, no restrictions, used to accelerate queries.
3. Composite index – built on multiple columns. The indexed columns may contain NULL, except when the composite index is the primary key.
4. Full-text index – built on large text columns for keyword search. Before MySQL 5.6 only MyISAM provided it; InnoDB has supported it since 5.6.

3. B+Tree

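InnoDB stores data in pages (default 16 KB). Rows inside a data page (a leaf node) are kept in primary-key order, and a directory above the data pages records the first id of each page together with a pointer to that page.

When searching for id = 2 with directory entries 1 and 4, the directory shows that 2 lies between 1 and 4, so MySQL jumps directly to the first page and follows the pointer chain inside it to locate the row. Once the directory is in memory, this requires only one I/O operation.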
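The directory lookup can be sketched as follows. This is a minimal model, assuming two tiny data pages whose first ids are 1 and 4; `pages`, `directory`, and `find_row` are hypothetical names, and real InnoDB pages carry far more bookkeeping.

```python
from bisect import bisect_right

# Hypothetical layout: each data page holds rows sorted by id.
pages = [
    [(1, "a"), (2, "b"), (3, "c")],   # page 0, first id 1
    [(4, "d"), (5, "e"), (6, "f")],   # page 1, first id 4
]
# The directory keeps only the first id of each page.
directory = [page[0][0] for page in pages]   # [1, 4]

def find_row(target_id):
    """Return (page number, name) for target_id, or None if absent."""
    # Pick the last page whose first id is <= target_id.
    page_no = bisect_right(directory, target_id) - 1
    if page_no < 0:
        return None
    # Walk the rows of that single page in order.
    for row_id, name in pages[page_no]:
        if row_id == target_id:
            return page_no, name
    return None

print(find_row(2))   # (0, 'b'): only page 0 is read
```

Only one data page is touched per lookup, which is the point of keeping the directory separate from the rows.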

Storage example: a row occupies roughly 36 bytes (8 bytes id + 20 bytes name + 8 bytes pointer), so one 16 KB page (16,384 bytes) holds about 455 rows. A directory page that stores only 8-byte ids holds 16,384 / 8 = 2,048 of them, so two levels already address about 930,000 rows and a third level raises that to roughly 1.9 billion.
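The arithmetic behind these figures can be checked with a few lines of Python. The sketch makes the same simplifications as the storage example: page headers are ignored, and a directory entry is counted as the 8-byte id only. The constant names are chosen for illustration.

```python
PAGE_SIZE = 16 * 1024     # 16 KB page = 16,384 bytes
ROW_SIZE = 8 + 20 + 8     # id + name + pointer = 36 bytes
KEY_SIZE = 8              # directory entry counted as the 8-byte id only

rows_per_page = PAGE_SIZE // ROW_SIZE        # 455
ids_per_directory = PAGE_SIZE // KEY_SIZE    # 2048

# One directory page on top of the data pages.
two_levels = ids_per_directory * rows_per_page
# One more directory level on top of that.
three_levels = ids_per_directory ** 2 * rows_per_page

print(rows_per_page, ids_per_directory)   # 455 2048
print(two_levels)                         # 931840
print(three_levels)                       # 1908408320
```

Real pages hold somewhat fewer entries because of headers and per-record overhead, so these numbers are upper bounds for this row layout.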

4. B‑Tree vs B+Tree

B-Tree stores data in every node, so each node holds fewer keys, the fan-out drops, and the tree grows deeper, which means more I/O. B+Tree stores data only in leaf nodes, so non-leaf nodes hold many keys and the tree stays shallow (usually ≤ 3 levels in enterprise environments), which reduces I/O.

B+Tree is the external‑storage‑optimized variant used by InnoDB for its index structures.
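A rough way to see the depth difference is to count how many levels each layout needs for the same number of rows. The sketch below is an estimate under assumptions that are not from the article: a 1 KB row (picked so the effect is visible), 8-byte keys, and no page or pointer overhead. `bplus_levels` and `btree_levels` are hypothetical helpers.

```python
PAGE_SIZE = 16 * 1024
KEY_SIZE = 8        # BIGINT key; pointer size ignored for simplicity
ROW_SIZE = 1024     # assumed 1 KB row, chosen only for illustration

def bplus_levels(total_rows):
    """Levels needed when only leaves hold rows and inner nodes hold keys."""
    fanout = PAGE_SIZE // KEY_SIZE       # 2048 keys per inner page
    capacity = PAGE_SIZE // ROW_SIZE     # 16 rows per leaf page
    levels = 1
    while capacity < total_rows:
        capacity *= fanout
        levels += 1
    return levels

def btree_levels(total_rows):
    """Levels needed when every node stores full rows next to its keys."""
    per_node = PAGE_SIZE // ROW_SIZE     # 16 rows per node
    nodes_on_level = 1
    capacity = per_node
    levels = 1
    while capacity < total_rows:
        nodes_on_level *= per_node + 1   # each node has per_node + 1 children
        capacity += nodes_on_level * per_node
        levels += 1
    return levels

print(bplus_levels(10_000_000), btree_levels(10_000_000))   # 3 6
```

With small rows the two layouts can come out equal, so the gap depends on how much larger a row is than its key.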

Differences Between B+Tree and B‑Tree

1. B+Tree non‑leaf nodes store only key values.
2. Every B+Tree leaf node has a pointer to the next leaf.
3. Data resides exclusively in leaf nodes; B‑Tree stores data in all nodes.
4. Non-leaf nodes stay compact as a result: an InnoDB page is 16 KB, and primary keys are typically INT (4 bytes) or BIGINT (8 bytes), so one page holds a large number of keys.
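The leaf-to-leaf pointers are what make range queries cheap: once the first matching leaf is found, the scan simply follows the chain. A minimal sketch, assuming three rows per leaf and hypothetical names (`Leaf`, `build_leaves`, `range_scan`); for brevity it starts from the first leaf instead of descending the tree to the leaf that contains the lower bound.

```python
class Leaf:
    def __init__(self, rows):
        self.rows = rows     # list of (id, name), sorted by id
        self.next = None     # pointer to the next leaf

def build_leaves(rows, per_leaf):
    """Split sorted rows into leaves and link each leaf to its successor."""
    leaves = [Leaf(rows[i:i + per_leaf]) for i in range(0, len(rows), per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(first_leaf, low, high):
    """Collect rows with low <= id <= high by walking the leaf chain."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for row_id, name in leaf.rows:
            if row_id > high:
                return result        # past the range: stop early
            if row_id >= low:
                result.append((row_id, name))
        leaf = leaf.next
    return result

rows = [(i, f"user{i}") for i in range(1, 10)]
first = build_leaves(rows, per_leaf=3)
print(range_scan(first, 3, 7))
```

A B-Tree without leaf links would have to climb back through parent nodes to reach the next range of keys.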

5. Clustered vs Non‑Clustered Indexes

Clustered index: data and index are stored together; leaf nodes contain the full row. A table has only one, normally built on the primary key.

Non-clustered (secondary) index: leaf nodes store the indexed column values plus the primary key value, not the full row; retrieving other columns requires a second lookup using the primary key.
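The two-step lookup can be sketched with plain dictionaries standing in for the two B+Trees. The sample rows, the `name_index` mapping, and `find_by_name` are hypothetical, and the sketch assumes names are unique.

```python
# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 41},
}
# Secondary index on name: indexed value -> primary key.
name_index = {row["name"]: pk for pk, row in clustered.items()}

def find_by_name(name):
    """Look up a row by name through the secondary index."""
    pk = name_index.get(name)    # first lookup: secondary index
    if pk is None:
        return None
    return clustered[pk]         # second lookup: clustered index

print(find_by_name("bob"))
```

A lookup by id needs only the `clustered` mapping, which is why primary-key queries are the cheapest path to a row.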

Advantages of clustered indexes:
1. Faster data access because rows are stored in the same B+Tree.
2. Efficient range and ordered queries.

Disadvantages:
1. Insert speed depends on primary-key order; out-of-order inserts cause page splits.
2. Updating primary keys is costly.
3. Secondary index lookups need two reads (index → primary key → row).
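The effect of insert order on page splits can be simulated with a toy model. Everything here is an assumption made for illustration: pages hold only four keys, a key appended beyond the end of the last full page opens a fresh page, and any other insert into a full page splits it in half. `insert_all` is a hypothetical helper, not an InnoDB routine.

```python
import random
from bisect import bisect_right, insort

PAGE_CAPACITY = 4   # assumed tiny capacity so splits are easy to observe

def insert_all(keys):
    """Insert keys into sorted pages; return the number of page splits."""
    pages = [[]]
    splits = 0
    for key in keys:
        # First key of every page after the first one, used to pick the target page.
        firsts = [page[0] for page in pages[1:]]
        idx = bisect_right(firsts, key)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])          # append at the end: new page, no split
        else:
            insort(page, key)            # overflow in the middle: split in half
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            splits += 1
    return splits

ordered = list(range(1, 101))
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)

print(insert_all(ordered))    # 0: ascending keys only ever append
print(insert_all(shuffled))   # out-of-order keys force splits
```

This is why auto-increment primary keys are a common choice: new rows always land at the end of the last page.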

Secondary (Non‑Clustered) Indexes

Created on top of the clustered index, they normally require a second lookup: first find the primary key in the secondary index, then locate the row via the primary key. The exception is a query that reads only columns already stored in the index.

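In practice, the indexes we usually define are secondary indexes; they first locate the primary key and then fetch the actual row (known as a “back-table” query). When the index already contains every column the query needs, it acts as a “covering” index and the second lookup is skipped.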
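A secondary index entry holds the indexed columns plus the primary key, so whether the second lookup is needed depends on which columns the query reads. The check below is a simplified sketch; `needs_back_to_table` and the column names are hypothetical, and the real decision is made by the MySQL optimizer.

```python
def needs_back_to_table(selected_columns, index_columns, primary_key="id"):
    """True when the query reads a column the secondary index does not hold."""
    available = set(index_columns) | {primary_key}
    return not set(selected_columns) <= available

# SELECT id, name ... with an index on (name): the index alone answers the query.
print(needs_back_to_table(["id", "name"], ["name"]))          # False

# SELECT id, name, age ... with the same index: age lives only in the full row.
print(needs_back_to_table(["id", "name", "age"], ["name"]))   # True
```

Selecting only the columns a query needs makes it more likely that an existing index can answer it without touching the clustered index.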

-- The article may not cover every detail; contributions are welcome.
