
Master MySQL Indexes: From Basics to B+Tree and Clustered vs. Non‑Clustered

This article explains MySQL indexes, their working principle, advantages and drawbacks, different index types, the B+Tree structure, comparisons with B‑Tree, and the differences between clustered and non‑clustered (auxiliary) indexes, providing clear examples and practical insights.

Architect's Guide

1. What Is an Index

Official definition: an index is a data structure that improves MySQL query efficiency by allowing fast row retrieval, similar to a book's table of contents.

1.1 How Indexes Work

If a query such as SELECT * FROM user WHERE id = 40 runs without a usable index on id, MySQL must perform a full‑table scan, examining rows one by one until it finds a match. With an index, MySQL searches the ordered index structure (a B+Tree in InnoDB) for id = 40 in a handful of steps and then fetches the corresponding row directly.
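
The difference between the two access paths can be sketched in Python. This is a toy model, not MySQL's implementation: the `rows` list stands in for the table, and a sorted key list searched with `bisect` stands in for the index (InnoDB walks a B+Tree instead, but the idea of a logarithmic search over ordered keys is the same). All names are hypothetical.

```python
import bisect

# Toy table: 1,000 rows stored in an order unrelated to the lookup.
rows = [{"id": i, "name": f"user{i}"} for i in range(1000, 0, -1)]

def full_scan(target_id):
    """Check rows one by one; return (row, number_of_rows_examined)."""
    examined = 0
    for row in rows:
        examined += 1
        if row["id"] == target_id:
            return row, examined
    return None, examined

# Toy index: sorted (id, position-in-table) pairs.
index = sorted((row["id"], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in index]

def index_lookup(target_id):
    """Binary-search the sorted keys, then jump straight to the row."""
    i = bisect.bisect_left(index_keys, target_id)
    if i < len(index_keys) and index_keys[i] == target_id:
        return rows[index[i][1]]
    return None

row, examined = full_scan(40)
print(examined)                  # rows touched by the scan
print(index_lookup(40)["name"])  # found via the index
```

In this toy table the scan touches hundreds of rows before it reaches id 40, while the binary search needs about ten comparisons for 1,000 keys.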

Index advantages:
1. Greatly speeds up data queries.

Index disadvantages:
1. Maintaining indexes consumes database resources.
2. Indexes occupy disk space.
3. Insert, update, and delete operations are slower because every affected index must be updated along with the row.

Despite these drawbacks, the gain in query speed, especially on large tables, makes indexes indispensable.

2. Types of Indexes

1. Primary key index – automatically created for the primary key; InnoDB uses it as the clustered index.
2. Normal index – built on ordinary columns without uniqueness restrictions, used to accelerate queries.
3. Composite index – built on multiple columns. The indexed columns may contain NULL unless the index is the primary key.
4. Full‑text index – built on large text columns to search for keywords. Before MySQL 5.6 only MyISAM supported it; InnoDB has supported full‑text indexes since 5.6.

3. B+Tree

MySQL (InnoDB) stores rows in pages (16 KB by default). Within a page, each row carries a pointer (P) to the next row, so the rows form a singly linked list in key order.

When data is inserted, InnoDB keeps the rows within a page ordered by primary key, so a lookup can follow the P pointers sequentially and stop as soon as it passes the target key. Reading a single page this way costs far less I/O than a full‑table scan.

MySQL also maintains a page directory (in B+Tree terms, a non‑leaf node) that stores the first row id of each leaf page together with a pointer to that page, so a lookup can jump straight to the relevant leaf page.

Example: to find the row with id = 2, MySQL consults the page directory, which shows that id 2 falls between the first keys 1 and 4 and therefore lives on the first leaf page. MySQL reads that page, starts at its first row (id 1), and follows the P pointer to reach id 2, so only one leaf page has to be read.
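
The lookup just described can be sketched as follows. This is a simplified model under stated assumptions: two leaf pages holding ids 1 to 3 and 4 to 6 (the exact layout is assumed, not given in the text), rows chained through a `next` attribute that plays the role of the P pointer, and a directory that is just a sorted list of first ids. The class and function names are hypothetical.

```python
import bisect

class Row:
    def __init__(self, row_id, name):
        self.id = row_id
        self.name = name
        self.next = None  # the "P" pointer to the next row in the same page

def build_page(records):
    """Chain (id, name) records in id order and return the first row."""
    head = prev = None
    for row_id, name in sorted(records):
        row = Row(row_id, name)
        if prev is None:
            head = row
        else:
            prev.next = row
        prev = row
    return head

# Assumed layout: ids 1-3 on the first leaf page, ids 4-6 on the second.
pages = [
    build_page([(1, "a"), (2, "b"), (3, "c")]),
    build_page([(4, "d"), (5, "e"), (6, "f")]),
]
# Directory: the first id of each leaf page, in page order.
directory = [page.id for page in pages]

def find(row_id):
    """Pick the leaf page via the directory, then follow P pointers."""
    page_no = bisect.bisect_right(directory, row_id) - 1
    if page_no < 0:
        return None  # smaller than every key in the table
    row = pages[page_no]
    while row is not None and row.id < row_id:
        row = row.next
    return row if row is not None and row.id == row_id else None

print(find(2).name)
```

Only the page chosen by the directory is walked; the other leaf page is never touched.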

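Estimating storage: with an 8‑byte id, a 20‑byte name, and an 8‑byte pointer, each row uses about 36 bytes, so a 16 KB page (16,384 bytes) holds about 455 rows. Counting 8 bytes per entry, one directory page can reference about 2,048 pages. A single directory level therefore addresses roughly 2,048 × 455 ≈ 930,000 rows, and two directory levels (a three‑level tree) address roughly 2,048 × 2,048 × 455 ≈ 1.9 billion rows. The estimate ignores page headers and other per‑page overhead.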
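
The arithmetic behind this estimate can be reproduced in a few lines. The 8‑byte directory entry size is an assumption inferred from the ~2,048 figure, and the ~1.9 billion total only works out if two levels of directory pages sit above the leaves; page headers and other overhead are ignored.

```python
PAGE_SIZE = 16 * 1024        # 16 KB page = 16,384 bytes
ROW_SIZE = 8 + 20 + 8        # id + name + pointer = 36 bytes
DIRECTORY_ENTRY_SIZE = 8     # assumption: 8 bytes per directory entry

rows_per_leaf = PAGE_SIZE // ROW_SIZE
entries_per_directory = PAGE_SIZE // DIRECTORY_ENTRY_SIZE

# One directory level above the leaves (two-level tree).
two_level_rows = entries_per_directory * rows_per_leaf
# Two directory levels above the leaves (three-level tree).
three_level_rows = entries_per_directory ** 2 * rows_per_leaf

print(rows_per_leaf)              # 455
print(entries_per_directory)      # 2048
print(f"{two_level_rows:,}")      # 931,840
print(f"{three_level_rows:,}")    # 1,908,408,320
```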

3.1 B‑Tree vs. B+Tree Comparison

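In a B‑Tree, every node stores both keys and row data. Because a page has a fixed size (16 KB), the data leaves room for fewer keys per node, which lowers the fan‑out and makes the tree deeper.
In a B+Tree, only leaf nodes store data; internal nodes store keys and child pointers only, so each node fans out to many more children. The tree stays shallow (typically about 3 levels, occasionally 4 for very large tables), and each lookup needs fewer page reads.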

InnoDB uses B+Tree for its index structures.
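
A rough calculation illustrates why keeping data out of internal nodes reduces depth. This is a back‑of‑the‑envelope sketch under assumptions not taken from the article: a hypothetical 1 KB row, 8‑byte keys and pointers, completely full pages, and no page overhead. The function names are illustrative only.

```python
PAGE_SIZE = 16 * 1024   # 16 KB page
ROW_SIZE = 1024         # hypothetical 1 KB row (includes the key)
KEY_SIZE = 8            # assumed key size
POINTER_SIZE = 8        # assumed child-pointer size

def bplus_tree_levels(total_rows):
    """Levels needed when rows live only in leaves and internal nodes hold key + pointer."""
    rows_per_leaf = PAGE_SIZE // ROW_SIZE              # 16 rows per leaf
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)    # 1,024 children per internal node
    pages = -(-total_rows // rows_per_leaf)            # ceiling division: leaf pages
    levels = 1
    while pages > 1:
        pages = -(-pages // fanout)                    # pages needed one level up
        levels += 1
    return levels

def b_tree_levels(total_rows):
    """Levels needed when every node stores full rows next to its child pointers."""
    keys_per_node = PAGE_SIZE // (ROW_SIZE + POINTER_SIZE)  # 15 rows per node
    capacity = 0
    nodes_at_level = 1
    levels = 0
    while capacity < total_rows:
        capacity += nodes_at_level * keys_per_node
        nodes_at_level *= keys_per_node + 1
        levels += 1
    return levels

print(bplus_tree_levels(10_000_000))  # 3
print(b_tree_levels(10_000_000))      # 6
```

Under these assumptions, ten million rows fit in a three‑level B+Tree but need six levels in a B‑Tree, and each extra level can cost one more page read per lookup.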

Summary of B+Tree vs. B‑Tree Differences

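1. B+Tree internal nodes store only key values and child pointers; B‑Tree nodes store data throughout the tree.
2. Every leaf node in a B+Tree has a pointer to the next leaf, which makes range scans efficient (InnoDB links its leaf pages in both directions).
3. In a B+Tree, row data is stored exclusively in leaf nodes, so every lookup descends to the leaf level.
4. Because an InnoDB page is 16 KB and primary keys are typically INT (4 bytes) or BIGINT (8 bytes), a key‑only internal page can hold on the order of a thousand entries.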
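
The benefit of linked leaves for range queries can be sketched as below. This is an illustrative model, not InnoDB code: `Leaf`, `link`, and `range_scan` are hypothetical names, and for brevity the scan starts at the first leaf rather than descending the tree to find the starting leaf.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf
        self.next = None   # pointer to the next leaf

def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(first_leaf, low, high):
    """Collect keys in [low, high] by walking the leaf chain in order."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are ordered, so nothing later can match
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

head = link([Leaf([1, 2, 3]), Leaf([4, 5, 6]), Leaf([7, 8, 9])])
print(range_scan(head, 3, 7))  # [3, 4, 5, 6, 7]
```

Once the scan reaches the first matching key, it moves from leaf to leaf without going back up through internal nodes.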

An auto‑incrementing primary key is generally recommended for InnoDB tables because it reduces page splits and keeps insert performance stable.

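When a new key falls between existing keys and the target page is already full, InnoDB must split the page, move part of its rows to a new page, and adjust the doubly‑linked list of pages, which is costly.
A key larger than all existing keys is simply appended to the last page; when that page fills up, InnoDB allocates a new page at the end instead of splitting an existing one.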
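
A small simulation makes the effect of insertion order visible. It is a deliberately simplified model: pages are plain sorted lists with a hypothetical capacity of 4 keys, a full page is split in half, and an insert past the end of a full last page opens a new page. InnoDB's real split heuristics are more involved.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical: keys per page in this toy model

def insert(pages, key):
    """Insert key into the ordered list of pages; return True if a split happened."""
    if not pages:
        pages.append([key])
        return False
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return False
    if idx == len(pages) - 1 and key > page[-1]:
        pages.append([key])          # append at the end: new page, no split
        return False
    mid = len(page) // 2             # page is full: split it in half
    right = page[mid:]
    del page[mid:]
    pages.insert(idx + 1, right)
    bisect.insort(page if key < right[0] else right, key)
    return True

def load(keys):
    """Insert all keys in the given order; return (pages, number_of_splits)."""
    pages, splits = [], 0
    for key in keys:
        splits += insert(pages, key)
    return pages, splits

sequential_keys = list(range(1000))
random_keys = sequential_keys[:]
random.Random(0).shuffle(random_keys)

_, sequential_splits = load(sequential_keys)
_, random_splits = load(random_keys)
print(sequential_splits, random_splits)
```

In this model, sequential keys never trigger a split, while the shuffled keys cause many.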

4. Clustered and Non‑Clustered Indexes

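Clustered index: data and index are stored together; leaf nodes contain the full row data. In InnoDB, the primary key index is the clustered index.

Non‑clustered (auxiliary) index: the index is stored separately from the row data. In InnoDB, its leaf nodes contain the indexed column values plus the primary key value of each row, not the row itself.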

In daily work, most user‑defined indexes are auxiliary (secondary) indexes: a query first locates the primary key in the secondary index and then fetches the row from the clustered index (a “back‑lookup”).
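
The two‑step back‑lookup can be sketched with two mappings. This is only a conceptual model: Python dicts are hash tables, not B+Trees, the sample rows and the `name` column are made up, and names are assumed to be unique to keep the example short.

```python
# Clustered index: primary key -> full row (the data lives here).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 35},
}

# Secondary index on name: indexed value -> primary key (no row data).
name_index = {row["name"]: pk for pk, row in clustered.items()}

def find_by_name(name):
    """Step 1: secondary index gives the primary key. Step 2: clustered index gives the row."""
    pk = name_index.get(name)
    if pk is None:
        return None
    return clustered[pk]  # the back-lookup into the clustered index

print(find_by_name("bob"))
```

A lookup by primary key needs only the second step, which is why primary‑key access is the cheapest path to a row.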

Advantages of clustered indexes:
1. Faster data access because the index and data share the same B+Tree.
2. Excellent for primary‑key ordered and range queries.

Disadvantages:
1. Insert speed depends on insertion order; sequential primary‑key inserts are fastest, otherwise page splits degrade performance.
2. Updating a primary key is costly because rows must be moved.
3. Secondary index lookups require two steps: first find the primary key, then fetch the row.

Auxiliary Index (Non‑Clustered Index)

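Created alongside the clustered index, an auxiliary index stores the primary‑key value in its leaf nodes. Accessing a row via an auxiliary index therefore involves a second lookup in the clustered index using that primary key, unless the index already contains every column the query needs (a covering index).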

In short, the indexes we normally define are auxiliary indexes; a query using a normal index first finds the primary key, then uses that primary key to locate the actual row.