Databases 14 min read

Understanding MySQL Indexes: From Binary Search Trees to B+ Trees and Their Use in InnoDB

This article explains the fundamentals of MySQL indexing, tracing the evolution from binary search trees and balanced AVL trees to B‑trees and B+‑trees, and details how InnoDB implements clustered and non‑clustered indexes, including search processes and performance considerations.

Architecture Digest

Nov 6, 2019

Understanding MySQL Indexes: From Binary Search Trees to B+ Trees and Their Use in InnoDB

Introduction

Indexes are data structures that help locate rows quickly in large tables. In MySQL, three main index types exist: B+‑tree, hash, and full‑text. This article focuses on the B+‑tree index used by the InnoDB storage engine.

Binary Search Tree

A binary search tree (BST) stores a key and its row data in each node, with left children holding smaller keys and right children larger keys. Searching for a key (e.g., id=12) requires at most three node visits, compared with six linear scans.

Balanced Binary Tree (AVL Tree)

When a BST becomes unbalanced, its height grows and search performance degrades. An AVL tree maintains a height difference of at most one between left and right sub‑trees, ensuring more stable and faster lookups.

B‑Tree

Because disk I/O is orders of magnitude slower than memory access, a tree that stores many keys per node reduces the number of disk reads. A B‑tree node can hold multiple keys and pointers, resulting in a shallow, wide tree that minimizes I/O.

B+‑Tree

B+‑trees improve on B‑trees by storing only keys in internal nodes (no row data) and keeping all row data in the leaf nodes, which are linked together. This design allows range queries, ordered scans, and grouping operations to be performed efficiently.

Clustered vs. Non‑Clustered Indexes

In InnoDB, the primary key forms a clustered index: the table’s rows are stored in the leaf nodes of a B+‑tree keyed by the primary key. A non‑clustered index uses a secondary column as the key; its leaf nodes store the secondary key and the primary key, requiring an additional lookup (the “row lookup” or “back‑track”) to retrieve the full row.

Searching with a Clustered Index

To find rows where id ≥ 18 and id < 40, the engine starts at the root (often cached in memory), follows pointers to the appropriate leaf pages, and then scans sequentially through the linked leaf nodes until the upper bound is reached, retrieving all matching rows in order.

Searching with a Non‑Clustered Index

A non‑clustered index on a column such as luckyNum stores pairs like (luckyNum, primaryKey). The engine first locates the matching leaf entry, obtains the primary key, and then uses the clustered index to fetch the full row.

Conclusion

The article walks through the evolution from binary search trees to B+‑trees, explains why MySQL adopts B+‑trees for indexing, and demonstrates how InnoDB stores and retrieves data using clustered and non‑clustered indexes. Remember: "Data is index, index is data."

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

InnoDB MySQL index B+Tree Clustered Index Non-Clustered Index

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.