Databases 15 min read

Why MySQL Uses B+ Trees for Indexing

The article explains the fundamentals of tree data structures, compares B, B+, B*, and red‑black trees, and shows how B+ trees reduce disk I/O and improve range query performance, which is why MySQL adopts B+ trees for its indexes.

Shepherd Advanced Notes
Shepherd Advanced Notes
Shepherd Advanced Notes
Why MySQL Uses B+ Trees for Indexing

Overview

From earlier discussions of database indexes we know that the underlying data structure is a B+ tree, so this article briefly analyses the characteristics of B+ trees and why MySQL chooses them for indexes.

What is a tree in data structures?

A tree consists of a root node and several sub‑trees. It is defined on a set of elements (nodes) with a parent‑child relationship that creates a hierarchy. One node is designated as the root.

In school we learn several tree variants such as binary trees, binary search trees (BST), and balanced binary trees (AVL).

Binary Tree

A binary tree is an ordered tree where each node has at most two children. It is defined recursively as an empty tree or a root node with a left and a right sub‑tree, each of which is itself a binary tree.

Binary tree illustration
Binary tree illustration

Binary Search Tree (BST)

A BST organizes nodes so that for any node, all keys in the left sub‑tree are less than or equal to the node’s key, and all keys in the right sub‑tree are greater than or equal to it. Each node stores a key, optional data, and pointers to left child, right child, and parent. The leftmost node holds the minimum key, the rightmost the maximum, and an in‑order traversal yields an increasing sequence.

BST illustration
BST illustration

Balanced Binary Tree (AVL)

An AVL tree is a self‑balancing binary search tree whose left and right sub‑trees differ in height by at most one. This property keeps insertion, lookup, and deletion time complexity at O(log N) in both best and worst cases, though frequent rotations add overhead compared with an unbalanced BST.

AVL tree illustration
AVL tree illustration

These binary tree variants serve as background for the main topic: the B‑tree family (B‑tree, B+‑tree, B*‑tree, and red‑black tree).

B‑Tree Family

B‑tree, B+‑tree, and B*‑tree are multi‑way self‑balancing search trees that, unlike binary trees, allow each node to have many children. They are designed for external storage such as disks.

B‑Tree

A B‑tree (balance tree) is a multi‑way balanced search tree. Its key characteristics are:

All keys and associated data are stored throughout the tree.

Each key appears in exactly one node.

Search may finish at a non‑leaf node (best case O(1)).

Searching the entire key space approximates binary search performance.

B+‑Tree

A B+‑tree is a variant of the B‑tree. Differences include:

Only leaf nodes store the actual data; internal nodes store only keys.

All leaf nodes are linked together, enabling efficient sequential and range queries. Because leaf nodes are stored consecutively on disk, pre‑reading can fetch adjacent data, reducing disk I/O.

Since internal nodes do not store data, they can hold more keys, making the tree shorter and wider. Larger node fan‑out means fewer disk pages need to be read per lookup, further cutting I/O operations.

B*‑Tree

A B*‑tree is a B+‑tree variant that adds sibling pointers to non‑root, non‑leaf nodes. Its split algorithm moves a portion of data to a sibling when possible, otherwise creates a new node and redistributes one‑third of the data to each, resulting in lower split frequency and higher space utilization compared with B+‑trees.

Red‑Black Tree

A red‑black tree is a binary search tree with an extra color bit per node (red or black). The coloring rules guarantee that no path from root to leaf is more than twice as long as any other, providing near‑balance with fewer rotations than AVL trees. It is suitable for workloads with many insertions and deletions.

Each node is either red or black.

The root is black.

All leaves (null nodes) are black.

Red nodes have black children.

Every path from a node to its descendant leaves contains the same number of black nodes.

Why MySQL Chooses B+ Trees for Indexes

Indexes are large and cannot reside entirely in memory, so they are stored on disk as index files. During lookup, disk I/O dominates the cost, so the primary metric for an index structure is the number of disk I/O operations required.

Disk storage works with a minimum unit called a sector (512 bytes). File systems allocate blocks (e.g., 4 KB), and InnoDB uses pages of 16 KB as its smallest storage unit. A page can hold multiple rows; for a 1 KB row, a 16 KB page holds 16 rows.

Disk I/O, locality principle, and pre‑reading

Because disk access is orders of magnitude slower than memory, systems employ pre‑reading: when a request arrives, the disk reads a whole page (or multiple pages) sequentially, based on the locality principle that data near the requested item will soon be needed. This reduces the number of I/O operations.

Combining the B+‑tree’s properties—data stored only in leaf pages, linked leaf nodes for sequential access, and higher fan‑out—means that lookups require fewer page reads, and range queries benefit from the leaf‑node chain, resulting in lower overall I/O and faster query performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

MySQLData StructuresB+Treedisk I/Orange querydatabase index
Shepherd Advanced Notes
Written by

Shepherd Advanced Notes

Dedicated to sharing advanced Java technical insights, daily work snippets, and the power of persistent effort.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.