Databases 13 min read

Why MySQL Chooses B+ Trees: From BSTs to Efficient Disk Indexes

This article explains how MySQL's indexing evolved from simple binary search trees through AVL and red‑black trees to B‑trees and finally B+ trees, highlighting the performance and I/O advantages that make B+ trees the preferred index structure for disk‑based databases.

Raymond Ops
Raymond Ops
Raymond Ops
Why MySQL Chooses B+ Trees: From BSTs to Efficient Disk Indexes

Preface

In MySQL, both InnoDB and MyISAM use B+ trees as the index structure (excluding hash indexes). This article starts with the simplest binary search tree, then progressively explains the problems each tree solves and the new challenges they introduce, ultimately showing why MySQL selects B+ trees for indexing.

Table of Contents

1. Binary Search Tree (BST): Unbalanced

2. Balanced Binary Tree (AVL): Rotation Overhead

3. Red‑Black Tree: Too Tall

4. B‑Tree: Designed for Disk

5. B+ Tree

6. Feeling the Power of B+ Trees

7. Summary

1. Binary Search Tree (BST): Unbalanced

Binary Search Tree (BST), also called binary sorted tree, requires that for any node all values in the left subtree are not greater than the node value, and all values in the right subtree are not smaller. When fast lookup is needed, BST gives an average time complexity of O(log n), but it can become unbalanced and degenerate into a linked list with O(n) complexity.

To address this, balanced binary trees are introduced.

2. Balanced Binary Tree (AVL): Rotation Overhead

AVL tree is a strictly balanced binary tree where the height difference between left and right subtrees of any node cannot exceed 1. Lookup, insertion, and deletion are O(log n) in both average and worst cases. Maintaining balance requires rotations; insertion needs at most one rotation, but deletion may require O(log n) rotations along the path to the root, making AVL less efficient for delete‑heavy workloads.

3. Red‑Black Tree: Too Tall

Red‑Black tree relaxes strict balance, ensuring the longest path from root to leaf is at most twice the shortest. Each node is colored red or black and must satisfy specific color rules. This reduces the number of rotations needed for deletions to O(1) while keeping search complexity reasonable.

Red‑Black trees are widely used (e.g., Java TreeMap, Java 8 HashMap) because deletions are fast. However, for disk‑based data such as MySQL, their height is still too large, leading to many I/O operations.

4. B‑Tree: Designed for Disk

B‑Tree (not a minus sign) is a multi‑way balanced search tree designed for secondary storage. Each non‑leaf node can have many children, so the tree height is much smaller than AVL or Red‑Black trees, greatly reducing disk I/O.

The key concept is the order m. An m‑order B‑Tree must satisfy:

Each node has at most m children.

If the root is not a leaf, it has at least 2 children; other non‑leaf nodes have at least ⌈m/2⌉ children.

A non‑leaf node with k children stores k‑1 records.

All leaf nodes reside on the same level.

B‑Tree also exploits locality of reference, improving cache hit rates.

5. B+ Tree

B+ Tree is also a multi‑way balanced search tree, differing from B‑Tree in that only leaf nodes store actual data; internal nodes store only keys.

Only leaf nodes store real data (rows or primary keys); internal nodes store keys.

Keys may be duplicated in leaves and internal nodes.

Leaf nodes are linked by a doubly linked list.

Internal nodes contain the same number of records as children.

Advantages:

Fewer I/O operations: Internal nodes store only keys, allowing more records per node, reducing tree height and I/O.

Better range queries: Leaves are linked, so a range scan is a simple sequential walk.

Stable query performance: All data reside in leaves, so query cost equals tree height.

The main drawback is increased space due to key duplication, but the performance gains outweigh this cost, making B+ Tree the dominant index structure in databases.

6. Feeling the Power of B+ Trees

In InnoDB, B+ indexes typically have a height of 2–4 levels. Height depends on the order: larger order yields a shorter tree. Assuming each non‑leaf page stores 1 000 keys and each leaf page stores 100 keys, a 3‑level B+ Tree can hold 100 million records, whereas a binary tree would need about 26 levels for the same amount.

7. Summary

Summary of the problems each tree solves and the new issues they introduce:

BST: solves basic ordering but can become unbalanced.

AVL: balances via rotations but rotation overhead is high.

Red‑Black: relaxes balance, improves deletion, yet remains too tall for disk.

B‑Tree: multi‑way structure reduces height for disk storage.

B+ Tree: internal nodes are pure indexes, leaves linked, further lowering height and enhancing range queries.

data structuresDatabase PerformanceB-TreeMySQL IndexingTree Indexes
Raymond Ops
Written by

Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.