Why MySQL’s B+Tree Indexes Power High‑Performance Queries
This article explains how MySQL implements indexes with B+Tree structures, why they outperform full table scans, the internal layout of leaf and internal nodes, insertion and split mechanics, range‑query processing, and practical optimization tips for clustered and secondary indexes.
Preface
In the world of databases, an index works like a book's table of contents, allowing rapid location of data instead of scanning the entire table. MySQL implements this index using a B+Tree, which is the foundation of its high‑performance queries.
1. Why an Index Is Needed
Consider a users table with 10 million rows. A query such as
SELECT * FROM users WHERE username = 'john_doe';forces a full‑table scan with O(N) complexity, which is extremely inefficient on large data sets. An index reduces the lookup complexity to O(log N), potentially improving performance by hundreds of times.
2. From Binary Trees to B+Tree
Binary trees and AVL trees degrade to linear scans or cause excessive disk I/O because each node has only two children. B+Tree is a multi‑way balanced search tree designed to minimize disk I/O by increasing fan‑out and keeping the tree shallow.
B+Tree’s design goal: minimize the number of disk I/O operations.
2.1 Core Structure of B+Tree
Internal (index) nodes : store only index keys and pointers; act as a navigation system; high fan‑out keeps the tree short.
Leaf (data) nodes : store the actual row records; linked together by a doubly‑linked list to support efficient range scans.
3. B+Tree vs. B‑Tree
Data location : B‑Tree stores data in every node, while B+Tree stores data only in leaf nodes, giving a stable query path.
Leaf linking : B+Tree provides a doubly‑linked leaf list for fast range queries; B‑Tree does not.
Non‑leaf role : B‑Tree nodes contain both index and data; B+Tree non‑leaf nodes contain only index keys, resulting in higher fan‑out and fewer I/O operations.
Analogy: B‑Tree is like a book where chapters mix text and index; B+Tree is like a book with a pure index followed by neatly ordered content.
4. B+Tree and Disk Pages
In InnoDB, each B+Tree node corresponds to a 16 KB disk page that can hold hundreds of index keys. Even with billions of rows, the tree height stays around 3–4 levels.
Example: If each page stores 400 keys, a tree of height 3 can address 64 million rows with only three disk I/O operations.
5. Insertion and Page Split
Insertion Process
Locate the target leaf page.
If the page is not full, insert the record directly.
If the page is full, perform a page split and update the parent node.
Impact of Page Splits
Page splits increase tree height, so using an auto‑increment primary key (which always inserts at the rightmost leaf) reduces the frequency of splits. Random keys such as UUIDs cause frequent splits and degrade write performance.
6. Range Query Mechanics
B+Tree excels at range queries because all data resides in leaf nodes linked together.
SELECT * FROM users WHERE age BETWEEN 20 AND 30;Find the first leaf page that satisfies the condition.
Scan sequentially through the leaf linked list.
No repeated tree jumps, resulting in very high efficiency.
7. B+Tree Implementation in InnoDB
7.1 Clustered Index
The entire table is a B+Tree; leaf nodes store complete row records.
Only one clustered index per table, usually built on the primary key.
If no primary key exists, InnoDB creates a hidden RowID.
7.2 Secondary Index
Leaf nodes store the indexed column values plus the primary key.
Querying a secondary index requires a lookup in the index followed by a “row lookup” (back‑table) using the primary key.
-- Primary key lookup (single step)
SELECT * FROM users WHERE id = 5;
-- Secondary index lookup (index + back‑table)
SELECT * FROM users WHERE username = 'john_doe';8. Limitations and Common Pitfalls
Low‑selectivity columns : optimizer may ignore the index; consider composite indexes.
LIKE '%abc' : cannot use left‑most prefix; rewrite as LIKE 'abc%' or use full‑text index.
Functions on indexed columns : break index usage; rewrite as range conditions.
Too many indexes : increase write overhead; keep only necessary indexes.
9. Composite Index and Left‑most Prefix Rule
Given an index INDEX(a, b, c), it can be used for queries on a, a, b, a, b, c, or a BETWEEN …. It cannot be used when the leftmost column is omitted, e.g., queries on b or c alone.
Index ordering follows (a,b,c) ; missing the leftmost column prevents matching.
10. B+Tree vs. Hash Index
Structure : B+Tree is a multi‑way balanced tree; hash index is a hash table.
Range queries : supported by B+Tree, not by hash.
Sorting : supported by B+Tree, not by hash.
Lookup complexity : O(log N) for B+Tree, O(1) (ideal) for hash.
Use cases : B+Tree for general indexing; hash for equality lookups and high‑concurrency caches (Memory engine).
11. Performance Optimization Tips
Use short, auto‑increment primary keys to avoid page splits.
Choose high‑selectivity columns for indexes to improve filter effectiveness.
Order columns in composite indexes according to the left‑most prefix rule and query frequency.
Periodically drop redundant indexes to reduce maintenance cost.
Run EXPLAIN to verify index usage and identify back‑table operations.
12. Five Core Advantages of B+Tree Indexes
Very low disk I/O: only 3–4 tree levels cover tens of millions of rows.
Excellent range‑query performance thanks to the leaf linked list.
Stable query complexity: every lookup follows the same path length.
High space utilization: non‑leaf nodes store only keys.
Perfect fit for database workloads, being designed around disk‑page characteristics.
13. Understanding B+Tree Is Understanding MySQL Performance
Grasping B+Tree explains why auto‑increment primary keys are fast, why range queries beat hash indexes, and why SQL optimization essentially means traversing the shortest B+Tree path.
Ray's Galactic Tech
Practice together, never alone. We cover programming languages, development tools, learning methods, and pitfall notes. We simplify complex topics, guiding you from beginner to advanced. Weekly practical content—let's grow together!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
