Why MySQL Indexes Use B‑Tree, Not Hash: Understanding Index Data Structures
The article explains how MySQL indexes rely on ordered data structures—binary trees, red‑black trees, hash tables, and especially B‑Trees—to accelerate queries, compares their performance characteristics, illustrates how tree height affects I/O operations, and shows why B‑Tree is preferred for range searches.
1. Index Data Structure Overview
Indexes are ordered data structures that help MySQL retrieve rows efficiently.
Example table t with 7 rows; query select * from t where t.clo2 = 89.
If no index exists, MySQL performs a full table scan, reading each row and causing at least six disk I/O operations.
2. Common Index Structures
Binary Tree
Red‑Black Tree
Hash Table
B‑Tree
1. Binary Tree
Rule: left child < right child.
When the indexed column (e.g., Col1) is auto‑incrementing, inserts create a skewed tree similar to a linked list, increasing height.
In such a case, a lookup may need four node traversals.
In the worst case of sequential inserts, a binary‑tree index degenerates to a full scan.
2. Red‑Black Tree
Self‑balancing binary tree; rebalances when a branch exceeds three nodes.
Finding the value 6 requires six node visits in a plain binary tree but only three in a red‑black tree. MySQL does not use red‑black trees for indexes because the tree height can still become large with massive data sets.
With 5 million rows, a red‑black tree height may reach 23, leading to 23 disk I/O operations for a leaf‑node lookup, which is inefficient.
3. Hash Table
If an index were stored as a hash, MySQL would compute a hash value to locate the disk pointer directly, giving constant‑time lookups regardless of table size.
However, hash indexes cannot support range queries (e.g., select * from t where clo2 > 6) or ordering, and hash collisions, though rare, are possible.
Range queries and ORDER BY cannot use a hash index.
Hash collisions are handled internally but add complexity.
4. B‑Tree
All leaf nodes have the same depth; nodes store multiple sorted keys; pointers to child nodes are null at leaves; keys are unique.
With a maximum degree of 4, the tree shown requires only two node accesses to locate a row.
While B‑Tree lookups are fast, if a node’s page size (e.g., 16 KB) is filled with large row data, the number of keys per node drops, increasing tree height. Nevertheless, B‑Tree remains the preferred MySQL index because it supports both point and range queries efficiently.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Open Source Tech Hub
Sharing cutting-edge internet technologies and practical AI resources.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
