Why B‑Tree vs B+Tree Matters: MySQL Indexing Essentials
This article explains MySQL’s storage engines, compares B‑Tree and B+Tree indexes, details their structures and search processes, and outlines key principles for designing efficient indexes to avoid slow queries in relational databases.
1. Storage Engine Comparison
MySQL supports several storage engines, each with distinct characteristics. The most common index type is the B‑Tree, which is supported by all engines except Archive (which only gained indexing in MySQL 5.1 and only for a single AUTO_INCREMENT column). B‑Tree indexes dominate in MySQL and many other DBMS because their balanced tree structure provides excellent data‑retrieval performance.
2. B‑Tree and B+Tree Concepts
B‑Tree is a binary search tree where each non‑leaf node has at most two children, stores a single key, and directs searches left (smaller) or right (larger). In practice, MySQL’s B‑Tree implementation stores keys in leaf nodes, and the physical file follows a balanced‑tree layout where every leaf is at the same depth.
B+Tree is a multi‑way search tree derived from B‑Tree. Each internal node can have up to M children (M > 2). All keys are stored in leaf nodes, which are linked together in a dense, ordered list, while internal nodes contain only guide keys and pointers. This structure enables range scans and reduces tree height.
3. B+Tree Index Mechanics
A B+Tree index stores a “disk block” (page) that contains several data items and pointers. For example, a block may hold keys 17 and 35 with pointers P1 (values < 17), P2 (17 ≤ value < 35), and P3 (value ≥ 35). Real data resides only in leaf blocks; internal blocks hold only guide keys.
Search proceeds by loading the root block into memory (one I/O), performing a binary search to choose the correct pointer, loading the next block, and repeating until a leaf is reached. In a three‑level B+Tree, a lookup may require only three I/Os, whereas a full table scan would need one I/O per row, illustrating the massive performance gain.
4. Principles for Building Effective Indexes
Left‑most prefix rule: MySQL can use an index only for the leftmost columns until a range condition (>, <, BETWEEN, LIKE) stops the match.
= and IN order independence: Equality predicates can appear in any order; the optimizer rearranges them to fit the index.
High cardinality columns: Choose columns with a high distinct‑value ratio (count(distinct)/count) to reduce scanned rows; values with low cardinality (e.g., gender) are poor index candidates.
Avoid functions on indexed columns: Expressions like FROM_UNIXTIME(create_time) prevent index usage; rewrite as create_time = UNIX_TIMESTAMP('2014‑05‑29').
Extend existing indexes instead of creating new ones: If an index on column a exists, add column b to it rather than building a separate (a,b) index.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
