Master MySQL Indexes: From Balanced Binary Trees to B+ Trees
This article explains how MySQL uses B‑tree‑based indexes, covering balanced binary trees, B‑trees, B+‑trees, various index types, data‑type effects, and best practices for creating and using composite indexes to optimize query performance.
MySQL indexes are generally implemented with search trees, typically B‑trees; before discussing B‑trees, we review the prerequisite concept of balanced binary trees.
Balanced Binary Tree
Binary trees are familiar from introductory data‑structure courses. A binary search tree (BST) adds an "ordered" property: values on the left are less than the node, and values on the right are greater. However, a BST can degenerate into a linked list in the worst case, losing its tree benefits.
To address this, a balanced binary tree ensures that each node's left and right subtrees differ in height by at most one, combining the BST ordering with height balance.
Binary Search Tree
Every node's left or right subtree is also a balanced binary tree (height difference ≤ 1)
B‑Tree
Balanced binary trees provide good search complexity O(log N), but the height depends on the number of nodes. By allowing each node to have multiple children (a multi‑way tree), we can control the height more effectively; this leads to the B‑tree.
A B‑tree is an absolutely balanced tree where all leaf nodes reside at the same depth. Each node stores multiple keys and pointers, reducing disk I/O compared with binary trees; typically only 2–3 levels are needed for most workloads, making B‑tree indexes highly efficient.
B‑Tree Characteristics
Each node can have up to *m* children (order *m*).
The root must have at least two children.
Non‑root, non‑leaf nodes must have at least ⎡*m*/2⎤ children.
Node contents: [n, A0, K1, A1, K2, A2, …, Kn, An] where *n* is the number of keys, keys are ordered (K_i < K_{i+1}), and A_i are pointers.
All root‑to‑leaf paths have equal length; leaf nodes contain no data, only pointers indicating absence of a value.
B+‑Tree
While B‑trees improve efficiency, storing data in internal nodes can increase space usage and I/O when the dataset grows. B+‑trees address this by storing only keys in internal nodes; actual row data resides exclusively in leaf nodes, which are linked together via a doubly‑linked list for fast range scans.
Because all data lives in leaf nodes, each query must reach the leaf level, making disk I/O proportional to tree height. However, B+‑trees usually have a shorter height than B‑trees, and they support covering indexes where the index alone satisfies the query.
Index Classification
Logical Classification – By Function
Primary Index
A table can have only one primary index, defined with PRIMARY KEY . In MySQL this is typically implemented with a B‑Tree and is a special unique index.
ALTER TABLE TableName ADD PRIMARY KEY(column_list);Unique Index
Columns defined as UNIQUE INDEX must contain unique values but may allow NULLs. Multiple unique indexes can exist on a table.
CREATE UNIQUE INDEX IndexName ON `TableName`(`column`(length));
# or
ALTER TABLE TableName ADD UNIQUE (column_list);Normal Index
Normal indexes can contain multiple columns, allow duplicate values and NULLs.
CREATE INDEX IndexName ON `TableName`(`column`(length));
# or
ALTER TABLE TableName ADD INDEX IndexName(`column`(length));Logical Classification – By Usage
Single‑column indexes contain one column; composite indexes contain two or more columns and follow MySQL's "leftmost prefix" rule for optimal use.
-- Multi‑column composite (normal index)
CREATE INDEX <index_name> ON <table_name> (column1, column2, column3);
-- Multi‑column composite (unique index)
CREATE UNIQUE INDEX <index_name> ON <table_name> (column1, column2, column3);Physical Classification
The physical classification describes how the index is stored on disk, mainly concerning B+‑tree structures. Clustered Index: Data rows are stored in the leaf nodes of a B+‑tree built on the primary key. Each table can have only one clustered index.
Clustered Index
A clustered index (often the primary key) stores the entire row data in leaf nodes, making the index itself the data file. This is also known as a covering index.
Non‑Clustered Index
Also called secondary or auxiliary indexes, they store only the key and a pointer to the row data (the primary key). Queries first search the non‑clustered B+‑tree, retrieve the primary key, then locate the row via the clustered index.
Example query: SELECT sex, age FROM user WHERE name = '陈芳'; MySQL uses the name index to find the matching leaf, obtains the row ID, then uses the primary key index to fetch the full row – a process known as a "back‑lookup".
Both InnoDB and MyISAM store indexes as B+‑trees, but only InnoDB's primary key is clustered; all other indexes are non‑clustered.
Data Types and Index Behavior
VARCHAR
Prefix Index: Index only the first N characters to save space, at the cost of reduced selectivity.
Selectivity: Choose prefix length based on how distinct the leading characters are.
Index Efficiency: Equality and IN queries perform well; range queries may be slower, and using functions like CONCAT or SUBSTRING can invalidate the index.
Numeric Types
Selectivity: Integer columns usually have high selectivity, distributing values evenly.
Index Efficiency: Equality and range queries (>, <, BETWEEN) are fast; numeric indexes often outperform string indexes for range scans.
Datetime
Timezone Impact: Cross‑timezone data requires careful handling to avoid index misuse.
Function Impact: Applying functions like DATE_FORMAT can prevent index usage.
Index Efficiency: Equality and range queries on DATETIME are highly efficient.
Other Considerations
NULL values can affect index behavior, especially for VARCHAR columns.
Default values may reduce selectivity if many rows share the same default.
Frequent updates can fragment indexes, requiring periodic rebuilds.
Composite Index Usage Principles
MySQL follows the "leftmost prefix" rule: the index is used from left to right, and only columns that appear consecutively from the leftmost side can be leveraged.
Example: creating a composite index on source_ip, destination_ip, visit_time in that order.
ALTER TABLE tb_flow_visit ADD INDEX commIndies (sip, dip, visit_time);Order Matters: Queries must filter on sip first to use the index.
-- Effective (full match)
SELECT * FROM tb_flow_visit WHERE sip='192.168.1.1' AND dip='192.168.1.2' AND visit_time='2023-08-01';
-- Effective (partial match)
SELECT * FROM tb_flow_visit WHERE sip='192.168.1.1' AND dip='192.168.1.2';Cannot Skip: If the query starts with dip or visit_time without sip, the composite index is not used.
-- Invalid (skips sip)
SELECT * FROM tb_flow_visit WHERE dip='192.168.1.2' AND visit_time='2024-08-01';Partial Match: Using the leftmost column(s) allows partial index usage; columns to the right of a range condition cannot be used unless the leftmost column is an equality condition.
-- Valid partial match with range on rightmost column
SELECT * FROM tb_flow_visit WHERE sip='192.168.1.1' AND visit_time > '2024-08-01';
-- Invalid range on leftmost column prevents use of later columns
SELECT * FROM tb_flow_visit WHERE sip > '192.168.1.2' AND dip > '192.168.2.1' AND visit_time='2024-08-01';Understanding these principles helps design indexes that the MySQL optimizer can fully exploit.
Original source
(© Original author, all rights reserved)
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
