Master MySQL Indexes: From B+ Trees to Index Merge and Optimization
This article explains MySQL InnoDB index fundamentals, covering index types, data structures, clustered and secondary indexes, covering indexes, index pushdown, index merge, cost‑based index selection, common index‑inefficiency scenarios, and practical guidelines for designing effective indexes.
Index Classification
Indexes can be classified from different dimensions.
1. By Data Structure
B+Tree index
Hash index
...
2. By Physical Storage Structure
Clustered index
Non‑clustered index (secondary index)
Clustered and non‑clustered indexes will be discussed in detail later.
3. By Index Characteristics
Primary key index
Unique index
Normal index
Full‑text index
...
4. By Number of Columns
Single‑column index
Composite (multi‑column) index
Index Data Structures
Preparation
For the examples a user table is created:
CREATE TABLE `user` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`age` int(10) DEFAULT NULL,
`city` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;Hash Index
Hash indexes are rarely used because InnoDB does not support explicitly creating them; it only provides adaptive hash indexes.
Even if a Hash index is declared, it is ineffective and the engine still uses a B+Tree.
Memory engine supports true Hash indexes. A Hash index works like a Java HashMap: the indexed column value is hashed, the hash determines a slot, and the slot stores a pointer to the row.
When multiple rows have the same hash value, a linked list is formed in the slot.
Advantages and disadvantages of Hash indexes
Very fast for equality comparisons.
Cannot be used for range queries or sorting because the values are unordered.
B+Tree
B+Tree is the most common index structure in MySQL. Other index types such as full‑text are omitted.
Besides Hash and B+Tree, there are other indexes like full‑text, which are not covered here.
Clustered Index
Data Page Storage
InnoDB stores rows in 16KB pages. When rows are inserted, they are placed into pages and sorted by the primary key, forming a singly linked list of pages.
To locate a row within a single page, MySQL builds a page directory that records the maximum id of each group (e.g., groups of 4 rows). A binary search on the directory quickly narrows the target group, then a linear scan inside the group finds the exact row.
Searching Across Multiple Pages
When a page is full, a new page is created and linked via a double‑linked list. MySQL extracts the minimum id of each page and stores it together with the page number in a higher‑level page. This higher‑level page forms a B+Tree that points to the leaf pages containing the actual rows.
The B+Tree built from these page‑level entries is the clustered index.
Secondary (Non‑Clustered) Index
A secondary index is also a B+Tree, but its leaf nodes store only the indexed column values and the primary key id. Non‑leaf nodes also keep the page number of the corresponding data page.
Single‑Column Index
For a non‑unique index on name, the leaf nodes contain name values sorted, and duplicate name entries are ordered by id.
Composite Index
When an index is built on name and age, the leaf nodes store (name, age) pairs sorted first by name, then by age, and finally by id.
Summary of Clustered vs. Non‑Clustered
Clustered leaf nodes store all column values; non‑clustered leaf nodes store only indexed columns plus the primary key.
Clustered data is ordered by primary key; non‑clustered data is ordered by the indexed columns.
Clustered non‑leaf nodes keep primary key and page number; non‑clustered non‑leaf nodes keep indexed columns, primary key, and page number.
Back‑Table (Row Lookup)
When a query uses a secondary index, MySQL first finds the matching primary key(s) from the index, then retrieves the remaining columns from the clustered index (the “back‑table” step).
Covering Index
If the query only requests columns that are present in the index (e.g., SELECT id FROM user WHERE name='赵六'), the engine can satisfy the query directly from the index without a back‑table lookup, improving performance.
Avoid SELECT * when a covering index can be used.
Index Pushdown
From MySQL 5.6 onward, conditions on columns that are stored in the index (e.g., age in a composite index) can be evaluated directly in the index, reducing the number of back‑table operations.
Index Merge
MySQL can combine results from multiple indexes:
Intersect (AND)
Union (OR)
Sort‑union (OR with unsorted results)
For intersect, the primary keys from each index must be ordered; otherwise MySQL falls back to a single index.
How MySQL Chooses an Index
MySQL estimates the cost of each possible access path using statistics:
IO cost: loading a page costs 1.0.
CPU cost: evaluating a row costs 0.2.
Full‑table scan cost ≈ rows * 0.2 + (data_length/1024/16) * 1.0.
Secondary‑index‑plus‑back‑table cost ≈ intervals * 1.0 + rows * 0.2 + rows * 1.0 + rows * 0.2.
The plan with the lowest total cost is chosen.
Index Failure Scenarios
Violating the left‑most prefix rule (e.g., LIKE '%foo' or using the second column of a composite index alone).
Applying functions or expressions to indexed columns.
Implicit type conversion that forces the column to be cast (e.g., comparing a VARCHAR column to a numeric literal).
Stale statistics leading to inaccurate cost estimates; running ANALYZE TABLE refreshes them.
Index Design Principles
Limit the number of indexes per table to avoid excessive storage and cost‑estimation overhead.
Create indexes on columns frequently used in WHERE clauses.
Consider indexing columns used in ORDER BY or GROUP BY to enable sorting without extra work.
Avoid indexing columns that are updated very frequently, as index maintenance becomes costly.
Prefer columns with high cardinality (high selectivity) for indexing.
Conclusion
The article covered clustered and non‑clustered indexes, optimization techniques such as covering indexes and index pushdown, the cost‑based index selection process, common reasons for index loss of effectiveness, and practical guidelines for creating efficient indexes.
Sample data used in the article:
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (1,'李四',20,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (2,'张三',18,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (3,'张三',23,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (4,'赵六',22,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (5,'王五',19,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (6,'赵六',24,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (7,'刘七',20,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (8,'刘七',22,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (9,'王九',9,'杭州');Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
