Master MySQL Indexes: From B+ Trees to Index Merge and Optimization
This article provides a comprehensive guide to MySQL indexing, covering index types, data structures, clustering vs. non‑clustering, hash and B+Tree indexes, covering indexes, index push‑down, index merge strategies, cost‑based index selection, common pitfalls, and practical indexing principles with illustrative examples and SQL snippets.
Hello everyone, I am SanYou~~
Today we’ll review common MySQL index concepts, focusing on the InnoDB storage engine.
Index Classification
Indexes can be classified from different dimensions.
1. By data structure
B+Tree index
Hash index
…
2. By physical storage
Clustered index
Non‑clustered index (secondary index)
Clustered and non‑clustered indexes will be discussed in detail later.
3. By index characteristics
Primary key index
Unique index
Normal index
Full‑text index
…
4. By column count
Single‑column index
Composite index
Index Data Structures
Preparation
For the examples we’ll use a user table:
CREATE TABLE `user` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`age` int(10) DEFAULT NULL,
`city` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;Hash Index
Hash indexes are rarely used because InnoDB does not support explicitly creating them; it only provides adaptive hash indexes.
Even if you declare a hash index, it will behave as a B+Tree:
Memory engine supports hash indexes. A hash index works like a Java HashMap: the indexed column value is hashed to a slot that stores the row pointer.
When multiple rows share the same hash value, a linked list is formed in that slot.
Advantages and disadvantages of hash indexes
Only equality comparisons are supported, giving very high query efficiency.
Range queries and sorting are not supported because the index entries are unordered.
B+Tree
B+Tree is the most commonly used index structure in MySQL. Other types such as full‑text indexes are omitted here.
Besides Hash and B+Tree, there are full‑text indexes and others, which are not discussed.
Clustered Index
Data Page Storage
InnoDB stores table rows in 16KB pages. When rows are inserted, they are placed in pages sorted by the primary key, forming a singly linked list.
Pages are grouped, and the maximum id of each group is stored in a page directory to enable binary search within a page.
Single‑Page Data Lookup
Instead of scanning every row, MySQL uses the page directory to locate the relevant group via binary search, then scans only that group.
Multi‑Page Data Lookup
When data exceeds one page, MySQL creates additional pages and links them with a doubly linked list. A separate page stores the minimum id of each data page together with its page number, forming a B+Tree.
Searching for a specific id now involves three steps: binary search the index page to find the data page, binary search within that data page, and finally retrieve the row.
This B+Tree where leaf nodes store the actual rows is the clustered index ; non‑leaf nodes store id and page numbers.
Secondary Index
A secondary (non‑clustered) index is also a B+Tree, but leaf nodes store only the indexed column values and the primary key id .
Single‑Column Index
For a non‑unique index on name, the leaf nodes contain the name values sorted, and duplicate name values are further ordered by id .
Composite Index
When indexing name and age, the index is first sorted by name, then by age, and finally by id .
Summary of Clustered vs. Non‑Clustered Indexes
Clustered index leaf nodes store all column values; non‑clustered leaf nodes store only indexed columns and the primary key.
Clustered index rows are ordered by id ; non‑clustered rows are ordered by the indexed column.
Clustered index non‑leaf nodes store id and page numbers; non‑clustered non‑leaf nodes store indexed columns, id , and page numbers.
Covering Index
If a query selects only columns that exist in the index (e.g., SELECT id FROM user WHERE name='赵六'), MySQL can satisfy the query using the index alone, avoiding a table lookup.
Avoid SELECT * when possible; covering indexes greatly improve performance.
Index Push‑Down
From MySQL 5.6 onward, conditions on non‑indexed columns (e.g., age > 22) can be evaluated directly from the secondary index, reducing the number of table lookups.
This optimization is called index push‑down because the predicate evaluation is pushed down to the index level.
Index Merge
MySQL can combine multiple indexes (e.g., idx_name and idx_age) using intersect, union, or sort‑union strategies.
Intersect
When both conditions are equality (e.g., name='赵六' AND age=22), MySQL retrieves primary key lists from each index, intersects them, and then performs a single table lookup.
Union
With an OR condition, MySQL retrieves primary key lists from each index, unions them (removing duplicates), and then looks up the rows.
Sort‑Union
If the primary key lists are unsorted, MySQL sorts them before performing the union.
How MySQL Chooses an Index
MySQL estimates the cost of each possible access path using IO cost (1.0 per page read) and CPU cost (0.2 per row evaluation). It compares the total cost of a full table scan, secondary index + table lookup, and other strategies, then selects the lowest‑cost plan.
Cost calculations are simplified here; real‑world scenarios involve many more factors.
Index Failure Scenarios
Violating the left‑most prefix rule (e.g., LIKE '%foo' or using the second column of a composite index alone).
Applying functions or expressions to indexed columns.
Implicit type conversion that forces the column to be cast (e.g., comparing a VARCHAR column to a numeric literal).
Stale or inaccurate statistics; running ANALYZE TABLE refreshes them.
Index Design Principles
Avoid creating too many indexes on a single table; each index consumes disk space and adds maintenance overhead.
Index columns frequently used in WHERE, ORDER BY, or GROUP BY clauses.
Do not index columns that are updated very often, as this increases write cost.
Prefer columns with high selectivity (high cardinality) for indexing.
Conclusion
This article covered clustered and non‑clustered indexes, optimization techniques such as covering indexes, index push‑down, and index merge, explained MySQL’s cost‑based index selection, discussed common index‑failure cases, and provided practical indexing guidelines.
Sample data used in the article:
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (1,'李四',20,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (2,'张三',18,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (3,'张三',23,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (4,'赵六',22,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (5,'王五',19,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (6,'赵六',24,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (7,'刘七',20,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (8,'刘七',22,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (9,'王九',9,'杭州');Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
