Master MySQL Indexes: From B+ Trees to Index Merge and Optimization
This article explains MySQL index fundamentals—including classification, B+‑tree and hash structures, clustered and secondary indexes, single‑ and composite‑column indexes, covering indexes, index condition pushdown, index merge strategies, cost‑based index selection, common index‑invalidating scenarios, and practical guidelines for creating effective indexes.
Index Classification
Indexes can be classified by data structure (B+ tree, hash, …), physical storage (clustered vs. non‑clustered), characteristics (primary, unique, normal, full‑text, …), and column count (single‑column vs. composite).
Index Data Structures
For demonstration a user table is created:
CREATE TABLE `user` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`age` int(10) DEFAULT NULL,
`city` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;Hash Index
Hash indexes are rarely used because InnoDB does not support explicit hash indexes; it only provides adaptive hash indexes. When a hash index is defined on a column (e.g., name) the engine still stores a B+‑tree, so SHOW INDEX FROM table_name shows a B+‑tree.
Memory engine supports true hash indexes. A hash index works like a Java HashMap: the indexed column value is hashed, the hash value determines a slot, and each slot may contain a linked list of row pointers to resolve collisions.
Hash indexes are fast for equality comparisons.
They cannot be used for range queries or sorting because the values are unordered.
B+ Tree
B+‑tree is the most common index structure in MySQL. It stores keys in sorted order and uses leaf nodes to hold either the full row (clustered index) or the indexed columns plus the primary key (secondary index).
Other index types such as full‑text are omitted for brevity.
Clustered Index
In InnoDB, the primary key forms a clustered index. Data rows are stored in 16 KB pages; pages are linked in a doubly‑linked list and sorted by the primary key. To locate a row, MySQL first reads the page directory (which stores the maximum key of each group), performs a binary search to find the correct page, then scans within the page.
When multiple pages exist, MySQL extracts the minimum primary‑key value of each page into a separate index page (the page directory). A binary search on this directory quickly identifies the target data page, reducing the need for full scans.
Secondary (Non‑Clustered) Index
A secondary index is also a B+‑tree. Its leaf nodes store the indexed column values together with the primary‑key ID; non‑leaf nodes also store the data‑page number. To retrieve full row data, MySQL first finds the primary‑key ID via the secondary index and then “backs up” to the clustered index—a process called a back‑table lookup.
Single‑Column Index
Creating a normal (non‑unique) index on name stores the name values sorted, with duplicate name rows ordered by primary key. MySQL can then use binary search on the index to locate matching rows.
Composite Index
A composite index on name, age first sorts by name; for identical name values it sorts by age, and finally by primary key. This ordering enables efficient range scans and can satisfy queries that need both columns.
Back‑Table Lookup
For a query like SELECT * FROM user WHERE name='赵六', MySQL uses the secondary index to find the matching primary‑key IDs, then retrieves the remaining columns from the clustered index. If the index is non‑unique, multiple rows may be visited.
Covering Index
If a query requests only columns that exist in the index (e.g., SELECT id FROM user WHERE name='赵六'), MySQL can satisfy the query directly from the index without a back‑table lookup, improving performance.
Index Condition Pushdown
Starting with MySQL 5.6, conditions on non‑leading columns of a composite index (e.g., age>22 in an index on name, age) are evaluated directly on the index, reducing the number of back‑table lookups.
Index Merge
MySQL can combine multiple indexes for a single query. Three merge strategies exist:
Intersect – returns the intersection of IDs from each index.
Union – returns the union of IDs.
Sort‑union – sorts IDs before merging when they are not already ordered.
Merge requires that each individual index produce ordered primary‑key IDs; otherwise MySQL falls back to a full scan.
How MySQL Chooses an Index
MySQL estimates the cost of each access path using I/O cost (1.0 per page read) and CPU cost (0.2 per row evaluation). It calculates costs for full table scans, secondary‑index‑plus‑back‑table lookups, and index merges, then selects the lowest‑cost plan.
Index Invalidations
Common causes of index loss include:
Violating the left‑most prefix rule (e.g., LIKE '%foo').
Applying functions or expressions to indexed columns.
Implicit type conversion (e.g., comparing a varchar column to a numeric literal).
Stale statistics; running ANALYZE TABLE refreshes them.
Index Creation Principles
Avoid creating too many indexes per table; each index consumes disk space and adds maintenance overhead.
Index columns frequently used in WHERE, ORDER BY, or GROUP BY clauses.
Do not index columns that change often, as updates require index re‑ordering.
Prefer columns with high cardinality (high selectivity) for better filtering.
Conclusion
The article covered clustered vs. non‑clustered indexes, optimization techniques such as covering indexes and index condition pushdown, the cost‑based index selection process, typical index‑invalidating scenarios, and practical guidelines for designing effective MySQL indexes.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
