Databases 29 min read

Master MySQL Indexes: From B+ Trees to Index Merge and Optimization

This article explains MySQL index fundamentals—including classification, B+‑tree and hash structures, clustered and secondary indexes, single‑ and composite‑column indexes, covering indexes, index condition pushdown, index merge strategies, cost‑based index selection, common index‑invalidating scenarios, and practical guidelines for creating effective indexes.

Su San Talks Tech

May 30, 2023

Master MySQL Indexes: From B+ Trees to Index Merge and Optimization

Index Classification

Indexes can be classified by data structure (B+ tree, hash, …), physical storage (clustered vs. non‑clustered), characteristics (primary, unique, normal, full‑text, …), and column count (single‑column vs. composite).

Index Data Structures

For demonstration a user table is created:

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `age` int(10) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Hash Index

Hash indexes are rarely used because InnoDB does not support explicit hash indexes; it only provides adaptive hash indexes. When a hash index is defined on a column (e.g., name) the engine still stores a B+‑tree, so SHOW INDEX FROM table_name shows a B+‑tree.

Memory engine supports true hash indexes. A hash index works like a Java HashMap: the indexed column value is hashed, the hash value determines a slot, and each slot may contain a linked list of row pointers to resolve collisions.

Hash indexes are fast for equality comparisons.

They cannot be used for range queries or sorting because the values are unordered.

B+ Tree

B+‑tree is the most common index structure in MySQL. It stores keys in sorted order and uses leaf nodes to hold either the full row (clustered index) or the indexed columns plus the primary key (secondary index).

Other index types such as full‑text are omitted for brevity.

Clustered Index

In InnoDB, the primary key forms a clustered index. Data rows are stored in 16 KB pages; pages are linked in a doubly‑linked list and sorted by the primary key. To locate a row, MySQL first reads the page directory (which stores the maximum key of each group), performs a binary search to find the correct page, then scans within the page.

When multiple pages exist, MySQL extracts the minimum primary‑key value of each page into a separate index page (the page directory). A binary search on this directory quickly identifies the target data page, reducing the need for full scans.

Secondary (Non‑Clustered) Index

A secondary index is also a B+‑tree. Its leaf nodes store the indexed column values together with the primary‑key ID; non‑leaf nodes also store the data‑page number. To retrieve full row data, MySQL first finds the primary‑key ID via the secondary index and then “backs up” to the clustered index—a process called a back‑table lookup.

Single‑Column Index

Creating a normal (non‑unique) index on name stores the name values sorted, with duplicate name rows ordered by primary key. MySQL can then use binary search on the index to locate matching rows.

Composite Index

A composite index on name, age first sorts by name; for identical name values it sorts by age, and finally by primary key. This ordering enables efficient range scans and can satisfy queries that need both columns.

Back‑Table Lookup

For a query like SELECT * FROM user WHERE name='赵六', MySQL uses the secondary index to find the matching primary‑key IDs, then retrieves the remaining columns from the clustered index. If the index is non‑unique, multiple rows may be visited.

Covering Index

If a query requests only columns that exist in the index (e.g., SELECT id FROM user WHERE name='赵六'), MySQL can satisfy the query directly from the index without a back‑table lookup, improving performance.

Index Condition Pushdown

Starting with MySQL 5.6, conditions on non‑leading columns of a composite index (e.g., age>22 in an index on name, age) are evaluated directly on the index, reducing the number of back‑table lookups.

Index Merge

MySQL can combine multiple indexes for a single query. Three merge strategies exist:

Intersect – returns the intersection of IDs from each index.

Union – returns the union of IDs.

Sort‑union – sorts IDs before merging when they are not already ordered.

Merge requires that each individual index produce ordered primary‑key IDs; otherwise MySQL falls back to a full scan.

How MySQL Chooses an Index

MySQL estimates the cost of each access path using I/O cost (1.0 per page read) and CPU cost (0.2 per row evaluation). It calculates costs for full table scans, secondary‑index‑plus‑back‑table lookups, and index merges, then selects the lowest‑cost plan.

Index Invalidations

Common causes of index loss include:

Violating the left‑most prefix rule (e.g., LIKE '%foo').

Applying functions or expressions to indexed columns.

Implicit type conversion (e.g., comparing a varchar column to a numeric literal).

Stale statistics; running ANALYZE TABLE refreshes them.

Index Creation Principles

Avoid creating too many indexes per table; each index consumes disk space and adds maintenance overhead.

Index columns frequently used in WHERE, ORDER BY, or GROUP BY clauses.

Do not index columns that change often, as updates require index re‑ordering.

Prefer columns with high cardinality (high selectivity) for better filtering.

Conclusion

The article covered clustered vs. non‑clustered indexes, optimization techniques such as covering indexes and index condition pushdown, the cost‑based index selection process, typical index‑invalidating scenarios, and practical guidelines for designing effective MySQL indexes.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

InnoDB MySQL sql-optimization index Database Performance B+Tree

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.