Databases 31 min read

Master MySQL Indexes: From B+ Trees to Index Merge and Optimization

This article provides a comprehensive guide to MySQL indexing, covering index types, data structures, clustering vs. non‑clustering, hash and B+Tree indexes, covering indexes, index push‑down, index merge strategies, cost‑based index selection, common pitfalls, and practical indexing principles with illustrative examples and SQL snippets.

Sanyou's Java Diary

May 23, 2023

Master MySQL Indexes: From B+ Trees to Index Merge and Optimization

Hello everyone, I am SanYou~~

Today we’ll review common MySQL index concepts, focusing on the InnoDB storage engine.

Index Classification

Indexes can be classified from different dimensions.

1. By data structure

B+Tree index

Hash index

…

2. By physical storage

Clustered index

Non‑clustered index (secondary index)

Clustered and non‑clustered indexes will be discussed in detail later.

3. By index characteristics

Primary key index

Unique index

Normal index

Full‑text index

…

4. By column count

Single‑column index

Composite index

Index Data Structures

Preparation

For the examples we’ll use a user table:

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `age` int(10) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Hash Index

Hash indexes are rarely used because InnoDB does not support explicitly creating them; it only provides adaptive hash indexes.

Even if you declare a hash index, it will behave as a B+Tree:

Memory engine supports hash indexes. A hash index works like a Java HashMap: the indexed column value is hashed to a slot that stores the row pointer.

When multiple rows share the same hash value, a linked list is formed in that slot.

Advantages and disadvantages of hash indexes

Only equality comparisons are supported, giving very high query efficiency.

Range queries and sorting are not supported because the index entries are unordered.

B+Tree

B+Tree is the most commonly used index structure in MySQL. Other types such as full‑text indexes are omitted here.

Besides Hash and B+Tree, there are full‑text indexes and others, which are not discussed.

Clustered Index

Data Page Storage

InnoDB stores table rows in 16KB pages. When rows are inserted, they are placed in pages sorted by the primary key, forming a singly linked list.

Pages are grouped, and the maximum id of each group is stored in a page directory to enable binary search within a page.

Single‑Page Data Lookup

Instead of scanning every row, MySQL uses the page directory to locate the relevant group via binary search, then scans only that group.

Multi‑Page Data Lookup

When data exceeds one page, MySQL creates additional pages and links them with a doubly linked list. A separate page stores the minimum id of each data page together with its page number, forming a B+Tree.

Searching for a specific id now involves three steps: binary search the index page to find the data page, binary search within that data page, and finally retrieve the row.

This B+Tree where leaf nodes store the actual rows is the clustered index ; non‑leaf nodes store id and page numbers.

Secondary Index

A secondary (non‑clustered) index is also a B+Tree, but leaf nodes store only the indexed column values and the primary key id .

Single‑Column Index

For a non‑unique index on name, the leaf nodes contain the name values sorted, and duplicate name values are further ordered by id .

Composite Index

When indexing name and age, the index is first sorted by name, then by age, and finally by id .

Summary of Clustered vs. Non‑Clustered Indexes

Clustered index leaf nodes store all column values; non‑clustered leaf nodes store only indexed columns and the primary key.

Clustered index rows are ordered by id ; non‑clustered rows are ordered by the indexed column.

Clustered index non‑leaf nodes store id and page numbers; non‑clustered non‑leaf nodes store indexed columns, id , and page numbers.

Covering Index

If a query selects only columns that exist in the index (e.g., SELECT id FROM user WHERE name='赵六'), MySQL can satisfy the query using the index alone, avoiding a table lookup.

Avoid SELECT * when possible; covering indexes greatly improve performance.

Index Push‑Down

From MySQL 5.6 onward, conditions on non‑indexed columns (e.g., age > 22) can be evaluated directly from the secondary index, reducing the number of table lookups.

This optimization is called index push‑down because the predicate evaluation is pushed down to the index level.

Index Merge

MySQL can combine multiple indexes (e.g., idx_name and idx_age) using intersect, union, or sort‑union strategies.

Intersect

When both conditions are equality (e.g., name='赵六' AND age=22), MySQL retrieves primary key lists from each index, intersects them, and then performs a single table lookup.

Union

With an OR condition, MySQL retrieves primary key lists from each index, unions them (removing duplicates), and then looks up the rows.

Sort‑Union

If the primary key lists are unsorted, MySQL sorts them before performing the union.

How MySQL Chooses an Index

MySQL estimates the cost of each possible access path using IO cost (1.0 per page read) and CPU cost (0.2 per row evaluation). It compares the total cost of a full table scan, secondary index + table lookup, and other strategies, then selects the lowest‑cost plan.

Cost calculations are simplified here; real‑world scenarios involve many more factors.

Index Failure Scenarios

Violating the left‑most prefix rule (e.g., LIKE '%foo' or using the second column of a composite index alone).

Applying functions or expressions to indexed columns.

Implicit type conversion that forces the column to be cast (e.g., comparing a VARCHAR column to a numeric literal).

Stale or inaccurate statistics; running ANALYZE TABLE refreshes them.

Index Design Principles

Avoid creating too many indexes on a single table; each index consumes disk space and adds maintenance overhead.

Index columns frequently used in WHERE, ORDER BY, or GROUP BY clauses.

Do not index columns that are updated very often, as this increases write cost.

Prefer columns with high selectivity (high cardinality) for indexing.

Conclusion

This article covered clustered and non‑clustered indexes, optimization techniques such as covering indexes, index push‑down, and index merge, explained MySQL’s cost‑based index selection, discussed common index‑failure cases, and provided practical indexing guidelines.

Sample data used in the article:

INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (1,'李四',20,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (2,'张三',18,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (3,'张三',23,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (4,'赵六',22,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (5,'王五',19,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (6,'赵六',24,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (7,'刘七',20,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (8,'刘七',22,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (9,'王九',9,'杭州');

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

database Query Optimization InnoDB mysql index B+Tree

Written by

Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tire of learning!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.