Databases 31 min read

Master MySQL Indexes: From B+ Trees to Index Merge and Optimization

This article explains MySQL InnoDB index fundamentals, covering index types, data structures, clustered and secondary indexes, covering indexes, index pushdown, index merge, cost‑based index selection, common index‑inefficiency scenarios, and practical guidelines for designing effective indexes.

Su San Talks Tech

Jun 12, 2023

Master MySQL Indexes: From B+ Trees to Index Merge and Optimization

Index Classification

Indexes can be classified from different dimensions.

1. By Data Structure

B+Tree index

Hash index

...

2. By Physical Storage Structure

Clustered index

Non‑clustered index (secondary index)

Clustered and non‑clustered indexes will be discussed in detail later.

3. By Index Characteristics

Primary key index

Unique index

Normal index

Full‑text index

...

4. By Number of Columns

Single‑column index

Composite (multi‑column) index

Index Data Structures

Preparation

For the examples a user table is created:

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `age` int(10) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Hash Index

Hash indexes are rarely used because InnoDB does not support explicitly creating them; it only provides adaptive hash indexes.

Even if a Hash index is declared, it is ineffective and the engine still uses a B+Tree.

Memory engine supports true Hash indexes. A Hash index works like a Java HashMap: the indexed column value is hashed, the hash determines a slot, and the slot stores a pointer to the row.

When multiple rows have the same hash value, a linked list is formed in the slot.

Advantages and disadvantages of Hash indexes

Very fast for equality comparisons.

Cannot be used for range queries or sorting because the values are unordered.

B+Tree

B+Tree is the most common index structure in MySQL. Other index types such as full‑text are omitted.

Besides Hash and B+Tree, there are other indexes like full‑text, which are not covered here.

Clustered Index

Data Page Storage

InnoDB stores rows in 16KB pages. When rows are inserted, they are placed into pages and sorted by the primary key, forming a singly linked list of pages.

To locate a row within a single page, MySQL builds a page directory that records the maximum id of each group (e.g., groups of 4 rows). A binary search on the directory quickly narrows the target group, then a linear scan inside the group finds the exact row.

Searching Across Multiple Pages

When a page is full, a new page is created and linked via a double‑linked list. MySQL extracts the minimum id of each page and stores it together with the page number in a higher‑level page. This higher‑level page forms a B+Tree that points to the leaf pages containing the actual rows.

The B+Tree built from these page‑level entries is the clustered index.

Secondary (Non‑Clustered) Index

A secondary index is also a B+Tree, but its leaf nodes store only the indexed column values and the primary key id. Non‑leaf nodes also keep the page number of the corresponding data page.

Single‑Column Index

For a non‑unique index on name, the leaf nodes contain name values sorted, and duplicate name entries are ordered by id.

Composite Index

When an index is built on name and age, the leaf nodes store (name, age) pairs sorted first by name, then by age, and finally by id.

Summary of Clustered vs. Non‑Clustered

Clustered leaf nodes store all column values; non‑clustered leaf nodes store only indexed columns plus the primary key.

Clustered data is ordered by primary key; non‑clustered data is ordered by the indexed columns.

Clustered non‑leaf nodes keep primary key and page number; non‑clustered non‑leaf nodes keep indexed columns, primary key, and page number.

Back‑Table (Row Lookup)

When a query uses a secondary index, MySQL first finds the matching primary key(s) from the index, then retrieves the remaining columns from the clustered index (the “back‑table” step).

Covering Index

If the query only requests columns that are present in the index (e.g., SELECT id FROM user WHERE name='赵六'), the engine can satisfy the query directly from the index without a back‑table lookup, improving performance.

Avoid SELECT * when a covering index can be used.

Index Pushdown

From MySQL 5.6 onward, conditions on columns that are stored in the index (e.g., age in a composite index) can be evaluated directly in the index, reducing the number of back‑table operations.

Index Merge

MySQL can combine results from multiple indexes:

Intersect (AND)

Union (OR)

Sort‑union (OR with unsorted results)

For intersect, the primary keys from each index must be ordered; otherwise MySQL falls back to a single index.

How MySQL Chooses an Index

MySQL estimates the cost of each possible access path using statistics:

IO cost: loading a page costs 1.0.

CPU cost: evaluating a row costs 0.2.

Full‑table scan cost ≈ rows * 0.2 + (data_length/1024/16) * 1.0.

Secondary‑index‑plus‑back‑table cost ≈ intervals * 1.0 + rows * 0.2 + rows * 1.0 + rows * 0.2.

The plan with the lowest total cost is chosen.

Index Failure Scenarios

Violating the left‑most prefix rule (e.g., LIKE '%foo' or using the second column of a composite index alone).

Applying functions or expressions to indexed columns.

Implicit type conversion that forces the column to be cast (e.g., comparing a VARCHAR column to a numeric literal).

Stale statistics leading to inaccurate cost estimates; running ANALYZE TABLE refreshes them.

Index Design Principles

Limit the number of indexes per table to avoid excessive storage and cost‑estimation overhead.

Create indexes on columns frequently used in WHERE clauses.

Consider indexing columns used in ORDER BY or GROUP BY to enable sorting without extra work.

Avoid indexing columns that are updated very frequently, as index maintenance becomes costly.

Prefer columns with high cardinality (high selectivity) for indexing.

Conclusion

The article covered clustered and non‑clustered indexes, optimization techniques such as covering indexes and index pushdown, the cost‑based index selection process, common reasons for index loss of effectiveness, and practical guidelines for creating efficient indexes.

Sample data used in the article:

INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (1,'李四',20,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (2,'张三',18,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (3,'张三',23,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (4,'赵六',22,'杭州');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (5,'王五',19,'北京');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (6,'赵六',24,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (7,'刘七',20,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (8,'刘七',22,'上海');
INSERT INTO `user` (`id`,`name`,`age`,`city`) VALUES (9,'王九',9,'杭州');

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

InnoDB MySQL database optimization index B+Tree

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.