Databases 31 min read

Unlock MySQL Performance: Deep Dive into Index Types, B+ Trees, and Query Optimization

This article explains MySQL index classifications, the inner workings of B+‑tree and hash indexes, the differences between clustered and secondary indexes, and advanced concepts such as covering indexes, index push‑down, index merge, cost‑based index selection, and common pitfalls that cause index inefficiency.

Su San Talks Tech

Sep 11, 2024

Unlock MySQL Performance: Deep Dive into Index Types, B+ Trees, and Query Optimization

Index Classification

Indexes can be classified from different dimensions.

1. By data structure

B+ tree index

Hash index

...

2. By physical storage

Clustered index

Non‑clustered index (secondary index)

Clustered and non‑clustered indexes will be discussed in detail later.

3. By index characteristics

Primary key index

Unique index

Normal index

Full‑text index

...

4. By column count

Single‑column index

Composite index

Index Data Structure

Preparation

To illustrate the following content, a user table is prepared:

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `age` int(10) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Hash Index

Hash indexes are rarely used because InnoDB does not support explicitly creating hash indexes; it only provides adaptive hash indexes.

Even if a hash index is declared, it is ineffective:

Memory engine supports hash indexes. A hash index works like a Java HashMap: the indexed column value is hashed to a slot that stores a pointer to the row.

When multiple rows have the same hash value, a linked list is formed in that slot.

B+ Tree

B+ trees are the most commonly used index structure in MySQL. Other index types such as full‑text indexes are not covered here.

Besides hash and B+ trees, there are other index types like full‑text indexes.

Clustered Index

Data Page Storage

Inserted rows are persisted to disk. InnoDB divides data into pages (default 16 KB each) called data pages.

Rows are stored in pages sorted by the primary key, forming a singly linked list.

Single Data Page Lookup

To locate a row (e.g., id=2) MySQL can scan the page linearly, but this is inefficient for large pages.

MySQL groups rows in a page (e.g., groups of four rows) and stores the maximum id of each group in a page directory, enabling binary search on the directory.

Multiple Data Pages Lookup

When a page is full, a new page is created. Each page gets a page number and links to previous and next pages, forming a doubly linked list.

The minimum id of each page is stored in an auxiliary page, allowing binary search across pages.

Clustered Index

All rows are ultimately stored in a B+ tree; the leaf nodes contain the full row data, forming the clustered index.

Secondary Index

A secondary (non‑clustered) index is also a B+ tree, but its leaf nodes store only the indexed column values and the primary key id.

Single‑Column Index

For a normal index on name, the leaf nodes store name values sorted, with duplicate name values further ordered by id.

Composite Index

When an index is built on name and age, rows are first sorted by name, then by age, and finally by id.

Summary of Index Differences

Clustered index leaf nodes store all column values; secondary index leaf nodes store only indexed columns and the primary key.

Clustered index data is ordered by id; secondary index data is ordered by the indexed column(s).

Non‑leaf nodes of a clustered index store id and page numbers; secondary index non‑leaf nodes store indexed column values, id, and page numbers.

Table Lookup (回表)

When a secondary index is used (e.g., WHERE name='赵六'), MySQL first finds the matching id in the secondary index, then retrieves the full row from the clustered index using that id. This second step is called a table lookup.

Covering Index

If a query only needs columns that are present in the index (e.g., SELECT id FROM user WHERE name='赵六'), the table lookup is avoided because the required data is already in the index. This is a covering index and improves performance.

Index Push‑Down

For a composite index on name, age, MySQL can evaluate the age condition directly from the index without a table lookup, reducing the number of lookups dramatically.

Index Merge

MySQL can combine results from multiple indexes:

Intersect : takes the intersection of primary‑key sets from each index.

Union : takes the union of primary‑key sets.

Sort‑Union : sorts the sets before taking the union when they are not already ordered.

How MySQL Chooses an Index

MySQL estimates the cost of each possible execution plan. Costs consist of I/O (loading a page = 1.0) and CPU (evaluating a row = 0.2). It calculates costs for full table scans, secondary‑index lookups with possible table lookups, and chooses the plan with the lowest total cost.

Index Invalidations

Common reasons for index loss of effectiveness include:

Not matching the left‑most prefix rule (e.g., LIKE '%foo%').

Applying functions or expressions to indexed columns.

Implicit type conversion (e.g., comparing a VARCHAR column to a numeric literal).

Stale statistics; running ANALYZE TABLE refreshes them.

Index Building Principles

Avoid creating too many indexes on a single table; each index consumes disk space and adds overhead to write operations.

Create indexes on columns frequently used in WHERE clauses.

Columns used in ORDER BY or GROUP BY can benefit from indexes.

Do not index columns that are updated frequently, as this increases maintenance cost.

Prefer columns with high selectivity (high cardinality) for indexing.

Conclusion

This article covered MySQL index types, internal storage mechanisms, query execution strategies such as table lookup, covering indexes, index push‑down, index merge, cost‑based index selection, common pitfalls, and best practices for index design.

INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES
(1, '李四', 20, '杭州'),
(2, '张三', 18, '北京'),
(3, '张三', 23, '上海'),
(4, '赵六', 22, '杭州'),
(5, '王五', 19, '北京'),
(6, '赵六', 24, '上海'),
(7, '刘七', 20, '上海'),
(8, '刘七', 22, '上海'),
(9, '王九', 9, '杭州');

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

database Query Optimization InnoDB MySQL B+Tree

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.