Databases 23 min read

Master MySQL Indexes: From B+Tree Basics to Advanced Optimization

This article explains MySQL index fundamentals, covering B+Tree structures, various index types, creation syntax, and practical optimization techniques such as prefix, covering, and composite indexes, while illustrating how to avoid common pitfalls like index misuse and performance‑degrading scenarios.

Ops Development Stories

Jun 1, 2022

Master MySQL Indexes: From B+Tree Basics to Advanced Optimization

Hello, I am Xiao Lin. In interviews, MySQL index questions usually start from basic principles and then move to usage scenarios.

What Is an Index?

Just as you would use a book's table of contents to quickly locate a topic, a database index acts as a directory that lets the storage engine retrieve rows faster, trading space for time.

In MySQL, an index is a data structure that helps the storage engine locate rows quickly. MySQL storage engines include MyISAM, InnoDB, and Memory, with InnoDB being the default since MySQL 5.5.

Index Classification

Indexes can be grouped from four perspectives:

By data structure: B+Tree, Hash, Full‑text.

By physical storage: clustered (primary) index, secondary index.

By field characteristics: primary, unique, ordinary, prefix.

By number of columns: single‑column index, composite (multi‑column) index.

By Data Structure

MySQL commonly uses B+Tree, Hash, and Full‑text indexes. Different storage engines support different types. InnoDB, the default engine, primarily uses B+Tree.

When creating a table, InnoDB chooses the index key as follows:

If a primary key exists, it becomes the clustered index key.

If no primary key, the first NOT NULL unique column becomes the clustered index key.

If neither exists, InnoDB generates an implicit auto‑increment ID as the clustered index key.

All other indexes are secondary (non‑clustered) indexes. Both primary and secondary indexes use B+Tree by default.

Example: B+Tree Storage

Creating a product table with an integer primary key:

CREATE TABLE `product` (
  `id` int(11) NOT NULL,
  `product_no` varchar(20) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) CHARACTER SET utf8 COLLATE utf8_general_ci ROW_FORMAT=Dynamic;

The rows are stored in a B+Tree where leaf nodes contain the actual data ordered by the primary key. Each parent node’s keys appear in its child nodes, and leaf nodes are linked together.

Querying by Primary Key

To find the product with id = 5, B+Tree traverses from root to leaf:

Compare 5 with root keys (1,10,20) → go to second level (1,4,7).

Compare 5 with second‑level keys (1,4,7) → go to third level (4,5,6).

Find 5 in leaf node and retrieve the row.

Each node lookup corresponds to one disk I/O, so this query needs three I/O operations.

B+Tree can locate data in millions of rows with only 3‑4 levels, keeping I/O low.

Querying by Secondary Index

Secondary indexes store the primary key value in leaf nodes. To find a row by a secondary key (e.g., product_no = '0002'):

Search the secondary B+Tree to locate the leaf node containing the primary key.

Use that primary key to search the primary B+Tree and fetch the full row.

This two‑step process is called “back‑to‑table” (回表). If the query only needs columns present in the secondary index, the engine can satisfy it directly—this is a “covering index”.

Why InnoDB Chooses B+Tree

Compared with B‑Tree, B+Tree stores data only in leaf nodes, making each node smaller and allowing more keys per I/O. Leaf nodes are linked, enabling efficient range scans.

Compared with a binary tree, B+Tree’s fan‑out (often >100) yields a height of only 3‑4 even for tens of millions of rows, giving O(log d N) search complexity versus O(log N) for binary trees.

Hash indexes provide O(1) lookups for equality queries but cannot handle range queries, making B+Tree more versatile.

By Physical Storage

Clustered (primary) index: leaf nodes store full rows.

Secondary index: leaf nodes store only the primary key.

By Field Characteristics

Primary index – built on the primary key column.

Unique index – enforces uniqueness, allows NULL.

Ordinary index – no uniqueness or primary constraints.

Prefix index – indexes only the first N characters of a string column.

By Number of Columns

Single‑column index – e.g., primary key.

Composite (multi‑column) index – e.g., (product_no, name).

Composite Index Example

Creating a composite index on (product_no, name):

CREATE INDEX index_product_no_name ON product(product_no, name);

The B+Tree orders first by product_no, then by name. Queries must follow the leftmost principle to use the index efficiently.

When to Create or Not Create an Index?

Indexes improve query speed but have drawbacks: they consume storage, increase write overhead, and can slow down INSERT/UPDATE/DELETE operations.

Create Index When

The column has a uniqueness constraint (e.g., product code).

The column is frequently used in WHERE clauses.

The column is often used in GROUP BY or ORDER BY.

Do Not Create Index When

The column is not used for filtering, grouping, or ordering.

The column has low cardinality (many duplicate values, e.g., gender).

The table is very small.

The column is frequently updated (e.g., user balance).

Index Optimization Techniques

Prefix index optimization.

Covering index optimization.

Use auto‑increment primary keys.

Prevent index loss.

Prefix Index

Indexes only the first N characters of a string to reduce index size, but cannot be used for ORDER BY or as covering indexes.

Covering Index

If all columns needed by a query are present in the secondary index, the engine can retrieve results without accessing the primary index, avoiding back‑to‑table.

Auto‑Increment Primary Key

Sequential inserts avoid page splits, leading to higher insert performance and less fragmentation compared with random primary keys.

Prevent Index Loss

Common scenarios that cause index loss:

LIKE patterns with a leading wildcard (e.g., LIKE '%xx' or LIKE '%xx%').

Functions, calculations, or type casts on indexed columns.

Violating the leftmost principle in composite indexes.

Using OR where only one side references an indexed column.

Indexed columns not declared NOT NULL.

Check the execution plan: possible_keys shows candidate indexes, key shows the actual index used (NULL means no index), key_len the index length, rows the rows examined, and type the access method (ALL, index, range, ref, eq_ref, const). Prefer range, ref, eq_ref, or const over ALL or index scans.

Summary

This article covered the principles, classifications, creation syntax, and optimization strategies for MySQL indexes, providing a concise reference for designing efficient indexing schemes.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL mysql database optimization index B+Tree

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.