Databases 26 min read

Master MySQL Indexes: From B‑Tree to Hash and When to Use Them

This article explains MySQL indexing fundamentals, covering index types, underlying data structures such as B‑Tree and Hash, the differences between clustered and non‑clustered indexes, practical performance tips, and advanced features like adaptive hash indexes and index condition pushdown.

Java Interview Crash Guide

May 10, 2021

Master MySQL Indexes: From B‑Tree to Hash and When to Use Them

Introduction

Because MySQL’s default storage engine is InnoDB, this article focuses on indexes under InnoDB while briefly mentioning other engines, aiming to give a clear understanding of MySQL indexes.

Index Overview

Key questions addressed include: what an index is, what types can be created, which columns are suitable, whether more indexes are always better, why UUID or ID numbers are discouraged as primary keys, why SELECT * is discouraged, and how leading or trailing wildcards affect index usage.

What Is an Index

In a relational database, an index is a separate physical structure that stores sorted values of one or more columns together with pointers to the corresponding data pages, similar to a book’s table of contents.

MySQL Index Types

Ordinary Index

Basic index without special properties; can be created on any column.

-- 创建索引的基本语法
CREATE INDEX indexName ON table(column(length));
-- 例子 (length can be omitted)
CREATE INDEX idx_name ON user(name);

Primary Key Index

MySQL automatically creates a unique, non‑NULL index on the primary key column.

Composite (Combined) Index

Indexes that involve multiple columns, e.g., (phone, name). Uses the leftmost‑most principle.

-- 创建复合索引的基本语法
CREATE INDEX indexName ON table(column1(length), column2(length));
-- 例子
CREATE INDEX idx_phone_name ON user(phone, name);

Only queries that include the leftmost column(s) can use the composite index.

Full‑Text Index

Designed for keyword search in CHAR, VARCHAR, or TEXT columns, used with MATCH ... AGAINST rather than LIKE.

Spatial Index

Indexes on spatial data types ( GEOMETRY, POINT, LINESTRING, POLYGON) and can only be created on MyISAM tables with NOT NULL columns.

Index Data Structures

B+Tree

InnoDB’s default index structure. It stores keys in a balanced multi‑way tree, allowing fast lookups with minimal disk I/O.

Binary Search Tree

Illustrates the ideal insertion order; an unbalanced order degrades performance.

AVL Tree

A self‑balancing binary search tree that keeps the height difference of sub‑trees ≤ 1, using rotations to maintain balance.

AVL trees are not ideal for disk‑based indexes because each node would occupy an entire 16 KB page.

B‑Tree

Improves on AVL by storing many keys per node, reducing tree depth and I/O. A typical B‑Tree node can hold thousands of entries, making three‑level trees sufficient for billions of rows.

MySQL distinguishes between clustered (primary) and secondary (non‑clustered) indexes.

Clustered vs. Non‑Clustered Indexes

Clustered Index

The logical order of the key matches the physical order of rows; a table can have only one. In InnoDB, the clustered index stores the entire row, so the index is the data itself.

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key.

Non‑Clustered Index

Stores only the indexed columns plus a pointer to the clustered index (usually the primary key). Queries that need other columns must “row‑lookup” (回表) using the primary key.

Example: avoid SELECT * FROM user WHERE name LIKE '张%' and instead use SELECT name FROM user WHERE name LIKE '张%' to prevent a costly row‑lookup.

Hash Index

Stores a hash of the indexed column in a hash table with linked lists for collisions. Only equality (=, IN, <>) queries can use it; range queries, ordering, and partial key lookups are unsupported.

InnoDB does not support user‑defined hash indexes; only the MEMORY/NDB engines do. InnoDB provides an adaptive hash index that is built automatically for frequently accessed pages.

The adaptive hash index feature enables InnoDB to perform more like an in‑memory database on systems with appropriate workload and sufficient memory without sacrificing transactional features or reliability.

Adaptive hash indexes reside in the buffer pool and are useful only when the working set fits in memory.

Common Pitfalls & Tips

Creating too many indexes increases storage and write overhead due to page splits and merges.

Use auto‑increment integer primary keys instead of UUIDs or ID numbers; the latter cause random inserts, page splits, and larger index size.

Only add indexes to columns with high selectivity. A simple heuristic is count(distinct(column)) / count(*).

Practical Example: Index on Gender

A table with 3 million rows was tested with and without an index on the gender column. Without the index the query took ~1 s; with the index it took ~22 s because MySQL first reads the index, then performs a row‑lookup for each matching primary key.

LIKE with Leading Wildcard

When a composite index has phone as the first column, a query like WHERE name LIKE '%张' AND phone='13204776301' can still use the index because the optimizer can push the phone condition down.

EXPLAIN SELECT * FROM user_innodb WHERE name LIKE '%张' AND phone = '13204776301';

The plan shows Using index condition (ICP), meaning the storage engine evaluates the name condition using the index without fetching the full row.

Index Condition Pushdown (ICP)

ICP pushes eligible WHERE conditions to the storage engine, reducing the number of full‑row reads. It works for range, ref, eq_ref, and ref_or_null access methods on InnoDB secondary indexes.

How InnoDB Builds a Clustered Index When No Primary Key Exists

If a primary key is defined, it becomes the clustered index.

If there is a unique NOT NULL index, InnoDB uses it as the clustered index.

Otherwise InnoDB creates a hidden clustered index GEN_CLUST_INDEX with an internally generated auto‑increment value.

These mechanisms ensure every InnoDB table has a clustered index for efficient data retrieval.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

MySQL database optimization index B-Tree Hash Index

Written by

Java Interview Crash Guide

Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.