Master MySQL Indexes: Types, Usage, and Optimization Tips
This comprehensive guide explains what MySQL indexes are, details various index types—including B+‑tree, hash, full‑text, and spatial—covers when indexes become ineffective, shows how to design, use, and troubleshoot them, and provides practical steps for large‑scale index management.
Introduction
Prepared a 15‑question series to help developers understand MySQL indexes and improve their daily work.
What Is an Index?
An index is a data structure that speeds up database queries, similar to a dictionary’s table of contents.
Indexes are stored on disk and occupy physical space.
Appropriate indexing improves query performance, while excessive indexes can slow down INSERT and UPDATE operations.
MySQL Index Types
Data‑structure dimension
B+‑tree index: O(log n) complexity, suitable for range queries.
Hash index: ideal for equality queries, provides direct lookup.
Full‑text index: supported by MyISAM and InnoDB for CHAR, TEXT, VARCHAR columns.
R‑Tree index: used for GIS data types (SPATIAL).
Physical storage dimension
Clustered index: created on the primary key; leaf nodes store the full row (InnoDB).
Non‑clustered index: created on non‑primary columns; leaf nodes store the primary key and indexed columns (InnoDB).
Logical dimension
Primary key index: a special unique index that disallows NULL.
Normal index: basic MySQL index, allows NULL and duplicate values.
Composite (union) index: built on multiple columns, follows the left‑most prefix rule.
Unique index: values must be unique, but NULL is allowed.
Spatial index: supported from MySQL 5.7 following the OpenGIS model.
When Can an Index Fail?
Using OR in conditions may prevent index usage.
String literals in WHERE without quotes cause failure.
Wildcard patterns with LIKE can invalidate the index.
For composite indexes, if the query does not start with the leftmost column, the index is ignored.
Applying MySQL built‑in functions on indexed columns disables the index.
Arithmetic operations ( + - * /) on indexed columns break the index.
Comparisons using !=, <>, or NOT IN may render the index ineffective.
Using IS NULL or IS NOT NULL on indexed columns can cause failure.
Mismatched character sets in join conditions may invalidate the index.
If the optimizer estimates a full table scan to be faster, it will ignore the index.
Scenarios Unsuitable for Indexes
Tables with very small data volumes.
Highly frequently updated tables.
Columns with low selectivity (e.g., gender).
Columns used only in WHERE, GROUP BY, or ORDER BY that are not selected later.
Redundant indexes (e.g., having both (a,b) and a separate a index).
Why B+‑Tree Instead of a Binary Tree?
Consider query speed, stability, storage size, and disk I/O. B+‑tree stores only keys in internal nodes, allowing more keys per node, reducing tree height and disk reads compared to binary or balanced binary trees.
Binary trees can degenerate to linked lists, causing full‑table scans. Balanced binary trees store one key per node, leading to more disk reads. B+‑tree’s leaf‑only data storage makes range queries and ordered scans efficient.
B+‑Tree Index Lookup Process
Assume the following table structure and sample data.
CREATE TABLE `employee` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`sex` int(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_age` (`age`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO employee VALUES
(100,'小伦',43,'2021-01-20',0),
(200,'俊杰',48,'2021-01-21',0),
(300,'紫琪',36,'2020-01-21',1),
(400,'立红',32,'2020-01-21',0),
(500,'易迅',37,'2020-01-21',1),
(600,'小军',49,'2021-01-21',0),
(700,'小燕',28,'2021-01-21',1);Query: SELECT * FROM employee WHERE age=32; Steps:
Search the idx_age B+‑tree, load the first disk block, follow the left branch because 32 < 43, and load the second block.
Continue to the fourth block where the leaf node contains age=32 and retrieve id=400.
Use the primary‑key index on id to locate the full row, loading additional disk blocks as needed until the record is found.
What Is “Back‑to‑Table” (回表) and How to Reduce It?
When a query cannot retrieve all required columns from the index, MySQL must fetch the row from the primary‑key index—this is called “back‑to‑table”. Selecting only indexed columns (e.g., SELECT id, age FROM employee WHERE age=32) avoids the extra lookup, creating a covering index.
Covering index means the SELECTed columns are fully contained in the index leaf nodes, eliminating the need for a back‑to‑table operation.
Left‑most Prefix Principle
For a composite index (a,b,c), MySQL can use the index as if three separate indexes (a), (a,b), and (a,b,c) exist, greatly improving reuse.
The principle also applies to string indexes: the leftmost M characters of a string are indexed.
Index Push‑down (索引下推) Optimization
In MySQL 5.6 and later, the engine can filter rows while traversing the index, applying conditions such as age=28 directly on the index tree, reducing the number of rows that need a back‑to‑table lookup.
Adding Indexes to Very Large Tables
Create a new table B with the same structure as the original table A.
Add the desired indexes to B.
Copy data from A to B.
Rename B to A (using RENAME TABLE) and give the old A a temporary name.
How to Verify That a Query Uses an Index
Run EXPLAIN on the SQL statement. Key columns to examine:
type : shows join type (e.g., system, const, ref, range, index, ALL).
rows : estimated number of rows examined.
filtered : percentage of rows that pass the filter.
extra : additional info such as Using filesort, Using index (covering index), Using temporary, Using where, or Using index condition (push‑down).
key : the actual index used, compared with possible_keys.
Hash vs. B+‑Tree Indexes
Hash indexes cannot perform range queries; B+‑tree can.
B+‑tree supports the left‑most prefix rule for composite indexes; hash does not.
B+‑tree works with ORDER BY; hash does not.
Hash is faster for equality queries unless there are many collisions.
Like queries with a leading wildcard cannot use hash indexes.
Advantages and Disadvantages of Indexes
Advantages
Accelerate data retrieval and reduce query time.
Unique indexes enforce row uniqueness.
Disadvantages
Index creation and maintenance consume time.
Indexes occupy additional physical storage.
Insert, update, and delete operations must also modify the indexes.
Clustered vs. Non‑Clustered Indexes
A clustered index stores the data rows together with the index (leaf nodes contain the full row). A non‑clustered index stores only the indexed columns and a pointer to the primary key.
Only one clustered index per table (usually the primary key).
Non‑clustered indexes can be many, but typically require a back‑to‑table lookup.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
