Understanding MySQL Indexes: Principles, Types, and Optimization Techniques
This article provides a comprehensive overview of MySQL indexes, explaining their underlying data structures, classification methods, practical usage scenarios, and optimization strategies, while illustrating concepts with diagrams and SQL examples to help developers design efficient query performance.
Hello, I am Xiaolin. In interviews, MySQL index questions usually start with the basic principles and then move to usage scenarios, such as:
What data structures and algorithms are used by the index?
Why does MySQL InnoDB choose B+Tree as the index structure?
When should an index be used?
When is it unnecessary to create an index?
In what situations does an index become ineffective?
How can indexes be optimized?
Today we will solidify the knowledge points of MySQL indexes.
What Is an Index?
Just as you would use a book's table of contents to quickly locate information, a database index acts as a directory that allows the storage engine to retrieve rows quickly, trading space for time.
In MySQL, an index is a data structure that helps the storage engine locate rows efficiently; it is essentially the "catalog" of the table.
The storage engine determines how data is stored, how indexes are built, and how updates and queries are performed. MySQL provides several storage engines, such as MyISAM, InnoDB, and Memory, with InnoDB being the default since MySQL 5.5.
Classification of Indexes
Indexes can be classified from four perspectives:
By data structure: B+Tree, Hash, Full‑text
By physical storage: Clustered (primary) index, Secondary (auxiliary) index
By field characteristics: Primary, Unique, Normal, Prefix
By number of columns: Single‑column, Composite (multi‑column) index
Classification by Data Structure
MySQL commonly uses B+Tree, HASH, and Full‑text indexes. Different storage engines support different index types. The table below summarizes the support for InnoDB, MyISAM, and Memory.
InnoDB, the default engine, uses B+Tree for most indexes. When creating a table, InnoDB chooses the primary key as the clustered index key based on the following rules:
If a primary key exists, it becomes the clustered index key.
If no primary key, the first NOT‑NULL unique column is used.
If neither exists, InnoDB generates an implicit auto‑increment column.
All other indexes are secondary indexes, which also use B+Tree by default.
Example: Creating a Product Table
<span style="color:#c678dd">CREATE</span> <span style="color:#c678dd">TABLE</span> `product` (
`id` int(11) <span style="color:#c678dd">NOT</span> <span style="color:#56b6c2">NULL</span>,
`product_no` varchar(20) <span style="color:#c678dd">DEFAULT</span> <span style="color:#56b6c2">NULL</span>,
`name` varchar(255) <span style="color:#c678dd">DEFAULT</span> <span style="color:#56b6c2">NULL</span>,
`price` decimal(10,2) <span style="color:#c678dd">DEFAULT</span> <span style="color:#56b6c2">NULL</span>,
PRIMARY <span style="color:#c678dd">KEY</span> (`id`) <span style="color:#c678dd">USING</span> BTREE
) CHARACTER SET utf8 COLLATE utf8_general_ci ROW_FORMAT=Dynamic;Assume the table contains the following rows:
In a B+Tree, only leaf nodes store the actual row data, while internal nodes store index keys. The leaf nodes are linked together, forming a doubly linked list.
Querying by Primary Key
To find the product with id = 5, the B+Tree is traversed from the root to the leaf, typically requiring 3 I/O operations for a table with millions of rows.
Compare 5 with root keys (1,10,20) → go to second level (1,4,7).
Compare 5 with second‑level keys (1,4,7) → go to third level (4,5,6).
Find 5 in the leaf node and retrieve the row.
Because the height of a B+Tree for tens of millions of rows is only 3‑4, the number of disk I/O operations remains low, giving B+Tree a clear performance advantage over B‑tree or binary tree structures.
Querying by Secondary Index
Suppose we create a secondary index on product_no. The secondary B+Tree stores the indexed column values in internal nodes and the primary key values in leaf nodes. To retrieve a row by product_no = '0002':
<span style="color:#c678dd">SELECT</span> * <span style="color:#c678dd">FROM</span> product <span style="color:#c678dd">WHERE</span> product_no = '0002';The engine first searches the secondary B+Tree to obtain the corresponding primary key, then uses the primary B+Tree to fetch the full row. This two‑step process is called “back‑lookup”. If the query only requests columns that exist in the secondary index (e.g., SELECT id FROM product WHERE product_no='0002'), the engine can satisfy the query directly from the secondary leaf nodes—this is known as a “covering index”.
Why InnoDB Chooses B+Tree
B+Tree stores data only in leaf nodes and links leaves together, which makes range queries efficient. Compared with B‑tree, B+Tree nodes are smaller, allowing more keys per node and thus fewer I/O operations. Compared with binary trees, B+Tree’s high fan‑out (often >100) keeps the tree height low, yielding O(log d N) search complexity, where d is the branching factor.
Hash indexes provide O(1) equality lookups but cannot support range queries, which is why B+Tree is preferred for most MySQL workloads.
Classification by Physical Storage
From a storage perspective, indexes are either clustered (primary) or secondary (auxiliary). The differences are:
Clustered index leaf nodes store the full row data.
Secondary index leaf nodes store only the primary key values.
Classification by Field Characteristics
Primary Index
A primary index is built on the primary key column, must be unique, and cannot contain NULL values. It is created automatically when defining the primary key:
<span style="color:#c678dd">CREATE</span> <span style="color:#c678dd">TABLE</span> table_name (
...,
PRIMARY <span style="color:#c678dd">KEY</span> (index_column_1) <span style="color:#c678dd">USING</span> BTREE
);Unique Index
A unique index enforces uniqueness but allows NULLs. It can be created during table creation or later:
<span style="color:#c678dd">CREATE</span> <span style="color:#c678dd">UNIQUE</span> <span style="color:#c678dd">KEY</span> (index_column_1, index_column_2, ...);Normal Index
A normal index is built on non‑unique, non‑primary columns:
<span style="color:#c678dd">CREATE</span> <span style="color:#c678dd">INDEX</span> (index_column_1, index_column_2, ...);Prefix Index
Prefix indexes store only the first N characters of a string column, reducing index size. They are defined as:
<span style="color:#c678dd">CREATE</span> <span style="color:#c678dd">TABLE</span> table_name (
...,
<span style="color:#c678dd">INDEX</span> (column_name(<span style="color:#c678dd">length</span>))
);Classification by Number of Columns
Composite Index
A composite index combines multiple columns, e.g., (product_no, name):
<span style="color:#c678dd">CREATE</span> <span style="color:#c678dd">INDEX</span> index_product_no_name <span style="color:#c678dd">ON</span> product(product_no, name);The B+Tree for a composite index stores the combined values as the key. Queries can use the index only if they follow the “left‑most” rule (e.g., conditions on a or a,b but not on b alone).
When to Create or Not Create an Index
Indexes improve query speed but have drawbacks:
Consume physical storage.
Increase time for index creation and maintenance, especially on large tables.
Slow down INSERT/UPDATE/DELETE because each change must also modify the index.
When an Index Is Beneficial
Columns with unique constraints (e.g., product codes).
Columns frequently used in WHERE clauses.
Columns used in GROUP BY or ORDER BY.
When an Index Is Unnecessary
Columns not used for filtering, grouping, or ordering.
Columns with low cardinality (e.g., gender).
Very small tables where a full scan is cheap.
Columns that are updated frequently (e.g., user balance).
Index Optimization Techniques
Prefix index optimization.
Covering index optimization.
Use auto‑increment primary keys.
Avoid index invalidation.
Prefix Index Optimization
Prefix indexes reduce index size for long string columns, improving lookup speed, but they cannot be used for ORDER BY or as covering indexes.
Covering Index Optimization
If a query only needs columns that exist in the index (e.g., name and price), a covering index eliminates the need for a back‑lookup, saving I/O.
Auto‑Increment Primary Keys
Using an auto‑increment primary key ensures new rows are appended to the end of the B+Tree, avoiding page splits and keeping inserts fast. Random primary keys cause page splits, increasing fragmentation and I/O.
Preventing Index Invalidations
Common causes of index invalidation include:
LIKE patterns with a leading wildcard (e.g., LIKE '%xx').
Functions or calculations on indexed columns.
Violating the left‑most rule on composite indexes.
OR conditions where only one side uses an indexed column.
Indexed columns not declared NOT NULL.
Use EXPLAIN to verify whether a query uses an index. The type column shows the scan method, ordered from worst to best: ALL, index, range, ref, eq_ref, const.
Conclusion
This article covered the principles, classifications, and practical usage of MySQL indexes, summarizing the key points in the table below.
Finished!
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
