Mastering MySQL Indexes: 15 Essential Q&A to Boost Query Performance
This article provides a comprehensive 15‑question guide on MySQL indexes, covering their definition, various types across data‑structure, physical and logical dimensions, situations where indexes fail or are unsuitable, B+‑tree advantages, search process, covering indexes, left‑most prefix, index push‑down, adding indexes to huge tables, using EXPLAIN to verify index usage, hash versus B+‑tree differences, pros and cons, and the distinction between clustered and non‑clustered indexes.
1. What is an index?
An index is a data structure that speeds up queries by letting the database locate records without scanning the whole table, much like a dictionary's table of contents. Indexes are stored on disk and occupy physical space. Well-chosen indexes speed up queries, while too many indexes slow down inserts, updates, and deletes, because every index must be maintained on each write.
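As a rough illustration of the table-of-contents analogy, the Python sketch below contrasts a full scan with a lookup in a separate, sorted copy of the searched column. It is a toy model, not how MySQL stores data: the `(id, age)` pairs are borrowed from the employee sample rows in section 6, and `full_scan`, `age_index`, and `index_lookup` are illustrative names.

```python
import bisect

# (id, age) pairs borrowed from the employee sample rows in section 6.
rows = [(100, 43), (200, 48), (300, 36), (400, 32), (500, 37), (600, 49), (700, 28)]


def full_scan(rows, age):
    """No index: every row is examined. Returns (matching ids, rows examined)."""
    return [rid for rid, a in rows if a == age], len(rows)


# The "index" is a separate, sorted copy of (age, id) pairs. It takes extra
# space and would have to be updated on every insert, update, and delete.
age_index = sorted((a, rid) for rid, a in rows)


def index_lookup(index, age):
    """Binary-search to the first entry with this age, then read forward."""
    pos = bisect.bisect_left(index, (age,))
    ids = []
    while pos < len(index) and index[pos][0] == age:
        ids.append(index[pos][1])
        pos += 1
    return ids


print(full_scan(rows, 32))           # ([400], 7)
print(index_lookup(age_index, 32))   # [400]
```

The sorted copy is what makes binary search possible; keeping it sorted on every write is the maintenance cost mentioned above.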
2. Types of MySQL indexes
Data‑structure dimension
B+‑tree index: All data stored in leaf nodes, O(log n) complexity, suitable for range queries.
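Hash index: Optimized for equality queries, typically locating an entry in a single step. In MySQL, explicit hash indexes are available in the MEMORY engine; InnoDB builds an adaptive hash index internally.
Full‑text index: Supported by MyISAM and (since MySQL 5.6) InnoDB for CHAR, VARCHAR, and TEXT columns.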
R‑Tree index: Used for GIS data to create spatial indexes.
Physical storage dimension
Clustered index: Built on the primary key; leaf nodes store the full table rows (InnoDB).
Non‑clustered (secondary) index: Built on columns other than the primary key; in InnoDB its leaf nodes store the indexed columns plus the primary key, not the full row.
Logical dimension
Primary key index – unique, no NULLs.
Ordinary index – basic MySQL index, allows NULLs and duplicates.
Composite (multi‑column) index – spans multiple columns and follows the left‑most prefix rule.
Unique index – values must be unique, but NULLs are allowed (and several rows may hold NULL).
Spatial index – follows the OpenGIS geometry model; supported by MyISAM and, from MySQL 5.7, by InnoDB.
3. When can an index become ineffective?
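Using OR in the WHERE clause when one of the OR'ed conditions is on a column without a usable index.
Comparing a string column with an unquoted number (e.g. a VARCHAR column compared to 123 instead of '123'), which forces an implicit type conversion on the column.
LIKE patterns that start with a wildcard (e.g. LIKE '%abc').
Composite index where the query does not filter on the index's leftmost column.
Applying MySQL functions to indexed columns (e.g. YEAR(date) = 2021).
Arithmetic operations on indexed columns (e.g. age + 1 = 33).
Using !=, <>, or NOT IN on indexed columns, which often cannot be served efficiently by the index.
IS NULL / IS NOT NULL on indexed columns, which may lead the optimizer to skip the index depending on the data distribution.
Join conditions on columns with mismatched character sets.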
When the optimizer estimates a full table scan to be faster than using the index.
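The function-on-column case can be illustrated with a small Python model. It assumes a hypothetical index on the `date` column of the employee sample rows (dates only, times dropped for simplicity) and compares a filter in the style of `YEAR(date) = 2021`, which hides the sort order and forces every entry to be checked, with the equivalent range on the raw column, which binary search can narrow. The function names are illustrative.

```python
import bisect
from datetime import date

# Hypothetical index on the date column: sorted (date, id) pairs.
index = sorted([
    (date(2021, 1, 20), 100), (date(2021, 1, 21), 200), (date(2020, 1, 21), 300),
    (date(2020, 1, 21), 400), (date(2020, 1, 21), 500), (date(2021, 1, 21), 600),
    (date(2021, 1, 21), 700),
])


def where_year_equals(index, year):
    """Like WHERE YEAR(date) = year: the function hides the sort order,
    so every index entry must be evaluated. Returns (ids, entries examined)."""
    ids = [rid for d, rid in index if d.year == year]
    return sorted(ids), len(index)


def where_date_range(index, year):
    """Like WHERE date >= 'year-01-01' AND date < '(year+1)-01-01':
    bounds on the raw column let binary search find the matching slice."""
    lo = bisect.bisect_left(index, (date(year, 1, 1),))
    hi = bisect.bisect_left(index, (date(year + 1, 1, 1),))
    return sorted(rid for _, rid in index[lo:hi]), hi - lo


print(where_year_equals(index, 2021))  # ([100, 200, 600, 700], 7)
print(where_date_range(index, 2021))   # ([100, 200, 600, 700], 4)
```

Both forms return the same rows; only the range form lets the index skip entries, which is why rewriting a function call into a range on the bare column is the usual fix.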
4. Scenarios unsuitable for adding an index
Tables with very small data volumes.
Tables that are updated very frequently, since every write must also update the index.
Low‑cardinality columns (e.g., gender).
Columns not used in WHERE, GROUP BY, ORDER BY.
Redundant indexes (e.g., an existing composite index already covers a single‑column index).
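Cardinality can be made concrete with a quick calculation. The Python sketch below computes selectivity (distinct values divided by total rows) for the `sex` and `id` columns of the employee sample rows, and the share of rows an equality filter on `sex` would return. The helper names are illustrative, and the source does not give a threshold at which an index stops paying off; the optimizer decides that from its own statistics.

```python
def selectivity(values):
    """Distinct values divided by total rows: 1.0 means every value is unique."""
    return len(set(values)) / len(values)


def match_fraction(values, target):
    """Share of rows an equality predicate on this column would return."""
    return sum(1 for v in values if v == target) / len(values)


sex = [0, 0, 1, 0, 1, 0, 1]                # sex column of the employee sample rows
ids = [100, 200, 300, 400, 500, 600, 700]  # primary-key column

print(round(selectivity(sex), 2), round(selectivity(ids), 2))  # 0.29 1.0
print(round(match_fraction(sex, 0), 2))                        # 0.57
```

A filter such as `sex = 0` would still return more than half of the table, so walking an index and then fetching each row is unlikely to beat a plain scan.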
5. Why use a B+‑tree instead of a binary tree?
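A plain binary search tree can degenerate into a linked list (for example, when keys are inserted in sorted order), so a lookup may have to walk every node, much like a full table scan. Balanced binary trees (such as AVL or red‑black trees) keep the height logarithmic, but each node still holds only one key, so a large table produces a tall tree and each level can cost a disk read. A B+‑tree node fills a whole page and holds many keys, which keeps the tree short (often only three or four levels even for very large tables) and reduces disk reads. Non‑leaf nodes contain only keys, while the leaf nodes hold the data in sorted order and are linked to one another, which makes range, sorting, grouping, and deduplication queries efficient.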
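The height argument is easy to check with arithmetic. The Python sketch below is a back-of-the-envelope model that assumes every node holds the same number of entries (`fanout`); the figure of roughly 1,000 entries per page is an illustrative assumption, not a measured InnoDB value.

```python
def tree_height(n_keys, fanout):
    """Smallest number of levels h with fanout ** h >= n_keys
    (a back-of-the-envelope model: every node holds `fanout` entries)."""
    height, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        height += 1
    return height


n = 1_000_000_000
print(tree_height(n, 2))     # binary tree: 30 levels
print(tree_height(n, 1000))  # ~1,000 entries per page: 3 levels
```

If each level costs up to one disk read, that is about 30 reads for a balanced binary tree versus about 3 for a wide B+‑tree, before any caching of the upper levels.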
6. B+‑tree index search process (example)
CREATE TABLE `employee` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`sex` int(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_age` (`age`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO employee VALUES (100,'小伦',43,'2021-01-20',0);
INSERT INTO employee VALUES (200,'俊杰',48,'2021-01-21',0);
INSERT INTO employee VALUES (300,'紫琪',36,'2020-01-21',1);
INSERT INTO employee VALUES (400,'立红',32,'2020-01-21',0);
INSERT INTO employee VALUES (500,'易迅',37,'2020-01-21',1);
INSERT INTO employee VALUES (600,'小军',49,'2021-01-21',0);
INSERT INTO employee VALUES (700,'小燕',28,'2021-01-21',1);
Query: SELECT * FROM employee WHERE age = 32;
Load the root page of the idx_age B+‑tree (disk page 1) into memory; since 32 < 43, follow the left branch to page 2.
Load page 2; since 32 < 36, follow the left branch to page 4.
Page 4 is a leaf and contains the entry for age = 32, which yields the primary key id = 400.
Because the query selects every column, switch to the primary‑key (clustered) B+‑tree: load its root page and navigate down to the leaf containing id = 400, which resides on page 8.
Page 8 holds the full row, which completes the query. This second traversal is the back‑table lookup.
Images illustrating the idx_age and primary‑key index structures are included in the original article.
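The two-tree search can be mimicked in Python. In this sketch the clustered index is modelled as a plain dict from primary key to row, and `idx_age` as a sorted list of `(age, id)` entries, mirroring the fact that InnoDB secondary leaves carry only the indexed column and the primary key. Page-level I/O is not modelled; the counters simply record how many entries each structure supplied. All names are illustrative.

```python
import bisect

# Clustered index: primary key -> (name, age, date, sex), from the sample rows.
clustered = {
    100: ("小伦", 43, "2021-01-20", 0), 200: ("俊杰", 48, "2021-01-21", 0),
    300: ("紫琪", 36, "2020-01-21", 1), 400: ("立红", 32, "2020-01-21", 0),
    500: ("易迅", 37, "2020-01-21", 1), 600: ("小军", 49, "2021-01-21", 0),
    700: ("小燕", 28, "2021-01-21", 1),
}

# Secondary index idx_age: leaves hold only (age, primary key).
idx_age = sorted((row[1], pk) for pk, row in clustered.items())


def select_star_where_age(age):
    """SELECT * FROM employee WHERE age = ?: search idx_age, then fetch
    each full row from the clustered index (the back-table lookup)."""
    lookups = {"secondary": 0, "clustered": 0}
    result = []
    pos = bisect.bisect_left(idx_age, (age,))
    while pos < len(idx_age) and idx_age[pos][0] == age:
        lookups["secondary"] += 1
        pk = idx_age[pos][1]
        result.append((pk,) + clustered[pk])  # back-table lookup by primary key
        lookups["clustered"] += 1
        pos += 1
    return result, lookups


print(select_star_where_age(32))
```

For `age = 32` this returns the single row for id 400 after one entry from each structure, matching the walk-through above.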
7. What is a covering index and how to avoid back‑table lookups?
If every column a query needs is stored in the index, the engine can answer the query from the index leaf nodes alone. For example, the leaves of idx_age hold age plus the primary key id, so SELECT id, age FROM employee WHERE age = 32 never has to go back to the clustered index. Avoiding this back‑table lookup is what a covering index achieves; EXPLAIN reports it as Using index in the Extra column.
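The covering test boils down to a set comparison. The Python sketch below assumes that an InnoDB secondary index effectively carries its own columns plus the primary key, and checks whether every column a query reads (in SELECT or WHERE) is available there. `is_covered` and the column sets are illustrative, not a MySQL API.

```python
EMPLOYEE_COLUMNS = {"id", "name", "age", "date", "sex"}

# idx_age stores age; InnoDB secondary leaves also carry the primary key id.
IDX_AGE_COLUMNS = {"age", "id"}


def is_covered(select_cols, where_cols, index_cols=IDX_AGE_COLUMNS):
    """True if every column the query touches is available in the index,
    so no back-table lookup to the clustered index is needed."""
    return set(select_cols) | set(where_cols) <= set(index_cols)


print(is_covered(["id", "age"], ["age"]))       # True:  SELECT id, age ... WHERE age = 32
print(is_covered(EMPLOYEE_COLUMNS, ["age"]))    # False: SELECT * ... WHERE age = 32
```

Adding one column that lives only in the row, such as `name`, is enough to lose coverage.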
8. The left‑most prefix principle
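For a composite index (a, b, c), MySQL can use the index for queries that filter on a, on a and b, or on a, b, and c; a query that filters only on b or c cannot use it to narrow the search. The order of the conditions in the WHERE clause does not matter, but a range condition on one column (e.g. b > 5) stops the columns after it from being used for the search. The principle also applies to string values: LIKE 'abc%' can use an index because its leftmost characters are fixed, while LIKE '%abc' cannot.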
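The rule can be expressed as a small function. This Python sketch is a simplified model, not the optimizer's actual logic: equality predicates extend the usable prefix column by column, a range predicate is used but ends the prefix, and the first column with no predicate ends it. It ignores refinements such as the skip-scan access method added in MySQL 8.0.13. `usable_prefix` is an illustrative name.

```python
def usable_prefix(index_cols, equality_cols, range_col=None):
    """Return the leading index columns that can narrow the search.

    Equality predicates extend the prefix; a range predicate is used but
    ends it; the first column without any predicate ends it as well.
    WHERE-clause order is irrelevant, so predicates are given as sets.
    """
    used = []
    for col in index_cols:
        if col in equality_cols:
            used.append(col)
        elif col == range_col:
            used.append(col)
            break
        else:
            break
    return used


idx = ("a", "b", "c")
print(usable_prefix(idx, {"a", "b", "c"}))           # ['a', 'b', 'c']
print(usable_prefix(idx, {"a", "c"}))                # ['a']  (gap at b)
print(usable_prefix(idx, {"b", "c"}))                # []     (no leftmost column)
print(usable_prefix(idx, {"a", "c"}, range_col="b")) # ['a', 'b']  (range stops at b)
```

In the last case, `c` can still be checked against index entries (for example via index condition pushdown), but it no longer narrows the range that is scanned.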
9. Index push‑down (index condition pushdown)
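Assume a composite index idx_name_age on (name, age), which is not part of the table definition above. Before MySQL 5.6, a query like
SELECT * FROM employee WHERE name LIKE '小%' AND age = 28 AND sex = '0';
would use the index only to find the entries whose name starts with '小', fetch each of those rows from the clustered index by primary key (a back‑table lookup), and only then check age and sex. Starting with MySQL 5.6, index condition pushdown lets InnoDB check age = 28 against the index entries while traversing the index, skipping non‑matching entries before any back‑table lookup. The sex condition cannot be pushed down because sex is not part of the index, so it is still evaluated on the fetched rows. EXPLAIN shows this optimization as Using index condition in the Extra column.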
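The saving from pushdown can be counted with a toy model. The Python sketch below builds the hypothetical `idx_name_age` from the employee sample rows (treating `sex` as an integer) and counts back-table row reads for the query above with and without pushdown. For simplicity it walks the whole index and filters on the name prefix instead of seeking to the '小' range; `run_query` is an illustrative name.

```python
# (name, age, sex) per primary key, from the employee sample rows.
rows = {
    100: ("小伦", 43, 0), 200: ("俊杰", 48, 0), 300: ("紫琪", 36, 1),
    400: ("立红", 32, 0), 500: ("易迅", 37, 1), 600: ("小军", 49, 0),
    700: ("小燕", 28, 1),
}

# Hypothetical composite index idx_name_age: (name, age, id) entries in key order.
idx_name_age = sorted((name, age, pk) for pk, (name, age, _sex) in rows.items())


def run_query(pushdown):
    """WHERE name LIKE '小%' AND age = 28 AND sex = 0.
    Returns (matching ids, number of back-table row reads)."""
    table_reads, result = 0, []
    for name, age, pk in idx_name_age:
        if not name.startswith("小"):      # prefix condition served by the index
            continue
        if pushdown and age != 28:         # ICP: age checked on the index entry
            continue
        table_reads += 1                   # back-table lookup for the full row
        _name, row_age, sex = rows[pk]
        if row_age == 28 and sex == 0:     # remaining WHERE checks on the row
            result.append(pk)
    return result, table_reads


print(run_query(pushdown=False))  # ([], 3)
print(run_query(pushdown=True))   # ([], 1)
```

Both runs return the same (here empty) result, since the only 28-year-old '小' row has sex = 1; pushdown cuts the row reads from three to one.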
10. Adding indexes to massive tables
Create a new table B with the same structure as the original table A.
Add the desired indexes to table B.
Copy data from A to B.
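Rename A to a temporary name and B to the original table name, ideally in a single RENAME TABLE statement so the swap is atomic.
This approach avoids holding long‑running locks on the production table while the index is built. Rows written to A during the copy must still be carried over to B (tools such as pt-online-schema-change use triggers for this), or writes must be paused during the copy.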
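The copy-and-swap steps can be sketched as a script that generates the SQL. This Python sketch assumes an integer primary key named `id` and copies rows in primary-key chunks so that no single statement runs for long; the table names `A` and `B` come from the steps above, while `idx_name` and `rebuild_statements` are hypothetical. It does not capture rows written to A during the copy.

```python
def rebuild_statements(src, dst, index_ddl, min_id, max_id, chunk):
    """Build the SQL for the copy-and-swap procedure, copying rows in
    primary-key ranges of at most `chunk` ids per INSERT ... SELECT."""
    stmts = [f"CREATE TABLE {dst} LIKE {src};", *index_ddl]
    lo = min_id - 1
    while lo < max_id:
        hi = min(lo + chunk, max_id)
        stmts.append(
            f"INSERT INTO {dst} SELECT * FROM {src} WHERE id > {lo} AND id <= {hi};"
        )
        lo = hi
    # RENAME TABLE with several pairs is applied atomically by MySQL.
    stmts.append(f"RENAME TABLE {src} TO {src}_old, {dst} TO {src};")
    return stmts


for stmt in rebuild_statements(
    "A", "B", ["ALTER TABLE B ADD INDEX idx_name (name);"], 100, 700, 300
):
    print(stmt)
```

`CREATE TABLE ... LIKE` keeps the column order identical, which is what makes `INSERT INTO B SELECT * FROM A` safe.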
11. How to know if a statement uses an index?
Run EXPLAIN on the query. The columns to examine are type, possible_keys, key, rows, filtered, and Extra. key shows the index actually chosen (NULL if none). type indicates the access method, for example const, ref, range, index (a full index scan), or ALL (a full table scan). Extra may show Using index (covering index) or Using index condition (index condition pushdown).
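When checking EXPLAIN output programmatically, note that "Using index" is a substring of "Using index condition", so a naive substring test gives false positives. The Python sketch below splits Extra on ';' before matching. The EXPLAIN rows are hand-written dicts for illustration, not captured MySQL output, and `summarize` is an illustrative name.

```python
def extra_flags(extra):
    """Split EXPLAIN's Extra column on ';' so that 'Using index' is not
    mistaken for a substring of 'Using index condition'."""
    return {part.strip() for part in (extra or "").split(";") if part.strip()}


def summarize(row):
    """Summarize one EXPLAIN row given as a dict of column name -> value."""
    flags = extra_flags(row.get("Extra"))
    return {
        "index_used": row.get("key"),
        "full_table_scan": row.get("type") == "ALL",
        "covering_index": "Using index" in flags,
        "index_condition_pushdown": "Using index condition" in flags,
    }


# Hypothetical EXPLAIN rows, written by hand for illustration.
print(summarize({"type": "ref", "key": "idx_age", "rows": 1, "Extra": "Using index"}))
print(summarize({"type": "range", "key": "idx_name_age",
                 "Extra": "Using index condition; Using where"}))
print(summarize({"type": "ALL", "key": None, "rows": 7, "Extra": "Using where"}))
```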
12. Hash index vs. B+‑tree: when to choose which?
Hash indexes support only equality queries; B+‑trees support range queries.
B+‑trees honor the left‑most prefix rule for composite indexes; hash indexes do not.
B+‑trees can be used for ORDER BY; hash indexes cannot.
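Hash indexes are fast for equality lookups on high‑cardinality columns, but when many rows share the same key value (or keys collide), entries pile up in the same bucket and performance drops.
Hash indexes cannot serve LIKE pattern matching at all; a B+‑tree can serve prefix patterns such as LIKE 'abc%', though not patterns with a leading wildcard such as LIKE '%abc'.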
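The equality-versus-range difference shows up clearly in a toy comparison. In the Python sketch below, a dict of buckets stands in for a hash index and a sorted list for the B+‑tree leaf level, both built over the `age` column of the employee sample rows. Python dicts remember insertion order, but that is not key order, so the hash side must visit every bucket to answer a range. The function names are illustrative.

```python
import bisect

ages = {100: 43, 200: 48, 300: 36, 400: 32, 500: 37, 600: 49, 700: 28}

# Hash index stand-in: one bucket (list of primary keys) per key value.
hash_index = {}
for pk, age in ages.items():
    hash_index.setdefault(age, []).append(pk)

# B+-tree leaf level stand-in: (age, pk) entries kept in key order.
sorted_index = sorted((age, pk) for pk, age in ages.items())


def hash_equal(age):
    """Equality: jump straight to one bucket."""
    return hash_index.get(age, [])


def hash_range(lo, hi):
    """Range: buckets have no order, so every bucket must be visited."""
    return sorted(pk for age, pks in hash_index.items() if lo <= age <= hi for pk in pks)


def btree_range(lo, hi):
    """Range: binary-search both bounds, then read the contiguous slice,
    which comes back already ordered by age (useful for ORDER BY)."""
    start = bisect.bisect_left(sorted_index, (lo,))
    end = bisect.bisect_right(sorted_index, (hi, float("inf")))
    return [pk for _, pk in sorted_index[start:end]]


print(hash_equal(32))        # [400]
print(hash_range(30, 40))    # [300, 400, 500]
print(btree_range(30, 40))   # [400, 300, 500]  (ages 32, 36, 37)
```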
13. Advantages and disadvantages of indexes
Advantages
Accelerate data retrieval and reduce query latency.
Unique indexes enforce data uniqueness.
Disadvantages
Index creation and maintenance consume time and resources.
Indexes occupy additional physical storage.
Data modifications (INSERT/UPDATE/DELETE) require index updates, potentially impacting write performance.
14. Clustered vs. non‑clustered indexes
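Clustered indexes store the entire row in their leaf nodes (the InnoDB primary key), so a lookup through the clustered index needs no back‑table lookup. Non‑clustered (secondary) indexes store only the indexed columns and the primary key in their leaf nodes, so retrieving any other column requires a back‑table lookup through the clustered index. A table can have only one clustered index but multiple non‑clustered indexes. In InnoDB the clustered index is the primary key; if there is none, InnoDB uses the first UNIQUE index whose columns are all NOT NULL, and otherwise generates a hidden clustered index (GEN_CLUST_INDEX) on a synthetic row ID. In MyISAM, both primary and secondary indexes are non‑clustered, with leaf nodes containing pointers to the rows in a separate data file.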
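The InnoDB fallback order for picking the clustered index can be written as a short function. This Python sketch models only that documented order (primary key, then the first UNIQUE index with all NOT NULL columns, then the hidden `GEN_CLUST_INDEX`); `IndexDef`, `clustered_index`, and the `uk_email`/`email` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexDef:
    name: str
    columns: tuple
    primary: bool = False
    unique: bool = False


def clustered_index(indexes, not_null_columns):
    """Return the name of the index InnoDB would cluster the table on."""
    for idx in indexes:
        if idx.primary:
            return idx.name
    for idx in indexes:  # first UNIQUE index whose columns are all NOT NULL
        if idx.unique and all(c in not_null_columns for c in idx.columns):
            return idx.name
    return "GEN_CLUST_INDEX"  # hidden index on a synthetic row ID


employee = [IndexDef("PRIMARY", ("id",), primary=True), IndexDef("idx_age", ("age",))]
print(clustered_index(employee, {"id"}))                     # PRIMARY

no_pk = [IndexDef("uk_email", ("email",), unique=True)]
print(clustered_index(no_pk, set()))                         # GEN_CLUST_INDEX
print(clustered_index(no_pk, {"email"}))                     # uk_email
```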
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
