Master MySQL Indexes: Deep Dive into B+ Tree Structure and Optimization Strategies
This guide explains MySQL’s B+‑tree index architecture, compares clustered and secondary indexes, demonstrates covering and prefix indexes, outlines design principles such as high‑selectivity columns and the left‑most prefix rule, and shows how to use EXPLAIN and real‑world examples to diagnose and improve query performance.
1. B+ Tree Index Structure
1.1 Basic Structure
B+ tree is the default index data structure of MySQL InnoDB engine. Its characteristics are:
All data rows are stored in leaf nodes; non‑leaf nodes store only key values (index column values).
Leaf nodes are linked by a doubly‑linked list, supporting range queries.
The tree height is usually 3–4 levels, capable of handling tens of millions of rows.
Example of a B+ tree:
Root Node
[15, 30, 45]
/ | \
Leaf1 Leaf2 Leaf3
[1‑14] [16‑29] [31‑44,46‑...]1.2 Why Choose B+ Tree?
Compared with B‑tree : non‑leaf nodes do not store data, allowing more index keys and a shorter, wider tree.
Compared with hash index : supports range queries, sorting, and fuzzy matching.
Compared with binary tree : avoids degeneration into a linked list, resulting in more stable I/O counts.
2. Clustered and Secondary Indexes
2.1 Clustered Index
Physical storage order of rows matches the index order.
Each table can have only one clustered index.
In InnoDB, the primary key is the clustered index.
If no primary key, the first UNIQUE NOT NULL column is chosen.
If none exist, InnoDB creates an implicit 6‑byte ROWID as the clustered index.
Advantages:
Efficient range queries because data are stored contiguously.
Fast data access since the index and the row data are together.
2.2 Secondary (Non‑Clustered) Index
Index nodes store the primary key value, not the row address.
Requires a “back‑lookup”: first retrieve the primary key from the secondary index, then fetch the full row via the clustered index.
A table can have multiple secondary indexes.
Back‑lookup example:
-- name column has a secondary index
SELECT * FROM users WHERE name = '张三';
-- 1. Find the primary key ID for '张三' in the name index tree.
-- 2. Use the ID to locate the full row in the clustered index tree.2.3 Covering Index
Optimization strategy:
-- Create covering index
CREATE INDEX idx_name_age ON users(name, age);
-- Query that can be satisfied by the index only (no back‑lookup)
SELECT name, age FROM users WHERE name = '张三';
-- Query that still needs a back‑lookup because address is not in the index
SELECT name, age, address FROM users WHERE name = '张三';3. Index Design
3.1 Design Principles
1. Choose appropriate index columns
-- Prefer high‑selectivity columns
-- Selectivity = distinct values / total rows
SELECT
COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity, -- low selectivity
COUNT(DISTINCT email) / COUNT(*) AS email_selectivity -- high selectivity
FROM users;2. Composite index design – left‑most prefix rule
-- Create composite index
CREATE INDEX idx_a_b_c ON table1(a, b, c);
-- Queries that can use the index:
WHERE a = 1
WHERE a = 1 AND b = 2
WHERE a = 1 AND b = 2 AND c = 3
WHERE a = 1 AND c = 3 -- uses only column a
-- Queries that cannot use the index:
WHERE b = 2
WHERE c = 3
WHERE b = 2 AND c = 33. Avoid index‑invalidating scenarios
-- 1. Functions on indexed columns cause loss
WHERE YEAR(create_time) = 2023 -- ❌ index lost
WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01' -- ✅ index used
-- 2. Implicit type conversion
WHERE phone = 13800138000 -- ❌ phone is VARCHAR
WHERE phone = '13800138000' -- ✅
-- 3. OR condition (unless all columns are indexed)
WHERE a = 1 OR b = 2 -- ❌ if b has no index
WHERE a = 1
UNION
WHERE b = 2 -- ✅ rewrite as UNION
-- 4. LIKE with leading wildcard
WHERE name LIKE '%张%' -- ❌ cannot use index
WHERE name LIKE '张%' -- ✅ can use index3.2 Optimization Strategies
1. Prefix index for long text columns
-- Create a prefix index on the first 10 characters of email
CREATE INDEX idx_email_prefix ON users(email(10));
-- Evaluate suitable prefix length
SELECT
COUNT(DISTINCT LEFT(email, 5)) / COUNT(*) AS prefix_5,
COUNT(DISTINCT LEFT(email,10)) / COUNT(*) AS prefix_10,
COUNT(DISTINCT email) / COUNT(*) AS full_column
FROM users;2. Index merge optimization
-- Create single‑column indexes
CREATE INDEX idx_name ON users(name);
CREATE INDEX idx_age ON users(age);
-- Query that may trigger index merge
SELECT * FROM users WHERE name = '张三' OR age = 25;3. Regular index maintenance
-- Show index statistics
SHOW INDEX FROM users;
-- Rebuild index (defragmentation)
ALTER TABLE users ENGINE=InnoDB;
-- or
OPTIMIZE TABLE users;4. EXPLAIN Execution Plan Details
4.1 Basic Usage
EXPLAIN SELECT * FROM users WHERE age > 20 AND name LIKE '张%';4.2 Key Fields Interpretation
type field (access type, from best to worst)
system – only one row.
const – lookup by primary key or unique index.
eq_ref – join using primary/unique index.
ref – lookup using a non‑unique index.
range – index range scan.
index – full index scan.
ALL – full table scan (needs optimization).
key : the index actually used.
key_len : length of the index used (helps judge if the full index is applied).
Extra hints:
Using index – covering index is used.
Using where – rows filtered after storage engine retrieval.
Using temporary – temporary table created (needs optimization).
Using filesort – extra sorting step (needs optimization).
Using join buffer – join buffer employed.
4.3 Practical Analysis
Case 1: Composite index optimization
-- Create test table
CREATE TABLE orders (
id INT PRIMARY KEY,
user_id INT,
product_id INT,
status TINYINT,
amount DECIMAL(10,2),
order_time DATETIME,
INDEX idx_user_status (user_id, status),
INDEX idx_time (order_time)
);
-- Analyze query 1
EXPLAIN SELECT * FROM orders WHERE user_id = 100 AND status = 1;
-- Result: uses idx_user_status, type=ref
-- Analyze query 2
EXPLAIN SELECT * FROM orders WHERE user_id = 100 ORDER BY order_time DESC;
-- Result: may use idx_user_status but requires filesort
-- Optimization: add composite index covering both columns
CREATE INDEX idx_user_time ON orders(user_id, order_time);Case 2: Pagination optimization
-- Inefficient pagination with large offset
EXPLAIN SELECT * FROM orders ORDER BY id LIMIT 1000000, 20;
-- Result: scans 1,000,020 rows
-- Optimization 1: remember last ID
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 20;
-- Optimization 2: use covering index
SELECT id FROM orders ORDER BY id LIMIT 1000000, 20; -- fetch IDs first
SELECT * FROM orders WHERE id IN (list of IDs); -- then fetch rows4.4 Execution Plan Visualization Tools
-- Get detailed plan in JSON format
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE age > 20;
-- View actual execution costs
SHOW STATUS LIKE 'Handler_read%';
SHOW STATUS LIKE 'Innodb_buffer_pool_read%';Key principles:
Less is more – avoid excessive indexes.
Prefer covering indexes to eliminate back‑lookups.
Follow the left‑most prefix rule for composite indexes.
Monitor index usage regularly.
Design indexes based on actual query patterns.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Senior Xiao Ying
Dedicated to sharing Java backend technical experience and original tutorials, offering career transition advice and resume editing. Recognized as a rising star in CSDN's Java backend community and ranked Top 3 in the 2022 New Star Program for Java backend.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
