Unlock MySQL Performance: Deep Dive into B+Tree Indexes and Optimization
This article explains the fundamentals of MySQL indexes, the B+Tree data structure, differences between MyISAM and InnoDB implementations, practical tips for creating effective composite indexes, and essential server configuration and SQL tuning techniques to dramatically improve query performance.
MySQL Indexes
MySQL supports multiple storage engines, each offering different index types such as B‑Tree, hash, and full‑text. This article focuses on B‑Tree indexes, the most commonly used structure in everyday MySQL work.
Index Purpose and Principle
An index is a data structure that speeds up data retrieval, similar to a dictionary that narrows the search range before scanning the entire dataset.
By repeatedly reducing the search space, B‑Tree indexes transform random I/O into sequential I/O, minimizing disk reads.
B+Tree Structure
A B+Tree consists of disk pages (blocks) that contain multiple data entries (blue) and pointers (yellow). Leaf nodes store the actual row data, while internal nodes store only key values to guide the search.
Example: Disk block 1 holds entries 17 and 35 with pointers P1 (values < 17), P2 (17‑35), and P3 ( > 35). The real rows reside in leaf nodes such as 3, 5, 9, …, 99.
Searching for the value 29 requires loading three pages (three I/Os) compared to a million I/Os without an index.
B+Tree Properties
Smaller key size reduces tree height, so keep indexed columns as small as possible (e.g., INT vs BIGINT).
Composite indexes are ordered left‑to‑right; the leftmost prefix must be used for the index to be effective.
Only equality or range conditions on the leftmost columns can utilize the index.
MyISAM Index Implementation
MyISAM uses a B+Tree where leaf nodes store the address of the data record. The primary key and secondary keys share the same structure; the primary key must be unique, while secondary keys may contain duplicates.
InnoDB Index Implementation
InnoDB also uses B+Tree structures but stores the full row data in the leaf nodes of the primary (clustered) index. Secondary indexes store the primary key value instead of the row address.
Because InnoDB’s primary key is the data file itself, using a wide or non‑monotonic primary key (e.g., UUID) can cause frequent page splits and performance degradation. An auto‑increment integer is usually the best choice.
How to Build Effective Indexes
Left‑most Prefix Principle
Composite indexes are ordered tuples; only the leftmost contiguous columns can be used. For example, a query on columns (A, B, C) can use the index for (A), (A, B), or (A, B, C) but not for (B) alone.
If a query uses (A, C) the C part is ignored because B is missing.
Range conditions stop the index usage for subsequent columns.
Common Indexing Tips
Apply the left‑most prefix rule; stop at the first range condition.
Equality and IN conditions can be reordered; MySQL will reorder them to match the index.
Choose columns with high selectivity (distinct values / total rows) for indexes.
Avoid functions on indexed columns (e.g., FROM_UNIXTIME(create_time)) because they prevent index usage.
Prefer extending existing indexes over creating new ones when possible.
MySQL Configuration Optimization
Key Server Variables
innodb_buffer_pool_size: Size of the InnoDB buffer pool; larger values keep more data/indexes in memory. innodb_log_file_size: Size of the redo log; larger values improve write performance but affect crash recovery time. max_connections: Increase if you see “Too many connections” errors, but beware of resource exhaustion. innodb_file_per_table: Store each table's data and indexes in its own .ibd file (default ON in MySQL 5.6+). innodb_flush_log_at_trx_commit: Set to 2 for faster writes when durability can be relaxed. innodb_flush_method: Use O_DIRECT on systems with RAID controllers that have battery‑backed cache. query_cache_size: Usually set to 0 because the query cache can become a bottleneck. log_bin: Enable binary logging for replication or point‑in‑time recovery.
SQL Tuning Workflow
Confirm the query is slow (use SQL_NO_CACHE to avoid cache effects).
Identify the table with the smallest result set and start filtering there.
Run EXPLAIN to view the execution plan and verify index usage.
Rewrite queries to use index‑friendly patterns (e.g., avoid != on indexed columns).
Consider adding or adjusting indexes based on the most frequent and costly queries.
Iterate: observe results, refine, and repeat.
mysql> explain select * from servers;
+----+-------------+---------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | servers | ALL | NULL | NULL | NULL | NULL | 1 |
+----+-------------+---------+------+---------------+------+---------+------+------+-------+
1 row in set (0.03 sec)Practical Example
Given a table circlemessage_idx_0 with a primary key (msg_id, to_id) and a secondary index (from_id, circle_id), the following query shows poor index usage because the condition msg_id >= ... forces a range scan on the primary key, ignoring the more selective to_id column.
mysql> explain select msg_id from circlemessage_idx_0
where to_id = 113487 and circle_id = 10019063
and msg_id >= 6273803462253938690
and from_id != 113487
order by msg_id asc limit 30;
+----+-------------+---------------------+-------+-------------------------+---------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | rows | Extra |
+----+-------------+---------------------+-------+-------------------------+---------+---------+------+--------+-------------+
| 1 | SIMPLE | circlemessage_idx_0 | range | PRIMARY,idx_from_circle| PRIMARY | 16 | 349780| Using where |
+----+-------------+---------------------+-------+-------------------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)Analysis shows that only the primary key is used, scanning ~350 k rows. To improve, create a composite index starting with to_id, e.g., (to_id, circle_id, msg_id), so the query can filter by to_id first and then apply the range on msg_id.
Summary
Understanding the underlying B+Tree structure and the left‑most prefix rule is essential for designing efficient MySQL indexes. Use narrow, high‑selectivity columns as the leading parts of composite indexes, prefer auto‑increment primary keys for InnoDB, and balance index count against write overhead.
Indexes are not always beneficial; add them judiciously.
Distinguish between primary (clustered) keys and secondary indexes.
Grasp index structure and query‑index interaction.
Apply configuration tweaks and SQL‑level tuning together for optimal performance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
