Understanding MySQL Indexes: B+Tree Structure, Implementation, and Optimization
This article explains why MySQL uses B+‑tree indexes, describes the principles of B‑tree and B+‑tree structures, compares MyISAM and InnoDB index implementations, and provides practical optimization tips such as using auto‑increment primary keys, the left‑most prefix rule, and proper configuration settings.
Introduction
Indexes are data structures that help MySQL retrieve rows efficiently. MySQL supports many index types (hash, B‑tree, B+‑tree, full‑text, etc.) and multiple storage engines, each with its own index capabilities. This article explores why MySQL chooses B+‑trees, the underlying principles of indexes, and how to optimise them.
Index Principles
The purpose of an index is to improve query speed by reducing the amount of data that must be scanned, similar to looking up a word in a dictionary by narrowing the search range step by step.
Why MySQL Uses B+‑Tree
Ordinary binary trees have high height for large data sets, leading to poor performance. B‑trees improve on this but still store data in internal nodes, causing more disk I/O. B+‑trees store only keys in internal nodes and all data in leaf nodes, which reduces I/O and improves search stability.
B‑Tree Overview
B‑trees are multi‑way search trees designed to minimise disk I/O. Each node contains a range of keys and pointers, with the number of keys per node bounded by a degree d (d ≤ n ≤ 2d). The height is much lower than that of a binary tree, making them suitable for storage on disk.
B+‑Tree Characteristics
Internal nodes store only keys and pointers; leaf nodes store keys and the actual row data.
Leaf nodes are linked, enabling efficient range scans.
Because leaf nodes contain no data, internal nodes can hold more keys, reducing tree depth and I/O.
Why B+‑Tree Fits MySQL Indexes
Lower disk‑read cost: more keys per page mean fewer page reads.
Stable query path: every search follows the same root‑to‑leaf path.
Easy full‑table scans: leaf nodes contain all data, so a sequential scan of leaves retrieves rows in order.
MySQL Index Implementations
MyISAM
MyISAM uses B+‑tree indexes where the leaf data field stores the physical address of the row. This is a non‑clustered index; the table data is stored separately.
InnoDB
InnoDB stores the table data itself in a clustered B+‑tree indexed by the primary key. Leaf nodes contain the full row, and secondary indexes store the primary‑key value in their data field.
Index Optimisation
Prefer Auto‑Increment Primary Keys
Using an auto‑increment integer as the primary key keeps inserts at the end of the leaf page, avoiding page splits and reducing write amplification.
Left‑Most Prefix Rule for Composite Indexes
For a composite index (A, B, C), queries can use the index only if they filter on A, or A + B, or A + B + C. Skipping the leftmost column makes the index unusable.
Other Best Practices
Choose columns with high selectivity (distinct/total ratio close to 1).
Avoid functions on indexed columns; they prevent index usage.
Prefer composite indexes over many single‑column indexes.
Index columns frequently used in WHERE, JOIN, ORDER BY, or GROUP BY clauses.
MySQL Performance Tuning
Common Causes of Slow Queries
Hardware bottlenecks (network, memory, I/O).
Missing or ineffective indexes.
Excessive data volume (need sharding).
Improper server or configuration settings.
Diagnosing Slow SQL
Enable the slow‑query log and set an appropriate threshold.
Use EXPLAIN to view the execution plan.
Use SHOW PROFILE for detailed timing.
Adjust server parameters with DBA assistance.
Configuration Recommendations
innodb_buffer_pool_size : allocate as much memory as possible for caching data and indexes.
innodb_log_file_size : larger redo logs improve write throughput.
max_connections : increase cautiously; too many active connections can degrade performance.
innodb_file_per_table : store each table in its own .ibd file for easier space reclamation.
innodb_flush_log_at_trx_commit : set to 2 for better performance when durability can be relaxed.
innodb_flush_method : use O_DIRECT with hardware RAID, otherwise fdatasync .
Example DDL and Data
CREATE TABLE `user_info` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL DEFAULT '',
`age` INT(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_index` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO user_info (name, age) VALUES ('xys',20),('a',21),('b',23),('c',50),('d',15),('e',20),('f',21),('g',23),('h',50),('i',15);
CREATE TABLE `order_info` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id` BIGINT(20) DEFAULT NULL,
`product_name` VARCHAR(50) NOT NULL DEFAULT '',
`productor` VARCHAR(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_product_detail_index` (`user_id`,`product_name`,`productor`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO order_info (user_id, product_name, productor) VALUES
(1,'p1','WHH'),(1,'p2','WL'),(1,'p1','DX'),(2,'p1','WHH'),(2,'p5','WL'),(3,'p3','MA'),(4,'p1','WHH'),(6,'p1','WHH'),(9,'p8','TE');Understanding EXPLAIN Output
id : execution order of rows.
select_type : type of SELECT (SIMPLE, PRIMARY, SUBQUERY, etc.).
table : table or derived table involved.
type : access method (system, const, eq_ref, ref, range, index, ALL).
possible_keys : indexes that could be used.
key : index actually used.
key_len : length of the used index in bytes.
ref : column or constant compared to the index.
rows : estimated rows examined.
extra : additional info (using filesort, using index, using temporary, using where).
Conclusion
Effective indexing in MySQL relies on understanding B+‑tree mechanics, choosing appropriate primary and secondary keys, following the left‑most prefix rule, and configuring InnoDB parameters to match workload characteristics. Proper index design dramatically reduces I/O, speeds up queries, and improves overall database performance.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.