Understanding MySQL Indexes: B+ Tree Principles and Optimization
This article explains why MySQL uses B+ trees for indexing, describes the underlying principles of various index types, compares MyISAM and InnoDB implementations, and provides practical optimization guidelines such as using auto‑increment primary keys, left‑most prefix rules, and configuration tuning.
Preface
There are many kinds of indexes—hash, B‑tree, B+‑tree, full‑text, etc. MySQL supports multiple storage engines, each with different index capabilities. This article explores why MySQL adopts B+ trees for indexes, the underlying principles of indexes, and how to optimise them in SQL.
Principles of Indexes
Purpose of Indexes
Indexes aim to improve query or retrieval efficiency. For example, searching for the word "mysql" in a dictionary is similar to traversing a tree from the root to a leaf, which is far faster than scanning a linked list sequentially.
Why MySQL Uses B+ Trees
Ordinary binary trees become too tall for large data sets, leading to poor performance. MySQL therefore prefers B+ trees over B‑trees because B+ trees reduce disk I/O and provide more balanced search paths.
B‑Tree Introduction
Binary‑tree search has O(log N) complexity, but disk I/O is the real bottleneck. When data cannot fit entirely in memory, each disk page corresponds to a tree node; a deep tree means many I/O operations. Multi‑way search trees (B‑tree/B+‑tree) were created to minimise this cost.
Each node stores keys and pointers, interleaved.
Keys within a node are sorted in ascending order.
Leftmost pointer points to keys smaller than the first key; intermediate pointers point to ranges between adjacent keys.
Node size is uniform; each non‑leaf node contains n‑1 keys and n pointers, where d ≤ n ≤ 2d.
B+ Tree
Internal nodes store only keys and pointers; leaf nodes store keys and data.
Internal and leaf nodes have different sizes because they store different information.
Each non‑leaf node can have up to 2d pointers (instead of 2d + 1).
Because leaf nodes contain no data, more space is available for keys, resulting in a higher fan‑out and shallower trees, which improves search efficiency.
Why B+ Trees Suit MySQL Indexes
Lower disk‑read cost: non‑leaf nodes contain only keys, allowing more keys per page and fewer I/O operations.
Stable query performance: every search follows the same root‑to‑leaf path, unlike B‑trees where data in internal nodes can cause varying paths.
Facilitates full‑table scans: all data resides in leaf nodes, so scanning the leaf level yields the entire dataset without traversing internal nodes.
MySQL Index Implementation
MyISAM Index Implementation
MyISAM uses B+ trees; leaf nodes store the address of the data record.
The primary index (e.g., on column Col1) stores record addresses in the leaf data field. A secondary index (e.g., on column Col2) has a similar B+‑tree structure but points to the primary key values.
InnoDB Index Implementation
InnoDB differs in two ways:
The data file itself is a B+‑tree and serves as the clustered primary index; leaf nodes contain the full row data.
Secondary indexes store the primary‑key value in their leaf nodes instead of the record address.
Index Optimization
Strong Recommendation: Use Auto‑Increment Primary Keys
When using InnoDB, an auto‑increment integer primary key keeps inserts sequential, resulting in compact leaf pages and minimal page splits, which greatly improves write performance.
Conversely, random non‑sequential keys (e.g., UUIDs or IDs based on strings) cause frequent page splits and data movement, degrading performance.
Leftmost Prefix Principle
For a composite index (A, B, C), MySQL can use the index only for queries that reference the leftmost columns in order (A; A + B; A + B + C). Skipping a column (e.g., querying B alone) prevents the index from being used.
Other Indexing Guidelines
Prefer high‑cardinality columns (distinct/total ratio close to 1).
Avoid functions on indexed columns; they invalidate the index.
Composite indexes are more cost‑effective than multiple single‑column indexes.
Index columns that are frequently filtered, joined, sorted, or grouped.
Limit the total number of indexes to avoid overhead on INSERT/UPDATE.
Use the smallest possible data type for indexed columns.
Remove unused or rarely used indexes.
MySQL Performance Tuning
Common Causes of Slow Queries
Hardware bottlenecks (network, memory, I/O, disk space).
Missing or ineffective indexes.
Excessive data volume (consider sharding).
Improper server or configuration settings.
Analyzing and Solving Slow Queries
Enable the slow‑query log and set an appropriate threshold.
Use EXPLAIN to examine execution plans and identify missing indexes.
Use SHOW PROFILE for detailed per‑statement timing.
Consult DBAs or ops teams for server‑level tuning.
Configuration Optimization
Basic Settings
innodb_buffer_pool_size : size of the InnoDB data and index cache; larger is better.
innodb_log_file_size : size of the redo log; larger values improve write throughput.
max_connections : increase cautiously; too high can cause resource exhaustion.
InnoDB‑Specific Settings
innodb_file_per_table : store each table in its own .ibd file for easier space reclamation.
innodb_flush_log_at_trx_commit : 1 for full ACID safety, 2 for better performance on replicas.
innodb_flush_method : O_DIRECT for RAID with battery‑backed cache; otherwise fdatasync.
innodb_log_buffer_size : increase if large BLOBs or TEXT are frequently written.
Explain Execution Plan
Prepare sample data and run EXPLAIN to view columns such as id, select_type, table, type, possible_keys, key, key_len, ref, rows, and extra.
CREATE TABLE `user_info` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL DEFAULT '',
`age` INT(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_index` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO user_info (name, age) VALUES ('xys', 20);
... (additional INSERT statements) ...
CREATE TABLE `order_info` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id` BIGINT(20) DEFAULT NULL,
`product_name` VARCHAR(50) NOT NULL DEFAULT '',
`productor` VARCHAR(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_product_detail_index` (`user_id`,`product_name`,`productor`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
... (additional INSERT statements) ...EXPLAIN output shows how MySQL chooses indexes, join order, and whether it uses filesort, temporary tables, or index‑only scans.
Summary
Key indexing principles include choosing unique or high‑cardinality indexes, indexing columns used for sorting/grouping/joining, limiting the number of indexes, preferring the leftmost prefix rule, keeping indexed columns free of functions, and extending existing composite indexes instead of creating new ones.
·END·
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.