Understanding InnoDB Logical Storage Structure and B+Tree Indexes in MySQL
The article explains MySQL InnoDB's logical storage architecture, including tablespaces, pages, segments, and B+‑tree indexes, demonstrates how primary and secondary indexes are organized and accessed, and shows how to calculate index tree height and data capacity using SQL queries and hexdump analysis.
When querying a table, a common misconception is that fewer rows always mean faster queries; the reality depends on MySQL's B+‑tree index structure.
InnoDB Logical Storage Structure
All data in InnoDB is stored in a single tablespace by default, though innodb_file_per_table can enable a separate tablespace per table.
mysql> show variables like 'innodb_file_per_table';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| innodb_file_per_table | ON |
+---------------------------+-------+The tablespace consists of segments (data, index, rollback, etc.). Each segment is divided into 1 MiB extents, and each extent contains 64 pages of the default 16 KiB size.
Pages are the smallest storage unit; the page size can be changed with innodb_page_size (e.g., 4 KiB, 8 KiB).
mysql> show variables like 'innodb_page_size';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| innodb_page_size | 16384 |
+-------------------+-------+B+ Tree
B+‑tree indexes store data in leaf nodes, which are linked via a doubly‑linked list. The leaf nodes contain the full row for clustered (primary) indexes.
Clustered (Primary) Index Example
Given a user table, the primary key id forms a B+‑tree where the tree height is 2, leaf nodes store id, name, and age, and the lookup process involves two I/O operations. SELECT * FROM user WHERE id = 30; Determine the range in the first level (e.g., 25‑50).
Follow the pointer to the second‑level leaf page.
Load the leaf page into memory and locate the row via binary search.
Summary: two I/O reads, memory lookup time negligible.
Secondary (Non‑Clustered) Index Example
The name index stores name and the primary key id in leaf nodes. A query like SELECT * FROM user WHERE name = 'jack' first uses the secondary index to find the id, then performs a primary‑key lookup. SELECT * FROM user WHERE name = 'jack'; Locate the range in the first level.
Follow pointers to the leaf page.
Find the name entry and retrieve the associated id.
Use the primary key to fetch the full row.
Summary: four I/O reads (two for the secondary index, two for the primary‑key lookup).
Estimating Data Capacity of an Index Tree
Using show table status like 'user'\G we see an average row length of 45 bytes. With a 16 KiB page, a leaf page can hold about 364 rows.
mysql> show table status like 'user'\G
... Avg_row_length: 45
... Data_length: 458752
...Each pointer in the root page occupies 10 bytes (4 bytes for the integer key + 6 bytes for the InnoDB pointer), allowing roughly 1 638 pointers per root page.
Therefore:
Tree height 2 → capacity ≈ 1 638 × 364 ≈ 596 232 rows.
Tree height 3 → capacity ≈ 1 638 × 1 638 × 364 ≈ 976 628 016 rows.
Determining Index Tree Height
Each InnoDB page stores a PAGE_LEVEL field; leaf pages have level 0. The root page’s level plus one equals the tree height.
SELECT b.name, a.name, index_id, type, a.space, a.PAGE_NO
FROM information_schema.INNODB_SYS_INDEXES a,
information_schema.INNODB_SYS_TABLES b
WHERE a.table_id = b.table_id
AND a.space <> 0
AND b.name='test/user';
-- Result shows PAGE_NO for primary index = 3Reading the corresponding user.ibd file at offset 16384*3 + 64 = 49216 reveals the PAGE_LEVEL value (e.g., 0x01), confirming a tree height of 2.
Conclusion
The query performance is more closely tied to the index tree height than to the total number of rows; tables with the same tree height exhibit similar query times regardless of row count.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
