Databases 18 min read

Understanding MySQL InnoDB Tablespaces and B+ Tree Index Structures

This article explains MySQL InnoDB's tablespace architecture—including segments, extents, pages, and rows—and details how clustered and secondary indexes are implemented using B+ trees, comparing them with other tree structures to illustrate why B+ trees are optimal for disk‑based databases.

Selected Java Interview Questions

Oct 5, 2022

In MySQL, InnoDB is the most widely used storage engine and the only free engine that supports transactions; this article uses InnoDB to introduce MySQL's index structure and why B+ trees are used for indexing.

Tablespace

All MySQL data is stored in a tablespace, which is logically divided into segments, extents, pages, and rows. The logical structure is shown in the following diagram:

Segment

Segments are the building blocks of a tablespace. Common segments include data, index, and rollback segments. Because InnoDB stores data using a B+ tree, the data segment corresponds to the leaf nodes, the index segment to the internal nodes, and the rollback segment stores undo logs for transaction rollback and version retrieval. Prior to InnoDB 1.1 only one rollback segment was supported; InnoDB 1.2 increased this to 128, allowing up to 128 * 1023 concurrent modifying transactions.

Extent

An extent consists of consecutive pages and has a fixed size of 1 MB. InnoDB allocates 4–5 extents at a time. Without compression, an extent holds 64 contiguous pages. When a new table is created, its initial size is 96 KB, using 32 fragment pages; once those are filled, the table grows in MB increments.

Page

A page is the smallest unit managed by InnoDB, defaulting to 16 KB. Since InnoDB 1.2, the page size can be changed with innodb_page_size before the instance is initialized; later changes require a dump and reload. Common page types include data, undo, system, insert‑buffer bitmap, and compressed BLOB pages.

Row

A row corresponds to a table record. Each page can store at most 16KB/2‑200 rows, i.e., 7 992 rows for a 16 KB page.

Index Structure

Clustered Index

Every InnoDB table has one clustered index that stores the row records, usually built on the primary key. Creation rules:

If a primary key is defined, InnoDB uses it for the clustered index.

If no primary key exists, InnoDB chooses a non‑null unique index.

If neither exists, InnoDB creates an implicit auto‑increment column.

Note: the chosen unique index becomes the primary key, even if it was defined later.

The clustered index is a B+ tree: internal nodes hold keys, leaf nodes hold the actual row data (data pages). Leaf pages are linked by a doubly‑linked list, allowing sequential scans.

Secondary Index

All indexes other than the clustered index are secondary indexes. Their leaf nodes store only the primary key values. To retrieve a row via a secondary index, MySQL first finds the primary key in the secondary index, then looks up the row in the clustered index (two B+ tree searches). If the secondary index contains all needed columns, the query can be satisfied without a table lookup (index‑only scan).

Secondary indexes can be single‑column or composite (multi‑column). The following diagram shows a composite index:

The B+ tree of a composite index stores sorted key tuples, e.g., (1,1),(1,2),(2,1),(2,4),(3,1),(3,2), which improves ordered queries.

When using SELECT * FROM table WHERE a=? AND b=?, the (a,b) index can be used, but the order of conditions must follow the left‑most principle; WHERE b=? cannot use the (a,b) index.

Index covering reduces I/O because the secondary index does not store full row data.

Why Use B+ Trees for Indexes?

To understand the advantage of B+ trees, we compare them with other tree structures.

Binary Search Tree (BST): Unbalanced

A BST provides O(log n) search on average, but can become unbalanced and degrade to O(n) in the worst case.

Balanced Binary Tree (AVL): Rotation Overhead

AVL trees maintain strict balance with rotations, giving O(log n) operations, but deletions may require O(log n) rotations, making them costly for write‑heavy workloads.

Red‑Black Tree: Height Too High for Disk

Red‑black trees relax balance, allowing height up to twice the minimal height, which reduces rotation cost to O(1). They are widely used in memory structures (e.g., Java TreeMap, HashMap) but are unsuitable for disk‑based storage because their height still leads to many I/O operations.

B‑Tree: Designed for Disk

B‑trees are multi‑way balanced trees; each internal node can have many children, resulting in a much lower height than binary trees and thus fewer disk I/Os.

B+‑Tree: Further Optimisation

Differences from B‑tree:

Only leaf nodes store actual data; internal nodes store keys only.

Keys may be duplicated in leaf and internal nodes.

Leaf nodes are linked via a doubly‑linked list.

Non‑leaf nodes store the same number of keys as children.

Advantages:

Fewer I/Os: More keys per node lower the tree height.

Better range queries: Leaf linked list enables sequential scans.

Stable query cost: All data resides in leaf level, so search cost equals tree height.

The main drawback is increased space due to key duplication, which is acceptable given the performance gains.

Estimating B+‑Tree Height in InnoDB

Node capacity depends on page size (default 16 KB) and record size. Assuming a non‑leaf page holds ~1 000 index entries (≈16 bytes each) and a leaf page holds ~100 entries, a three‑level B+ tree can store about 1 × 10⁹ rows (1 000 × 1 000 × 100). By contrast, a binary tree would need roughly 26 levels for the same number of rows.

Summary

Binary Search Tree: Simple ordering but can become unbalanced.

Balanced Binary Tree (AVL): Guarantees balance at the cost of expensive rotations.

Red‑Black Tree: Balances rotation cost and height, widely used in memory, but still too tall for disk.

B‑Tree: Multi‑way structure reduces height for disk‑based storage.

B+‑Tree: Leaf‑only data storage and linked leaves give fewer I/Os, efficient range scans, and stable performance.

Source: guobinhit.blog.csdn.net/article/details/106221933

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

database Storage Engine InnoDB Indexes B+Tree

Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.