Master MySQL Indexes: From Basics to B+Tree Optimization
This article explains what MySQL indexes are, how they work, their advantages and drawbacks, the different types of indexes—including primary, ordinary, composite, and full‑text—and dives deep into B‑Tree and B+Tree structures, clustering, page organization, and best practices for efficient query performance.
1. What Is an Index
Official definition: an index is a data structure that improves MySQL query efficiency by allowing fast retrieval of rows, similar to a book's table of contents.
1.1 How an Index Works
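Without an index, a query like SELECT * FROM user WHERE id = 40 requires a full table scan that checks every row. With an index, MySQL searches a sorted structure instead, narrowing the range at each step much as a binary search does, so it locates the row after only a few comparisons.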
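The difference between scanning and searching a sorted key list can be sketched in Python. This is only an illustration: the in-memory `rows` list and the helper names `full_scan` and `indexed_lookup` are hypothetical and say nothing about how MySQL is implemented internally.

```python
from bisect import bisect_left

def full_scan(rows, target_id):
    """Check every row in turn, as a full table scan would."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row["id"] == target_id:
            return row, comparisons
    return None, comparisons

def indexed_lookup(sorted_ids, rows_by_id, target_id):
    """Binary-search a sorted list of keys, then fetch the matching row."""
    pos = bisect_left(sorted_ids, target_id)
    if pos < len(sorted_ids) and sorted_ids[pos] == target_id:
        return rows_by_id[target_id]
    return None

# Hypothetical table of 100 users with ids 1..100.
rows = [{"id": i, "name": f"user{i}"} for i in range(1, 101)]
sorted_ids = [r["id"] for r in rows]
rows_by_id = {r["id"]: r for r in rows}

row, comparisons = full_scan(rows, 40)
print(comparisons)                                   # 40 rows examined
print(indexed_lookup(sorted_ids, rows_by_id, 40) == row)  # True
```

The scan touches 40 rows before it finds id 40, while a binary search over 100 sorted keys needs at most 7 comparisons.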
Index advantages:
1. Greatly speeds up data queries.
Index disadvantages:
1. Consumes database resources to maintain.
2. Takes up disk space.
3. Slows down insert, update, and delete operations due to maintenance.
Despite the drawbacks, the speed gain for large datasets makes indexes essential.
2. Types of Indexes
1. Primary key index – automatically created for the primary key (InnoDB uses a clustered index).
2. Ordinary index – built on regular columns without restrictions, used to accelerate queries.
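3. Composite index – built on multiple columns. Columns that form a composite primary key must be NOT NULL; other composite indexes may include nullable columns.
4. Full‑text index – used for searching keywords in large text columns. Before MySQL 5.6 only MyISAM supported it; InnoDB has supported it since MySQL 5.6.
3. B‑Tree and B+Tree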
MySQL stores data in pages (default 16 KB). Each page contains a pointer (P) to the next page, so the pages form a linked list.
MySQL keeps rows sorted by the indexed key as they are inserted, which enables fast lookups. The page directory stores the first key of each leaf page, so the engine can identify the correct leaf page from the directory and read it with a single I/O operation.
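A minimal Python sketch of this directory lookup follows. The tiny `pages` list, the `directory` list, and the `find_row` helper are hypothetical stand-ins chosen for illustration, not InnoDB's actual page format.

```python
from bisect import bisect_right

# Hypothetical leaf pages: each holds (id, value) rows sorted by id.
pages = [
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    [(5, "e"), (6, "f"), (7, "g"), (8, "h")],
    [(9, "i"), (10, "j")],
]
# The directory keeps the first key of every leaf page: [1, 5, 9].
directory = [page[0][0] for page in pages]

def find_row(target_id):
    """Return (page_number, row) for target_id, or None if it is absent."""
    # Pick the last page whose first key is <= target_id.
    page_no = bisect_right(directory, target_id) - 1
    if page_no < 0:
        return None  # target is smaller than every stored key
    # Only this one page has to be read and searched.
    for row in pages[page_no]:
        if row[0] == target_id:
            return page_no, row
    return None

print(find_row(2))   # (0, (2, 'b'))
print(find_row(9))   # (2, (9, 'i'))
```

Only one leaf page is examined per lookup, which is the point of keeping the directory separate from the row data.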
Example: To find a row with id = 2, the page directory shows that id 2 lies between ids 1‑4 in the first page. The engine follows that directory entry, reads the first page, and finds the row with a single I/O.
Storage calculations show that a 16 KB page can hold roughly 455 rows (assuming 36 bytes per row), and the page directory can reference up to 2,048 leaf pages. One directory level therefore covers about 455 × 2,048 ≈ 930,000 rows, and adding another directory level extends this to many millions of rows.
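The storage arithmetic can be checked with a few lines of Python. The 36-byte row size and the 2,048 pointers per directory page are the article's figures; applying the same 2,048 fan-out to a second directory level is an assumption made here for illustration, and page headers are ignored.

```python
PAGE_SIZE = 16 * 1024          # default page size in bytes (16 KB)
ROW_SIZE = 36                  # bytes per row, as assumed in the text
POINTERS_PER_DIRECTORY = 2048  # leaf pages one directory page can reference

rows_per_page = PAGE_SIZE // ROW_SIZE
rows_one_directory_level = rows_per_page * POINTERS_PER_DIRECTORY
# Assumption: a second directory level also fans out 2,048 ways.
rows_two_directory_levels = rows_one_directory_level * POINTERS_PER_DIRECTORY

print(rows_per_page)              # 455
print(rows_one_directory_level)   # 931840
print(rows_two_directory_levels)  # 1908408320
```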
Comparison of B‑Tree and B+Tree
B‑Tree stores data in both leaf and internal nodes, so each internal node has room for fewer keys, which can increase tree depth and I/O. B+Tree stores data only in leaf nodes, so internal nodes hold many more keys; this keeps the tree shallower (typically no more than three levels, even for large production tables) and reduces I/O.
InnoDB uses B+Tree for its index structures.
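To see why fan-out matters, the rough Python sketch below estimates how many levels a tree needs for a given row count. All sizes are assumptions picked for illustration: 1 KB rows (16 per leaf page), an 8-byte key plus a 6-byte page pointer per internal entry (about 1,170 entries per 16 KB page), and a fan-out of only 16 when internal pages also carry full rows. The second case is a simplification of a B‑Tree, not an exact model.

```python
def levels_needed(total_rows, rows_per_leaf, fanout):
    """Count tree levels when each internal page points to `fanout` children."""
    pages = -(-total_rows // rows_per_leaf)  # ceiling division: leaf pages
    levels = 1
    while pages > 1:
        pages = -(-pages // fanout)          # pages needed one level up
        levels += 1
    return levels

PAGE_SIZE = 16 * 1024
ROWS_PER_LEAF = PAGE_SIZE // 1024        # assumed 1 KB rows -> 16 per page
KEY_ONLY_FANOUT = PAGE_SIZE // (8 + 6)   # assumed key + pointer -> 1170
ROW_CARRYING_FANOUT = ROWS_PER_LEAF      # internal pages that hold full rows

print(levels_needed(20_000_000, ROWS_PER_LEAF, KEY_ONLY_FANOUT))      # 3
print(levels_needed(20_000_000, ROWS_PER_LEAF, ROW_CARRYING_FANOUT))  # 7
```

Under these assumptions, 20 million rows fit in three levels when internal pages hold only keys and pointers, but need seven when the fan-out drops to 16.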
Clustered vs. Non‑Clustered Indexes
Clustered index: Data and index are stored together; leaf nodes contain the full row data. In InnoDB, the table itself is organized as a B+Tree based on the primary key, so each table can have only one clustered index.
Non‑clustered (secondary) index: The index stores only the indexed columns plus the primary key value of each row. Accessing data via a secondary index requires two lookups: first the secondary index to get the primary key, then the clustered index to fetch the row (a "back to the table" lookup).
Advantages of clustered indexes:
Faster data access because the index and data are together.
Efficient range and ordered queries.
Disadvantages:
Insert speed depends on key order; out‑of‑order inserts cause page splits.
Updating primary keys is costly.
Secondary index lookups need two steps.
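The effect of key order on page splits can be illustrated with a toy Python simulation. It is a sketch under strong assumptions: pages hold only four keys, a full last page that receives a larger key simply gets a new page appended, and any other insert into a full page splits it in half. The `count_splits` helper and `PAGE_CAPACITY` are hypothetical, and real InnoDB split behavior is more involved.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical tiny page so splits are easy to trigger

def count_splits(keys):
    """Insert keys into sorted fixed-size pages and count page splits."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Locate the page whose first key is the largest one <= key.
        firsts = [page[0] for page in pages if page]
        i = max(bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # appending at the end needs no split
        else:
            half = PAGE_CAPACITY // 2
            left, right = page[:half], page[half:]
            insort(left if key < right[0] else right, key)
            pages[i:i + 1] = [left, right]
            splits += 1
    return splits

in_order = list(range(1, 101))
# Odd keys first, then even keys that land inside already-full pages.
out_of_order = list(range(1, 101, 2)) + list(range(2, 101, 2))

print(count_splits(in_order))          # 0
print(count_splits(out_of_order) > 0)  # True
```

Monotonically increasing keys only ever append pages, which is one reason auto-increment primary keys are a common choice.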
Secondary (Non‑Clustered) Indexes
Secondary indexes are built on top of the clustered index. Their leaf nodes store primary key values, not physical row locations. Querying via a secondary index first retrieves the primary key, then uses it to locate the row.
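The two-step lookup can be sketched with plain Python dictionaries. The `clustered` and `idx_name` structures, the sample rows, and the `find_by_name` helper are hypothetical, and the sketch assumes names are unique so each index entry maps to exactly one primary key.

```python
# Clustered index: primary key -> full row (leaf nodes hold the row data).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 41},
}

# Secondary index on name: indexed value -> primary key, not a row location.
idx_name = {row["name"]: pk for pk, row in clustered.items()}

def find_by_name(name):
    """Look up a row through the secondary index, then the clustered index."""
    pk = idx_name.get(name)   # step 1: secondary index yields the primary key
    if pk is None:
        return None
    return clustered[pk]      # step 2: clustered index yields the full row

print(find_by_name("bob"))  # {'id': 2, 'name': 'bob', 'age': 25}
```

Because the secondary index stores primary key values rather than physical locations, it does not need to change when a row moves to a different page.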