Understanding MySQL Index Structures, B+ Trees, and Practical Optimization Techniques
This article explains why MySQL uses B+‑tree indexes, describes the left‑most prefix rule for composite indexes, offers practical index‑design and optimization tips, outlines MyBatis first‑ and second‑level caching, details master‑slave replication, and introduces common sharding strategies and their implementation considerations.
What data structure does MySQL use for indexes? Why B+ trees?
MySQL's default storage engine, InnoDB, stores its indexes as B+ trees. The article starts by prompting readers to think about index design with a Why‑What‑How framework.
It explains why indexes are needed, what they are, and how they work. It then introduces B+ trees and their advantages over AVL trees and plain B trees: a much lower height (usually only 3 to 4 levels, even for tens of millions of rows) and sorted, linked leaf nodes that make range queries efficient.
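The main reasons for choosing B+ trees are fewer disk I/Os, because the tree is shallow, and efficient range queries, because leaf nodes can be scanned sequentially. Internal nodes hold only keys and child-page pointers, so each page fits many entries and the fan-out stays high.
An AVL tree is a binary tree, so its height grows as log2(n): roughly 20 levels for a million rows. Each level can cost a separate disk read.
A B+ tree keeps all records in its leaf nodes, which are linked in key order. In InnoDB, the clustered index stores full rows in its leaves, and secondary indexes store primary-key values. A range scan descends the tree once to find the first matching key and then follows the leaf links without going back through the internal nodes. A plain B tree also stores data in internal nodes, which lowers fan-out and turns range scans into in-order traversals across levels.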
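The height argument can be checked with back-of-the-envelope arithmetic. The sketch below assumes InnoDB's default 16 KB page, an 8-byte BIGINT key, a 6-byte child pointer, and 1 KB rows. All of these are illustrative assumptions, and real pages lose some space to headers, so treat the results as rough upper bounds rather than exact capacities.

```python
import math

# Illustrative assumptions, not exact InnoDB page accounting.
PAGE_SIZE = 16 * 1024   # InnoDB default page size (16 KB)
KEY_BYTES = 8           # assumed BIGINT primary key
POINTER_BYTES = 6       # assumed size of a child-page pointer
ROW_BYTES = 1024        # assumed average row size stored in leaf pages


def bplus_capacity(height: int) -> int:
    """Approximate rows a clustered B+ tree with `height` levels can hold."""
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # entries per internal page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES             # rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf


def binary_tree_height(rows: int) -> int:
    """Minimum height of a balanced binary tree (such as an AVL tree) with `rows` keys."""
    return math.ceil(math.log2(rows + 1))


if __name__ == "__main__":
    for h in (2, 3, 4):
        print(f"B+ tree height {h}: ~{bplus_capacity(h):,} rows")
    print("Binary tree height for the same 3-level capacity:",
          binary_tree_height(bplus_capacity(3)))
```

Under these assumptions, a 3-level B+ tree holds roughly 21.9 million rows. A balanced binary tree needs 25 levels to hold the same data.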
What is the left‑most prefix matching principle?
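When a composite index is created, MySQL applies the left‑most prefix rule: a query can use the index only if it constrains a leading prefix of the indexed columns. An index on (a, b, c) can serve conditions on a, on (a, b), or on (a, b, c), but not on b or c alone.
For example, an index on (major, class) is a B+ tree whose entries are sorted by major first. Entries are sorted by class only among those that share the same major.
Because class values are not globally sorted, a query that filters only by class cannot seek into this index. A query that filters by major, with or without class, can. Whether the optimizer actually picks the index still depends on data size and selectivity, and MySQL 8.0.13 and later can sometimes use a skip scan for conditions on a non-leading column.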
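A sorted list of tuples is a reasonable stand-in for the leaf level of a composite index. This sketch uses hypothetical data and Python's bisect module in place of a real B+ tree descent. A condition on major maps to one contiguous slice that binary search can locate. A condition on class alone matches entries scattered across the index, so every entry must be checked.

```python
from bisect import bisect_left, bisect_right

# Hypothetical leaf entries of a composite index on (major, class),
# kept in index order: sorted by major first, then by class.
INDEX = sorted([
    ("CS", 1), ("CS", 2), ("CS", 3),
    ("EE", 1), ("EE", 2),
    ("Math", 1), ("Math", 3),
])


def seek_major(major):
    """Leftmost-prefix lookup: binary search finds one contiguous range."""
    lo = bisect_left(INDEX, (major,))                # ("CS",) sorts before ("CS", 1)
    hi = bisect_right(INDEX, (major, float("inf")))  # just past the last entry for this major
    return INDEX[lo:hi]


def seek_major_class(major, cls):
    """Both prefix columns constrained: an even narrower contiguous range."""
    lo = bisect_left(INDEX, (major, cls))
    hi = bisect_right(INDEX, (major, cls))
    return INDEX[lo:hi]


def scan_class(cls):
    """No leftmost column: matches are scattered, so every entry is checked."""
    return [(pos, entry) for pos, entry in enumerate(INDEX) if entry[1] == cls]


if __name__ == "__main__":
    print(seek_major("EE"))
    print(seek_major_class("CS", 2))
    print(scan_class(1))  # positions 0, 3, 5: not contiguous
```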
How to design indexes and perform optimization?
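1. Use covering indexes, which contain every column a query needs. This avoids lookups back to the clustered (primary-key) index, often called back‑table lookups.
2. Create a unique index on any field, or combination of fields, that is unique in the business domain.
3. Avoid joining more than three tables, and make sure join columns are indexed and have identical data types.
4. When indexing a VARCHAR column, choose the prefix length by selectivity. For example, if the first 20 characters already give more than 90% selectivity (COUNT(DISTINCT LEFT(col, 20)) / COUNT(*)), index only those 20 characters.
5. Avoid leading wildcards (LIKE '%abc') and double-sided wildcards (LIKE '%abc%'), which cannot seek into a B+ tree index. Use a search engine if such searches are required.
6. In EXPLAIN output, aim for a type of at least range, and preferably ref or const. const: at most one matching row, found via the primary key or a unique index. ref: lookup through an ordinary (non-unique) index. range: index range scan. The index type, a full scan of the index, is slower than range.
7. Put the most selective column leftmost in a composite index. When a query mixes equality and range conditions, put the equality columns first, because columns after a range condition cannot narrow the index seek.
8. Keep the types of compared values consistent with the column type. Comparing a VARCHAR column with a numeric literal, for example, forces an implicit conversion that prevents index use.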
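Tip 4 can be made concrete by computing prefix selectivity the same way the SQL expression COUNT(DISTINCT LEFT(col, n)) / COUNT(*) does. The sketch below runs over an in-memory list of hypothetical email values and returns the shortest prefix that reaches a target selectivity. The 0.9 default mirrors the example in the tip, and the function names are illustrative.

```python
def prefix_selectivity(values, length):
    """Fraction of distinct prefixes: COUNT(DISTINCT LEFT(col, length)) / COUNT(*)."""
    if not values:
        return 0.0
    return len({v[:length] for v in values}) / len(values)


def shortest_prefix(values, target=0.9):
    """Smallest prefix length whose selectivity reaches `target`, or None."""
    longest = max((len(v) for v in values), default=0)
    for length in range(1, longest + 1):
        if prefix_selectivity(values, length) >= target:
            return length
    return None  # even the full values are not selective enough


if __name__ == "__main__":
    emails = [
        "alice@example.com",
        "alan@example.com",
        "bob@example.com",
        "bobby@example.com",
        "carol@example.com",
    ]
    for n in (1, 3, 4):
        print(n, prefix_selectivity(emails, n))
    print("shortest prefix:", shortest_prefix(emails))
```

Once a length n is chosen, the matching DDL is a prefix index such as ALTER TABLE t ADD INDEX idx_col (col(n)), where the table and index names are placeholders. Keep in mind that MySQL cannot use a prefix index as a covering index or for ORDER BY, so a shorter prefix trades those uses for a smaller index.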
MyBatis cache overview
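First‑level cache: scoped to a SqlSession and enabled by default. It is cleared whenever the session executes an insert, update, or delete, or when it commits, rolls back, or closes.
Second‑level cache: scoped to a mapper namespace and shared across SqlSessions. It stays inactive until the mapper declares <cache/> (or uses @CacheNamespace). Query results enter it only when the originating SqlSession commits or closes.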
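A toy model can illustrate how the two levels interact. This is not MyBatis code: the Namespace and Session classes below are hypothetical stand-ins that mimic the behavior described above. Lookups check the second-level cache first and the session's local cache second. Writes and commits clear the local cache, and second-level entries are staged until commit.

```python
class Namespace:
    """Hypothetical mapper namespace with a second-level cache enabled (<cache/>)."""

    def __init__(self, table):
        self.table = table   # dict standing in for the database table
        self.l2 = {}         # second-level cache, shared by all sessions
        self.db_hits = 0     # counts queries that actually reach the "database"


class Session:
    """Hypothetical SqlSession with its own first-level (local) cache."""

    def __init__(self, ns):
        self.ns = ns
        self.l1 = {}                     # first-level cache, private to this session
        self.staged = {}                 # results waiting to enter L2 at commit
        self.clear_l2_on_commit = False  # set by writes; L2 is flushed at commit

    def select(self, key):
        # 1) shared second-level cache (bypassed after a write in this session)
        if not self.clear_l2_on_commit and key in self.ns.l2:
            return self.ns.l2[key]
        if key in self.l1:               # 2) this session's local cache
            return self.l1[key]
        self.ns.db_hits += 1             # 3) fall through to the database
        value = self.ns.table[key]
        self.l1[key] = value
        self.staged[key] = value
        return value

    def update(self, key, value):
        self.ns.table[key] = value
        self.l1.clear()                  # any write clears the local cache
        self.staged.clear()
        self.clear_l2_on_commit = True

    def commit(self):
        if self.clear_l2_on_commit:
            self.ns.l2.clear()
        self.ns.l2.update(self.staged)   # staged results become visible to other sessions
        self.staged.clear()
        self.clear_l2_on_commit = False
        self.l1.clear()                  # commit also clears the local cache
```

In this model, a second session benefits from the shared cache only after the first session commits. A write followed by a commit empties the namespace's second-level cache, so the next reader sees fresh data.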
MySQL master‑slave replication
Master writes changes to the binary log.
The slave's I/O thread connects to the master and receives binary log events, which the master's binlog dump thread sends.
The I/O thread writes the received events to the slave's relay log.
Slave SQL thread reads the relay log and applies the changes.
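The four steps can be modeled as a small single-process simulation. This is a hypothetical sketch. Real MySQL replication streams binlog events over the network and runs the I/O and SQL threads concurrently. Here each thread is a method call, which makes replication lag easy to observe.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))        # step 1: record the change in the binlog


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    read_pos: int = 0     # next binlog position the I/O thread will request
    applied_pos: int = 0  # next relay log position the SQL thread will replay

    def io_thread_step(self, master):
        events = master.binlog[self.read_pos:]  # step 2: fetch events not yet received
        self.relay_log.extend(events)           # step 3: append them to the relay log
        self.read_pos += len(events)

    def sql_thread_step(self):
        for key, value in self.relay_log[self.applied_pos:]:
            self.data[key] = value              # step 4: replay each event in order
        self.applied_pos = len(self.relay_log)


if __name__ == "__main__":
    master, slave = Master(), Slave()
    master.write("user:1", "alice")
    master.write("user:2", "bob")
    slave.io_thread_step(master)
    print("after I/O thread:", slave.data)  # still empty: events sit in the relay log
    slave.sql_thread_step()
    print("after SQL thread:", slave.data)
```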
Sharding (database/table partitioning) design
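When a single table grows beyond several million rows or about 2 GB, consider sharding. Horizontal sharding spreads a table's rows across multiple tables or databases that share the same schema. Vertical sharding splits columns or business domains apart.
Common middleware includes Apache ShardingSphere (which grew out of Sharding‑JDBC), TDDL, and Mycat.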
Typical strategies:
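Hash‑based: route each row to table userId % 64, spreading data evenly across 64 tables.
Fixed‑digit: use specific digits of userId, such as the last two, to pick one of 100 tables.
Range‑based: split by numeric ranges of the key (e.g., 10 million ids per table: 0 to 10 M in the first table, 10 M to 20 M in the next).
Vertical splitting: separate business domains into different databases (orders, users, marketing).
Vertical table splitting: move large or rarely used columns into auxiliary tables.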
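The three row-routing strategies can be compared side by side. In this sketch, the table count of 64 comes from the hash example. Using the last two digits for the fixed-digit scheme and the 10-million-id range size are assumptions modeled on the examples above. The functions return a table index rather than a real table name.

```python
HASH_TABLES = 64          # userId % 64 -> 64 tables
DIGIT_TABLES = 100        # last two digits -> 100 tables (assumed digit choice)
RANGE_SIZE = 10_000_000   # ids per table in the range scheme (assumed)


def hash_route(user_id: int) -> int:
    """Hash-based: modulo spreads consecutive ids evenly across tables."""
    return user_id % HASH_TABLES


def digit_route(user_id: int) -> int:
    """Fixed-digit: the last two decimal digits select one of 100 tables."""
    return user_id % DIGIT_TABLES


def range_route(user_id: int) -> int:
    """Range-based: ids 0..9,999,999 go to table 0, the next 10 million to table 1, etc."""
    return user_id // RANGE_SIZE


if __name__ == "__main__":
    for uid in (7, 130, 123_456, 25_000_000):
        print(uid, hash_route(uid), digit_route(uid), range_route(uid))
```

The trade-off shows up immediately. Hash and digit routing scatter new users across all tables. Range routing sends every new id to the newest table, which makes that table a write hotspot but lets capacity grow by appending tables. Changing the table count under hash routing, by contrast, forces many existing rows to move.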
Querying by phone number when sharding is based on userId
Create an auxiliary phone‑number‑to‑userId index table; first look up the userId by phone number, then retrieve the user record from the appropriate shard.
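The two-step lookup can be sketched with in-memory dictionaries. The shard count, the layout of the mapping table, and the function names are hypothetical. A real system would keep the mapping table in its own database, which could itself be sharded by phone number, and would have to update it whenever a user's phone number changes.

```python
SHARD_COUNT = 64

# Hypothetical user shards: shard i holds users whose userId % SHARD_COUNT == i.
user_shards = [{} for _ in range(SHARD_COUNT)]
# Hypothetical auxiliary mapping table: phone number -> userId.
phone_index = {}


def insert_user(user_id, phone, name):
    user_shards[user_id % SHARD_COUNT][user_id] = {"id": user_id, "phone": phone, "name": name}
    phone_index[phone] = user_id           # keep the mapping table in sync on every write


def find_by_phone(phone):
    user_id = phone_index.get(phone)       # step 1: phone number -> userId
    if user_id is None:
        return None
    shard = user_shards[user_id % SHARD_COUNT]  # step 2: route by userId as usual
    return shard.get(user_id)


if __name__ == "__main__":
    insert_user(130, "13800000000", "Alice")
    print(find_by_phone("13800000000"))
```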
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.