Master MySQL Indexes: B+Tree Secrets and High‑Performance Query Strategies
This comprehensive guide explains MySQL indexing fundamentals, B+Tree advantages, index classifications, optimal creation strategies, and how to use EXPLAIN to diagnose and fine‑tune queries for maximum performance while avoiding common pitfalls.
Introduction
Indexes are critical for performance, especially as table size grows. Improper indexes may not affect small datasets but cause severe slowdown at scale.
Developers often add indexes to improve performance, but poorly designed indexes can cause serious production issues.
Indexes can boost query performance by orders of magnitude, but creating a truly optimal index often requires rewriting the query, so good query writing is equally important.
MySQL Logical Architecture
MySQL follows a client‑server architecture. The service layer handles query parsing, analysis, optimization, caching, and built‑in functions (date, time, math, encryption). All storage‑engine‑independent features such as stored procedures, triggers, and views are implemented here.
The storage‑engine layer provides different engines, each with its own strengths. The service layer accesses engines via a uniform API, hiding engine differences from the upper query processing.
B+ Tree
B+ Tree vs B Tree
Only leaf nodes store actual data (index + record); internal nodes store only indexes.
All indexes appear in leaf nodes, which are linked in an ordered list.
Non‑leaf nodes also contain indexes that are the maximum (or minimum) of their child nodes.
The number of child nodes under a non‑leaf node equals the number of indexes it holds.
Why B+ Tree Suits MySQL Better
1. More Stable Query Efficiency Because B+ trees store only indexes in internal nodes, they are shorter and require fewer disk I/O operations than B trees, which store records in internal nodes.
2. Faster Insert/Delete B+ trees have many redundant leaf nodes, allowing deletions and insertions without modifying internal nodes, whereas B trees may need complex rebalancing.
3. Efficient Range Queries Leaf nodes are linked, enabling sequential traversal for range scans, unlike B trees that must traverse the tree for each range step.
MySQL’s Use of B+ Tree
In InnoDB, B+ trees have special characteristics:
Leaf nodes are connected by a double‑linked list, allowing bidirectional traversal.
Each leaf node is a 16 KB data page that stores user records and metadata.
InnoDB distinguishes between clustered (primary) indexes and secondary indexes. Clustered index leaf nodes contain the full row data, while secondary index leaf nodes store only the primary key values.
Index Classification
Indexes can be classified from four perspectives:
By data structure: B+ tree, hash, full‑text.
By physical storage: clustered (primary) index, secondary (auxiliary) index.
By column characteristics: primary, unique, normal, prefix.
By column count: single‑column, multi‑column.
Below we focus on physical storage: clustered and secondary indexes.
Clustered Index : Leaf nodes store complete rows; a table can have only one clustered index.
Secondary Index : Leaf nodes store only primary key values; queries must first locate the primary key then fetch the row.
Covering Index : If a secondary index contains all columns needed by a query, the engine can satisfy the query using only the index, avoiding a table lookup. Example: select * disables covering index and forces a back‑table lookup.
Example
explain select store_id, create_time from item where store_id > 332604504321036698shows a full table scan; rewriting to
explain select store_id from item where store_id > 332604504321036698uses the index for a range scan, reducing I/O.
High‑Performance Index Strategies
Correct index creation and usage are the foundation of high‑performance queries.
Independent Columns
Index columns must not be part of expressions or function arguments; otherwise the index cannot be used.
Example of a non‑usable index:
SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;Prefix Index
Prefix indexes store only the first N characters of a string column, reducing index size and speeding up lookups. They are suitable for long strings with low prefix duplication. However, MySQL cannot use prefix indexes for ORDER BY, GROUP BY, or covering indexes.
The key parameter is prefix_length, which should be chosen based on index selectivity (distinct values / total rows). A selectivity above 0.1 is generally desirable.
SELECT 1.0 * COUNT(DISTINCT column_name) / COUNT(*) FROM table_name; SELECT 1.0 * COUNT(DISTINCT LEFT(FirstName,5)) / COUNT(*) FROM Employee; ALTER TABLE test.Employee ADD KEY(FirstName(5));Multi‑Column Index
Common mistake: creating separate indexes for each column or building a multi‑column index in the wrong column order. Instead, create a composite index that matches the query’s column order.
Example: a query filtering on actor_id and film_id benefits from a composite index on (actor_id, film_id).
SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1 AND film_id = 1;Index Column Order
Place the most selective columns first to improve query efficiency. Example: if customer_id has higher selectivity than staff_id, put customer_id first in the composite index.
SELECT COUNT(DISTINCT staff_id)/COUNT(*) AS staff_id_selectivity, COUNT(DISTINCT customer_id)/COUNT(*) AS customer_id_selectivity FROM payment;Clustered Index
A clustered index is a storage method, not a separate index type. Each table can have only one clustered index, which stores the entire row data.
Covering Index
If an index contains all columns required by a query, it is a covering index, eliminating the need for a back‑table lookup and reducing tree traversals.
When Using index appears in the Extra column of EXPLAIN, the query is using a covering index.
Beware that select * on a covering index disables it because additional columns are requested, forcing a back‑table lookup.
SELECT * FROM products JOIN (SELECT product_id FROM products WHERE actor = 'SEAN CARREY' AND title LIKE '%APOLLO%') AS t1 ON t1.product_id = products.product_id;Other Tips
Left‑most Prefix Rule : MySQL matches index columns from left to right until it encounters a range condition (<, >, BETWEEN, LIKE). Columns after the range are ignored.
= and IN are order‑agnostic : MySQL can reorder equality conditions to match the index.
Choose High‑Selectivity Columns : Use columns with a high distinct‑to‑total ratio (e.g., >0.1). Low‑selectivity columns like gender may cause the optimizer to skip the index.
Extend Existing Indexes : Instead of creating a new index, modify an existing one to include additional columns.
When to Create Indexes
Indexes improve query speed but have drawbacks: they consume storage, increase write overhead, and can slow down inserts/updates due to tree rebalancing.
Do not index columns used only in WHERE/GROUP BY/ORDER BY that are not selective.
Do not index columns with many duplicate values (e.g., gender) because the optimizer may prefer a full scan.
Do not index very small tables.
Avoid indexing frequently updated columns, as maintaining the index adds overhead.
When Indexes Are Beneficial
Columns with uniqueness constraints (e.g., product codes).
Columns frequently used in WHERE clauses; consider composite indexes for multi‑column filters.
Columns used in GROUP BY or ORDER BY, as the index already stores rows in sorted order.
Query Performance Optimization
Even with optimal schema and indexes, poorly written queries can cripple performance.
Optimizing Data Access
Avoid fetching unnecessary data (e.g., SELECT * when only a few columns are needed).
Use EXPLAIN to check if the optimizer scans extra rows.
Refactoring Queries
Consider splitting a complex query into several simpler ones when appropriate.
Break large queries into smaller batches to reduce lock contention and transaction log pressure.
Decompose joins by executing single‑table queries first and merging results in the application.
Benefits of query decomposition include higher cache efficiency, reduced lock contention, easier sharding, potential speed gains, and fewer redundant record reads.
Query Execution Process
Client sends the query to the server.
Server checks the query cache; if hit, returns cached result.
SQL is parsed and optimized, producing an execution plan.
MySQL invokes the storage‑engine API to execute the plan.
Result is returned to the client.
Optimizing Specific Query Types
COUNT Optimization : Use covering indexes; otherwise consider summary tables or external caches like Redis.
Join Optimization : Ensure indexed columns are used in ON/USING clauses; keep GROUP BY and ORDER BY expressions limited to a single table.
Subquery Optimization : Replace subqueries with joins when possible (MySQL 5.6+ can handle many subqueries efficiently).
GROUP BY / DISTINCT : Use indexes to avoid temporary tables or filesort.
LIMIT Pagination : Prefer index‑based scans; for large offsets, use ID‑based pagination or delayed joins.
UNION Optimization : Use UNION ALL unless duplicate elimination is required; otherwise MySQL creates costly temporary tables.
MySQL EXPLAIN Details
After applying indexing and query‑tuning techniques, EXPLAIN helps diagnose remaining performance problems.
Field Meaning
id : Execution order of SELECT statements.
select_type : Type of SELECT (SIMPLE, PRIMARY, SUBQUERY, UNION, DERIVED, etc.).
table : Table being accessed.
type : Access method, ordered from best to worst (NULL, system, const, eq_ref, ref, ref_or_null, index_merge, range, index, ALL).
key : Index actually used.
rows : Estimated rows examined.
Extra : Additional information such as Using filesort, Using temporary, Using index, Using where, etc.
Common Extra Values
Using filesort: MySQL must sort results externally because no suitable index exists. Using temporary: MySQL creates a temporary table for ORDER BY or GROUP BY. Using index: Query is satisfied by a covering index. Using where: WHERE clause filters rows. Using join buffer: Large joins use a join buffer; increase its size if needed. impossible where: WHERE clause cannot match any rows. distinct: Optimizes DISTINCT by stopping after the first matching row.
Conclusion
This article has covered the principles, classifications, and practical usage of MySQL indexes, combined with query optimization techniques and detailed EXPLAIN analysis to write high‑performance SQL statements.
Eric Tech Circle
Backend team lead & architect with 10+ years experience, full‑stack engineer, sharing insights and solo development practice.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
