Databases 31 min read

Unlock MySQL Index Power: B+Tree Basics, Engine Differences, and Tuning Tips

This article explains MySQL index fundamentals, the B+Tree structure, differences between MyISAM and InnoDB implementations, practical indexing strategies, and performance‑tuning techniques such as configuration parameters, EXPLAIN analysis, and optimal index design for real‑world queries.

Java Interview Crash Guide

Jun 18, 2021

MySQL Index

MySQL supports many storage engines, each with its own index support, so MySQL provides various index types such as B‑Tree, hash, and full‑text indexes. To avoid confusion, this article focuses only on B‑Tree indexes, which are the most commonly used in MySQL.

Index definition : An index is a data structure that helps MySQL retrieve data efficiently. In short, an index is a data structure.

MySQL Index Principles

Purpose of an index

An index improves query efficiency, similar to a dictionary: to find the word "mysql" you first locate the letter "m", then "y", and so on. Without an index, you would have to scan every entry.

In a library, you first locate the classification and then the book number—another real‑world example of how an index speeds up lookup.

Index principle

All indexes work by repeatedly narrowing the search range, turning random access into sequential steps. Databases must handle equality, range (>, <, BETWEEN), fuzzy (LIKE), union (OR), and multi‑value (IN) queries, requiring suitable index structures.

Dividing data into segments (e.g., 1‑100, 101‑200) allows the system to skip large portions of irrelevant data, similar to how a B‑Tree reduces the search space to O(log N).

B+Tree Index Structure

Because we need a data structure that minimizes disk I/O (ideally constant‑time), a multi‑way search tree—specifically a B+Tree—was created.

B+Tree structure explanation

Each light‑blue block represents a disk page containing several data items (dark blue) and pointers (yellow). For example, page 1 holds items 17 and 35 with pointers P1 (less than 17), P2 (between 17 and 35), and P3 (greater than 35). Real data resides in leaf nodes (e.g., 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99). Non‑leaf nodes store only guide keys, not actual table rows.

B+Tree search process

To find item 29, MySQL loads page 1 (first I/O), performs a binary search in memory, follows pointer P2 to page 3 (second I/O), finds 29 between 26 and 30, follows P2 to page 8 (third I/O), and finally locates the row. A three‑level B+Tree can locate a row with just three I/Os, a massive performance gain compared to scanning the entire table.

If no index is present, each row requires an I/O, leading to millions of disk accesses for large tables.

B+Tree Properties

1. Smaller index keys (e.g., INT 4 bytes vs BIGINT 8 bytes) reduce tree height and improve performance.

2. B+Tree stores full rows only in leaf nodes; placing rows in inner nodes would increase page size and height, degrading performance.

3. For composite keys (e.g., (name, age, sex)), the tree compares columns left‑to‑right. If the leftmost column is missing, the index cannot be used effectively—this is the left‑most matching property.

MySQL Index Implementation

MyISAM Index Implementation

MyISAM uses a B+Tree where leaf nodes store the address of the data record. The index file contains only pointers to rows.

Primary and secondary indexes have the same structure; the primary key must be unique, while secondary keys may contain duplicates.

InnoDB Index Implementation

InnoDB also uses B+Tree structures but differs fundamentally:

The data file itself is the primary index (clustered index). Leaf nodes store complete rows, and the primary key defines the order.

Secondary indexes store the primary key value in their data field, not the row address.

Because InnoDB clusters data with the primary key, using a long or non‑monotonic primary key (e.g., UUID) can cause frequent page splits and poor performance. An auto‑increment integer is usually the best primary key choice.

How to Build Effective Indexes

Index creation principle

The most important rule is the left‑most prefix principle. For a composite index (A, B, C), queries can use the index only if they reference columns from the left: A, (A,B), or (A,B,C). Skipping a left column makes later columns unusable.

Range queries stop index usage after the first range condition.

Equality (=) and IN conditions can be reordered; the optimizer will rearrange them.

Choose columns with high cardinality (distinct/total) for indexes.

Avoid functions on indexed columns (e.g., FROM_UNIXTIME(col)) because they prevent index usage.

Prefer extending existing indexes over creating new ones when possible.

Common Index Optimization Tips

Follow the left‑most prefix rule; MySQL stops matching when it encounters a range condition.

Equality and IN conditions can be in any order; the optimizer will reorder them.

Select high‑cardinality columns; the selectivity formula is count(distinct col)/count(*).

Keep indexed columns “clean” (no calculations) to allow index usage.

When possible, extend an existing index instead of adding a new one.

MySQL Performance Tuning

Configuration Tuning

Key InnoDB settings: innodb_buffer_pool_size – size of the data and index cache; larger is better (e.g., 5‑6 GB on a 8 GB server). innodb_log_file_size – size of the redo log; larger values improve write performance. max_connections – increase if you see “Too many connections” errors, but beware of resource exhaustion. innodb_file_per_table – store each table in its own .ibd file (default ON in MySQL 5.6+). innodb_flush_log_at_trx_commit – 1 for full ACID safety, 2 for better performance with acceptable risk, 0 for maximum speed (use only on replicas). innodb_flush_method – O_DIRECT for hardware RAID with battery‑backed cache, otherwise fdatasync. innodb_log_buffer_size – increase if large BLOB/TEXT transactions cause log buffer waits. query_cache_size – generally set to 0; the query cache is a known bottleneck. log_bin – enable binary logging for replication or point‑in‑time recovery.

SQL Tuning

Use the slow‑query log to capture long‑running statements, then analyze them with EXPLAIN to see whether indexes are used.

Typical EXPLAIN steps

Run the query with SQL_NO_CACHE to ensure results aren’t cached.

Identify the table with the smallest filtered row count.

Check the execution plan; ensure the expected index is chosen.

For ORDER BY … LIMIT queries, let the optimizer use the index to avoid sorting.

Understand the business use case to guide index design.

Apply index‑building principles from earlier sections.

Iterate: if the plan is unsatisfactory, re‑analyze from step 1.

EXPLAIN output fields:

id – execution order.

select_type – type of SELECT.

table – table examined.

type – access method (ALL, index, range, ref, eq_ref, const, system).

possible_keys – indexes that could be used.

key – index actually used.

key_len – length of the index used.

ref – columns or constants used with the index.

rows – estimated rows examined.

Extra – additional information.

Practical Example and Analysis

A table circlemessage_idx_0 has a primary key (msg_id, to_id) and a secondary index (from_id, circle_id). Two EXPLAIN statements show that when the condition msg_id >= … is present, MySQL uses only the primary key and scans ~350 k rows, because the leading column of the primary key (msg_id) forces the optimizer to ignore the more selective to_id column.

Since from_id != … cannot use an index, the from_id index is unnecessary for this query.

Optimization suggestion:

Create a composite index starting with the most selective columns, e.g., (to_id, circle_id, msg_id), so the query can filter by to_id first, then circle_id, and finally the range on msg_id.

Avoid over‑indexing; each additional index adds write overhead and storage cost.

Summary

Indexes are not “the more the better”; they must be carefully designed.

Distinguish between primary (clustered) keys and secondary indexes.

Understand B+Tree structure and left‑most prefix matching.

Apply the principles when creating indexes to match real query patterns.

References

http://blog.codinglabs.org/articles/theory-of-mysql-index.html https://tech.meituan.com/2014/06/30/mysql-index.html

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

InnoDB MySQL database optimization index MyISAM B+Tree

Written by

Java Interview Crash Guide

Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.