Databases 13 min read

Why MySQL Uses B+ Trees for Indexes Instead of Skip Lists

This article explains the structures of B+ trees and skip lists, compares their read and write performance, and details why MySQL chooses B+ trees for indexing while Redis and RocksDB prefer skip lists, highlighting trade‑offs in disk I/O, fan‑out, and balancing overhead.

IT Services Circle
IT Services Circle
IT Services Circle
Why MySQL Uses B+ Trees for Indexes Instead of Skip Lists

B+ Tree Structure

A B+ tree consists of multiple pages, each typically 16KB in size. Leaf nodes store the actual row data, while internal (non‑leaf) nodes store index information (primary‑key ID and page number). Searches start at the root and use binary search within a page directory, reducing the time complexity from O(n) to O(log n) by trading space for time.

When searching for a row (e.g., ID = 5), the algorithm follows the records in the root page, selects the appropriate child page based on the key range, and continues down to the leaf page where the exact row is found.

Skip List Structure

A skip list builds on a simple singly linked list by adding multiple upper layers. Each upper layer contains a subset of the nodes from the layer below, chosen randomly. This creates a hierarchy where a search can quickly narrow the range by jumping between layers, achieving O(log n) search time.

For example, to find ID = 10, the top layer might show the range 1 → 6 → 12; the algorithm determines that 10 lies between 6 and 12, then descends to the next layer, halving the search space each time.

Differences Between B+ Tree and Skip List

Both structures store all data in the lowest layer and are suitable for range queries. However, they differ in how they handle insertions and deletions. B+ trees maintain balance through page splits and merges, while skip lists rely on random promotion without rebalancing.

B+ Tree Insertion

A B+ tree is a multi‑way balanced tree. When a new row is inserted, the leaf node and its parent index node (both 16KB ) may become full, triggering one of three cases:

Both leaf and index nodes have space: Insert directly into the leaf.

Leaf node full, index node not: Split the leaf node and add a new entry to the index node.

Both leaf and index nodes full: Split both and possibly create a new upper level.

Skip List Insertion

Insertion starts at the bottom layer. Whether the new element is promoted to higher layers is decided by a random function. Typically, there is a 50% chance to be added to level 2, a 25% chance for level 3, and so on, halving the probability at each higher level.

For example, inserting ID = 6 may be promoted to level 3 (25% chance), requiring the element to appear in the bottom, second, and third layers.

Why MySQL Indexes Use B+ Trees Instead of Skip Lists

B+ trees have a high fan‑out because each node stores many keys (a 16KB page can hold thousands of entries). Typically only three levels are needed to index around 20k rows, resulting in at most three disk I/O operations per lookup.

In contrast, a skip list storing the same amount of data would need roughly 24 levels. In the worst case those levels could reside on 24 different pages, causing 24 disk I/O operations, making reads significantly slower.

While skip lists have faster writes—no page splits or rebalancing—the read performance penalty leads MySQL to prefer B+ trees for its primary storage engines (InnoDB, MyISAM).

Why Redis Uses Skip Lists for ZSET

Redis is an in‑memory database, so disk I/O is irrelevant. Skip lists are simple to implement, require no balancing rotations, and provide O(log n) search with minimal overhead, making them ideal for Redis’s ordered set (ZSET) implementation.

Summary

B+ trees are multi‑way balanced search trees with high fan‑out; only ~3 levels are needed for ~20k rows, giving fewer disk I/O operations than a skip list, so MySQL chooses B+ trees for indexes.

Redis operates entirely in memory; skip lists are easier to maintain and avoid tree‑balancing costs, so Redis uses them for ZSET.

RocksDB also uses skip lists; its write performance is better than InnoDB’s B+‑tree based engine, but read performance is poorer, confirming that B+ trees remain superior for read‑heavy workloads.

References

《MYSQL内核:INNODB存储引擎 卷1》

《RocksDB和Innodb引擎性能PK胜负难料?》

https://cloud.tencent.com/developer/article/1813695

IndexingDatabaseRedisMySQLSkipListB+Tree
IT Services Circle
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.