Databases 13 min read

Why MySQL Chooses B+ Trees Over Skip Lists for Indexing

The article explains the structural differences between B+ trees and skip lists, compares their read‑write performance in MySQL and Redis, and shows why MySQL prefers B+ trees while Redis adopts skip lists for its in‑memory ZSET implementation.

Liangxu Linux
Liangxu Linux
Liangxu Linux
Why MySQL Chooses B+ Trees Over Skip Lists for Indexing

Background

Scanning a table row‑by‑row gives linear time O(n). Indexes reduce the search complexity to logarithmic time O(log n). Two common on‑disk index structures that achieve O(log n) are B+ trees (used by MySQL) and skip lists (used by Redis and RocksDB). The following sections compare their internal layout, insertion algorithms, and I/O characteristics.

B+ Tree in MySQL

A B+ tree is a multi‑level structure built from fixed‑size pages, typically 16 KB each. Leaf pages store the actual row data; internal pages store only index entries (primary‑key value and child‑page number). Because each page can hold many entries, the fan‑out is high (hundreds of pointers per node), so the tree height is usually very small.

Search proceeds as follows:

Start at the root page.

Within the page, perform a binary search on the sorted directory to locate the child pointer that bounds the target key.

Follow the pointer to the next level and repeat until a leaf page is reached.

The binary search inside a page and the high fan‑out together keep the number of page reads to a few (often three) for millions of rows.

B+ tree query process
B+ tree query process

Skip List Structure

A skip list consists of several linked‑list layers stacked on top of a base singly‑linked list. Each element in the base list is promoted to the next higher level with a fixed probability p = 0.5. Consequently, level i contains roughly p^i of the elements of level i‑1 .

Search starts at the topmost level and moves forward until the next pointer would overshoot the target key, then drops one level and continues. Each level roughly halves the remaining search space, yielding logarithmic search time.

Two‑level skip list
Two‑level skip list
Three‑level skip list
Three‑level skip list

Insertion and Deletion Differences

B+ tree insertion must preserve balance and page capacity. Assuming a leaf page can hold three entries (for illustration), three cases arise:

Both leaf and parent internal page have free slots – insert the new row directly into the leaf.

Leaf page is full but parent internal page has space – split the leaf into two pages and insert a new index entry into the parent.

Both leaf and parent are full – split the leaf, split the parent, and create a new root level, increasing the tree height by one.

Skip list insertion always adds the element to the bottom list, then flips a coin repeatedly to decide whether to promote it to the next level (50 % chance for level 2, 25 % for level 3, etc.). No page splits or rebalancing are required; the probabilistic promotion maintains the expected logarithmic height.

Skip list insertion
Skip list insertion

Disk I/O Implications

Because each B+ tree node is a 16 KB page, a typical fan‑out of 100–200 entries means that a tree indexing roughly 2 KB of rows can be traversed in at most three page reads. In contrast, a skip list storing the same amount of data would need about 2^24 elements to reach a comparable height of ~24 levels. In the worst case each level could reside on a different disk page, leading to up to 24 I/O operations for a single lookup.

Fewer I/O operations translate directly into faster read latency, which is why MySQL’s InnoDB engine adopts B+ trees for both primary‑key and secondary indexes.

Write‑Heavy Workloads

Skip lists excel in write‑intensive scenarios because they avoid costly page splits and rebalancing. Insertions are cheap: after placing the record in the base list, a few random coin flips decide the promotion depth. B+ trees, by contrast, may need to split leaf pages, propagate splits upward, and occasionally increase tree height, incurring additional disk writes.

Real‑World Engine Choices

MySQL (InnoDB) – uses B+ trees for all on‑disk indexes. The design prioritises read performance and leverages mature page‑management code.

Redis (ZSET) – an in‑memory database where disk I/O does not exist. The simplicity of skip lists (no rotations, no page splits) and their O(log n) operations make them ideal for ordered sets stored entirely in RAM.

RocksDB – a key‑value store built on top of LevelDB that internally employs skip lists. This yields higher write throughput than InnoDB’s B+ trees, at the cost of slower read latency.

Conclusion

B+ trees provide a high‑fan‑out, balanced structure that typically requires only three page reads to locate a record, making them optimal for read‑heavy, disk‑based databases such as MySQL.

Skip lists achieve comparable logarithmic complexity with a much simpler, probabilistic design. When data resides in memory (Redis) or when write throughput is paramount (RocksDB), skip lists are preferable.

The choice between the two structures hinges on the trade‑off between read‑latency (favoring B+ trees) and write‑efficiency/simplicity (favoring skip lists).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

indexingdatabaseredismysqlskiplistB+Tree
Liangxu Linux
Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.