Databases 9 min read

Mastering Database Indexes: From Binary Trees to B+Trees and Beyond

This article explains the fundamentals and structures of database indexes—including binary trees, red‑black trees, B‑Tree, B+Tree, hash indexes—and details how MySQL’s InnoDB and MyISAM engines implement clustered and non‑clustered indexes, covering their characteristics, storage files, and query behavior.

Ops Development Stories
Ops Development Stories
Ops Development Stories
Mastering Database Indexes: From Binary Trees to B+Trees and Beyond

Index is a structure that sorts values of one or more columns in a database table, allowing fast access to specific information.

Index Data Structures

Binary Tree

A binary tree is an ordered tree where each node has at most two children. It is defined recursively: a binary tree is either empty or consists of a root node with a left and right subtree, each of which is also a binary tree.

For the array {1,2,3,4,5} the structure becomes a linked list.

Characteristics:

Each parent node has two child nodes.

The right node’s value is greater than the left node’s value.

Red‑Black Tree

A red‑black tree is a specific type of binary tree used to organize data such as numeric keys. If a binary search tree is a red‑black tree, every subtree is also a red‑black tree.

It is a balanced binary search tree variant; its left and right sub‑tree heights may differ by more than one, but rebalancing costs are lower than AVL trees, giving better average performance.

Search in a red‑black tree uses the same algorithm as a regular binary search tree, without needing color information.

Key properties:

Each node is colored either red or black.

The root node is black.

All leaves (NIL nodes) are black.

Red nodes cannot have red children.

Every path from a node to its descendant leaves contains the same number of black nodes.

These constraints guarantee that the longest possible path is at most twice the length of the shortest, keeping the tree roughly balanced.

Read‑only operations are identical to those on a regular binary search tree.

B‑Tree

Leaf nodes have the same depth; leaf pointers are null.

All elements are unique.

Data within a node is sorted in ascending order from left to right.

B+Tree

Non‑leaf nodes store only index values (redundant), allowing more indexes.

Leaf nodes contain all index fields.

Leaf nodes are linked by pointers, improving range query performance.

Key characteristics: ordered nodes, leaf pointers, non‑leaf nodes store redundant indexes.

Hash Index

Hashing the key directly locates the storage position.

Often more efficient than B+Tree for equality searches.

Supports only “=“ and “IN”; does not support range queries.

Subject to hash collisions.

Index Implementations

InnoDB Index (Clustered)

The table data file itself is organized as a B+Tree index structure.

Clustered index leaf nodes contain the full data rows.

If no primary key is defined, MySQL creates a hidden row‑id column.

Using an integer auto‑increment primary key speeds B+Tree ordering and storage because data is inserted sequentially, reducing rebalancing.

Non‑primary indexes store the primary key value in their leaf nodes for consistency and space efficiency.

Non‑Clustered Index Example

Querying with

name = 'Alice'

uses the non‑clustered index to locate the primary key, then performs a “row lookup” (back‑table) using the primary key.

MySQL Storage Files

.frm stores table schema.

.ibd stores indexes and data for InnoDB.

MyISAM Index (Non‑Clustered)

Index files and data files are separate.

MyISAM uses three files:

.frm – table structure.

.myd – stores row data.

.myi – stores index information.

Clustered vs Non‑Clustered Indexes

Clustered indexes store data together with the index, providing faster queries because no additional file lookup is needed. Non‑clustered indexes keep pointers to the data.

Composite (Multi‑Column) Index

Multiple columns are combined into a single index. The “left‑most prefix” rule applies: the index can be used only if the query filters on the leftmost columns of the index.

Example:

<code>where name = 'Jeff' and age = 22  -- uses index</code>

Queries that skip the leftmost column, such as

where age = 30 and position='manager'

or

where position='dev'

, cannot use the composite index.

Reference

Baidu Baike

MySQL command to view InnoDB page size:

<code>mysql> show global status like 'Innodb_page_size';</code>
SQLInnoDBMyISAMB-TreeRed-Black TreeDatabase IndexesHash Index
Ops Development Stories
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.