Databases 14 min read

Why UUID Primary Keys Trigger InnoDB Page Splits and How to Prevent Them

This article explains the internal mechanics of MySQL 8.0 InnoDB pages, how random primary keys such as UUID cause costly page splits and merges, and provides Java‑level best practices and MySQL commands to optimize primary‑key design for better performance and space utilization.

Xuanwu Backend Tech Stack
Xuanwu Backend Tech Stack
Xuanwu Backend Tech Stack
Why UUID Primary Keys Trigger InnoDB Page Splits and How to Prevent Them

Preface

Java developers are often asked in interviews why InnoDB tables should have a primary key, preferably an auto‑increment integer or a Snowflake‑generated ID, rather than a UUID, and how UUIDs lead to page splits and merges at the B+‑tree level.

1. What Is an InnoDB Page?

The smallest unit managed by InnoDB is a page , typically 16 KB (larger than the OS 4 KB page to improve I/O). You can view the page size with:

SHOW VARIABLES LIKE 'innodb_page_size';

The internal layout of a standard INDEX page includes:

File Header (38 bytes) : page number, type, previous/next pointers, forming a doubly‑linked list.

Page Header (56 bytes) : status, free‑space address, etc.

Infimum + Supremum : virtual records that bound real records.

User Records : actual data stored as a singly‑linked list ordered by the primary key.

Free Space : unused area within the page.

Page Directory : sparse index for intra‑page binary search.

2. Page Split (Page Split)

A page split occurs when inserting a record into a full page, especially with random keys. Sequential auto‑increment keys insert at the rightmost leaf, avoiding splits.

2.1 Why Does It Happen?

Random primary keys (e.g., UUID) often need to be placed in the middle of a leaf page, forcing InnoDB to split the page to maintain B+‑tree order.

2.2 Detailed Process

Locate : B+‑tree algorithm finds the target page for the new key (e.g., 25).

Check : Insufficient free space in the page.

Allocate New Page : A fresh page (Page B) is allocated.

Data Migration (Core) :

Identify the split point (usually the middle).

Move a subset of records (e.g., 30, 40) from Page A to Page B.

Insert the new record (25) into the appropriate page.

Pointer Reconnection :

Update FIL_PAGE_NEXT of Page A to point to Page B.

Update FIL_PAGE_PREV of Page B to point back to Page A.

Adjust the successor page’s PREV pointer.

Parent Node Update : If the parent node becomes full, recursive splits may propagate upward, possibly increasing tree height.

2.3 Cost of a Split

I/O Overhead : Creating a new page, moving data, and updating pointers involve significant memory and disk I/O.

Space Fragmentation : Two half‑filled pages replace one full page, reducing space utilization.

Concurrency Impact : Locks are required to keep the split atomic, affecting performance under high concurrency.

3. Page Merge (Page Merge)

Page merge is the opposite operation, triggered after deletions or updates that shrink records, reclaiming space.

3.1 “Fake Deletion”

When a DELETE statement runs, InnoDB marks the record with delete_mask=1 but does not immediately remove it from disk; the space becomes reusable.

3.2 When Does Merge Occur?

InnoDB checks the MERGE_THRESHOLD (default 50%). If a page’s occupancy falls below this threshold, it becomes a candidate for merging.

3.3 Detailed Merge Process

Check : If Page B’s used space < MERGE_THRESHOLD (e.g., <8 KB), proceed.

Inspect Neighbour : Identify left or right neighbour (Page A).

Determine Feasibility : Can all records from Page B fit into Page A?

Execute Merge :

Copy remaining records from Page B to Page A.

Update Page A’s header.

Release Page : Page B becomes a free page, returned to the tablespace.

Update Parent : Remove the index entry pointing to Page B.

4. Java Practice and Optimization Strategies

4.1 Fatal Mistake: Using UUID as Primary Key

// Bad example: random ID causes frequent page splits
@Entity
public class User {
    @Id
    private String id = UUID.randomUUID().toString(); // nightmare for InnoDB
    // ...
}

Random UUIDs insert into the middle of the B+‑tree, causing many page splits and low cache hit rates.

4.2 Best Practice: Ordered IDs

// Good example: Snowflake or auto‑increment ID
public class IdGenerator {
    // Simulated Snowflake algorithm producing monotonically increasing Long IDs
    public synchronized long nextId() {
        // ... implementation of Snowflake ...
        return timestamp << 22 | datacenterId << 17 | workerId << 12 | sequence;
    }
}

Monotonically increasing keys ensure inserts always go to the rightmost leaf, enabling append‑only writes and near‑100 % page fill rate.

Avoid Splits : When the rightmost page fills, a new page is allocated without moving existing data (append‑only).

High Fill Rate : Pages stay almost fully occupied, maximizing space utilization.

4.3 MySQL 8 Optimizations

MySQL 8.0 runs a background purge thread to clean up after massive deletions. For heavily fragmented tables, you can force a rebuild: OPTIMIZE TABLE my_table_name; Warning: This rebuild locks the table; use with caution in production.

5. Interviewer Follow‑Up Questions

Q1: Does the B+‑tree height drop immediately after a page merge?

Not necessarily. Merges happen at leaf nodes; only when a merge empties a parent (or root) node does the tree height decrease, which may cascade upward.

Q2: Does InnoDB have mechanisms to mitigate split overhead?

Yes. The Insert Buffer (Change Buffer) records inserts to non‑unique secondary indexes when the target page is not in memory, deferring the actual split until the page is later accessed. This does not apply to primary (clustered) indexes.

Q3: Why does a table’s file size not shrink after many DELETEs?

DELETE only marks rows; the freed pages remain inside the tablespace as “free list” pages, so the OS‑visible file size stays the same unless you run TRUNCATE or OPTIMIZE TABLE to rebuild the table.

Conclusion

InnoDB’s page split and merge mechanisms balance B+‑tree order (query efficiency) against space utilization. Java developers influence these mechanisms through primary‑key strategy, batch inserts, and delete patterns. Prefer trend‑increasing keys (auto‑increment or Snowflake) and avoid random UUIDs to keep the underlying B+‑tree healthy.

PerformanceInnoDBMySQLUUIDprimary keyPage SplitPage Merge
Xuanwu Backend Tech Stack
Written by

Xuanwu Backend Tech Stack

Primarily covers fundamental Java concepts, mainstream frameworks, deep dives into underlying principles, and JVM internals.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.