Databases 25 min read

In-depth Overview of MySQL InnoDB Page Types, Compression, IO Subsystem, and Buffer Pool Management

This article provides a comprehensive technical overview of MySQL InnoDB's internal page structures—including compressed, system, external, encrypted, and R‑TREE pages—along with detailed explanations of the IO subsystem, double‑write buffer, temporary tablespaces, and buffer pool memory management mechanisms.

Architect
Architect
Architect
In-depth Overview of MySQL InnoDB Page Types, Compression, IO Subsystem, and Buffer Pool Management

Traditional Compressed Page Format

When a table is created or altered with row_format=compressed and a key_block_size of 1, 2, 4, or 8, the resulting .ibd file is divided into blocks of the corresponding size (e.g., 4 KB for a block size of 4). Compressed pages consist of a compressed representation stored on disk and an uncompressed copy kept in memory. Modifications are first applied to the uncompressed page, then logged to the compressed page’s mlog using operations such as Insert (full record), Delete‑Insert Update (mark old dense slot as deleted and write a new record), In‑place Update (write new record directly), and Delete (mark dense slot as deleted). Compression and decompression are performed by the functions page_zip_compress and page_zip_decompress .

System Data Pages

All non‑independent pages stored in ibdata are called system data pages. Important system pages include:

FSP_IBUF_HEADER_PAGE_NO : Header page for the change buffer (type FIL_PAGE_TYPE_SYS ).

FSP_IBUF_TREE_ROOT_PAGE_NO : Root page of the change buffer B‑tree (fixed page number 5).

FSP_TRX_SYS_PAGE_NO / FSP_FIRST_RSEG_PAGE_NO : Transaction system page (page 6) containing fields such as TRX_SYS_TRX_ID_STORE , TRX_SYS_FSEG_HEADER , and double‑write buffer metadata.

FSP_DICT_HDR_PAGE_NO : Dictionary header page (page 8) storing metadata for system tables (e.g., DICT_HDR_TABLE_ID , DICT_HDR_INDEX_ID , etc.).

Creation of these pages is handled by functions like btr_create , trx_sysf_create , and dict_hdr_create .

Double‑Write Buffer

InnoDB writes pages first to a double‑write buffer to protect against partial‑page writes. The buffer resides in ibdata (pages 64‑191) and consists of two blocks of 128 pages each. During a crash, corrupted pages can be recovered from this buffer.

External Storage Pages

Large column values may be stored in external pages. Types include FIL_PAGE_TYPE_BLOB (uncompressed), FIL_PAGE_TYPE_ZBLOB (compressed first blob page), and FIL_PAGE_TYPE_ZBLOB2 (subsequent compressed blob pages). Records contain a 20‑byte pointer that references these external pages.

Encrypted and R‑TREE Pages (MySQL 5.7)

MySQL 5.7 adds three new page types for encryption and spatial indexes:

FIL_PAGE_ENCRYPTED : Encrypted ordinary data page.

FIL_PAGE_COMPRESSED_AND_ENCRYPTED : Compressed then encrypted page.

FIL_PAGE_ENCRYPTED_RTREE : Encrypted R‑TREE index page.

Encryption is performed by os_file_encrypt_page → Encryption::encrypt before writing and decrypted by os_file_io_complete → Encryption::decrypt when reading. Key information is stored in the first page of the .ibd file via fsp_header_init → fsp_header_fill_encryption_info . The master key can be rotated with ALTER INSTANCE ROTATE INNODB MASTER KEY .

Temporary Tablespace (ibtmp)

MySQL 5.7 introduces a dedicated temporary tablespace named ibtmp1 (default 12 MB, configurable via innodb_temp_data_file_path ). Non‑compressed temporary tables and the first 32 temporary rollback segments reside here, created by trx_sys_create_noredo_rsegs .

Redo Log Files (ib_logfile)

Redo logs now support CRC32 checksums (controlled by innodb_log_checksums ) and include a version header (see WL#8845). The header is updated by log_group_file_header_flush . Compatibility with older versions is provided by recv_find_max_checkpoint_0 , but downgrading after an upgrade requires a clean shutdown.

IO Subsystem

The IO subsystem handles all disk‑to‑memory operations. Synchronous reads/writes are performed directly by the calling thread, while asynchronous operations are queued in AIO structures ( AIO::s_reads , AIO::s_writes , AIO::s_log , AIO::s_ibuf ) and processed by dedicated background threads.

IO request entry point is os_aio_func . For asynchronous requests, a slot is allocated in the appropriate segment, the request details are stored, and the request is dispatched either via native AIO ( AIO::linux_dispatch ) or simulated AIO ( os_aio_simulated_handler ).

IO completion is handled by io_handler_thread → fil_aio_wait , which calls os_aio_handler . Native AIO uses os_aio_linux_handle with a 500 ms reap timeout, while simulated AIO merges adjacent requests (a feature disabled in 5.7) and processes slots based on age.

Read‑Ahead Strategies

InnoDB implements three read‑ahead methods:

Random read‑ahead ( buf_read_ahead_random ) reads 64‑page extents when recent page accesses exceed a threshold.

Linear read‑ahead ( buf_read_ahead_linear ) reads sequential extents after a series of consecutive page accesses.

Logical read‑ahead (used by Facebook) scans leaf nodes of a clustered index and asynchronously reads pages in logical order via row_search_for_mysql → row_read_ahead_logical .

Buffer Pool Memory Management

From MySQL 5.6 to 5.7 the buffer pool architecture changed: each instance can now consist of multiple chunks (default 127 MB), enabling online resizing. The total pool size is aligned to instances × chunk_size , which can cause overallocation if many instances are configured.

Each buffer‑pool instance maintains several linked lists (LRU, free list, unzip LRU, etc.) to locate pages by space ID and page number. Key structures are illustrated in the original diagrams.

Concurrency control uses read/write locks, buf_fix_count , and IO‑fix flags. When a page is read, a shared lock and buf_fix_count are incremented; modifications acquire an exclusive lock. Flush operations skip pages with non‑zero buf_fix_count to avoid costly lock acquisition.

Page eviction occurs when the free list is exhausted. InnoDB first tries to evict from the unzip LRU if the workload is I/O‑bound; otherwise it evicts from the main LRU. If no replaceable page is found, a single‑page flush ( buf_flush_single_page_from_LRU ) is triggered, which can become a performance bottleneck under high concurrency. Multiple page‑cleaner threads introduced in 5.7 mitigate this issue.

InnoDBMySQLDatabase InternalsBuffer PoolPage StructuresIO Subsystem
Architect
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.