InnoDB IO Subsystem and Buffer Pool Memory Management Overview
This article explains InnoDB's file I/O interfaces, asynchronous and synchronous read/write mechanisms, background IO threads, AIO request handling, concurrency controls, prefetch strategies, log write padding, and the evolution of buffer pool initialization, chain management, and page eviction in MySQL 5.7.
Overview
The previous article introduced InnoDB's physical file structure; this one continues with the IO interface and memory management of the InnoDB file system.
IO Subsystem
InnoDB provides both synchronous and asynchronous file operations. Asynchronous IO can use Native AIO (requiring the libaio development package) or the older simulated AIO, which is now discouraged in production.
Read operations are usually synchronous, but prefetch reads are asynchronous and handled by background IO threads. Various background threads (Purge, Master, etc.) also trigger reads, and crash recovery may issue asynchronous reads to speed up recovery.
Write operations follow a WAL model: transaction logs are written at commit, a master thread performs periodic redo fsync, and dirty pages are flushed by the Page Cleaner thread or, when the buffer pool is low, by user threads. Percona Server and MySQL 5.7 include optimizations to reduce user‑thread involvement.
DDL actions such as TRUNCATE, DROP TABLE, and RENAME TABLE require coordination via special flags and counters.
IO Backend Threads
IO READ threads: configured by innodb_read_io_threads, process asynchronous read requests from the queue AIO::s_reads.
IO WRITE threads: configured by innodb_write_io_threads, handle asynchronous write requests from AIO::s_writes.
LOG thread: handles asynchronous writes for checkpoint information via AIO::s_log .
IBUF thread: reads change‑buffer pages from AIO::s_ibuf .
All synchronous writes are performed by user threads or other background threads; the above threads only handle asynchronous work.
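The mapping between request classes and background threads can be pictured with a small sketch. This is an illustrative model only, not InnoDB code: the queue names are borrowed from the text as labels, and `IoKind`, `submit_async`, and `handler_step` are hypothetical helpers.

```python
from collections import deque
from enum import Enum


class IoKind(Enum):
    READ = "s_reads"    # served by the IO READ threads
    WRITE = "s_writes"  # served by the IO WRITE threads
    LOG = "s_log"       # served by the LOG thread
    IBUF = "s_ibuf"     # served by the IBUF thread


# One queue per request class (labels only; the real arrays hold slots).
queues = {kind: deque() for kind in IoKind}


def submit_async(kind, request):
    """Called by the thread that wants asynchronous IO."""
    queues[kind].append(request)


def handler_step(kind):
    """One iteration of a background thread serving `kind`.

    Returns the next pending request, or None when the queue is empty.
    """
    pending = queues[kind]
    return pending.popleft() if pending else None
```

Each handler only ever looks at its own queue, which is why a burst of read-ahead requests does not delay checkpoint writes in this model.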
Issuing IO Requests
Entry function: os_aio_func. For synchronous requests (OS_AIO_SYNC) the calling thread directly invokes os_file_read_func or os_file_write_func.
For asynchronous requests, the user thread selects a slot array from the appropriate queue (AIO::select_slot_array), reserves a slot in it (AIO::reserve_slot), fills it with file, offset, and data information, and then dispatches the request.
When using Native AIO with transparent compression or tablespace encryption, the data page is compressed or encrypted before submission.
Native AIO uses AIO::linux_dispatch to hand the request to the kernel; if Native AIO is disabled, the simulated handler thread is awakened via AIO::wake_simulated_handler_thread .
Building with Native AIO support requires the libaio-dev package at compile time; at run time it is controlled by the innodb_use_native_aio option (srv_use_native_aio in the source).
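The reserve-and-fill step for asynchronous requests can be sketched as follows. This is a simplified model under stated assumptions: `Slot` and `SlotArray` are hypothetical classes, and the sketch returns `None` when no slot is free instead of waiting for one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    reserved: bool = False
    file_name: Optional[str] = None
    offset: int = 0
    data: bytes = b""
    is_write: bool = False


class SlotArray:
    """Fixed-size array of request slots (hypothetical stand-in)."""

    def __init__(self, n_slots):
        self.slots = [Slot() for _ in range(n_slots)]

    def reserve_slot(self, file_name, offset, data, is_write):
        """Claim the first free slot and fill in the request details."""
        for index, slot in enumerate(self.slots):
            if not slot.reserved:
                slot.reserved = True
                slot.file_name = file_name
                slot.offset = offset
                slot.data = data
                slot.is_write = is_write
                return index
        return None  # sketch only: no free slot available

    def release_slot(self, index):
        """Return a completed slot to the free state."""
        self.slots[index] = Slot()
```

After a slot is filled, the request would be handed either to the kernel (Native AIO) or to a simulated handler thread, as described above.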
Processing Asynchronous AIO Requests
IO thread entry: io_handler_thread calls fil_aio_wait, which calls os_aio_handler to fetch pending requests.
For Native AIO, os_aio_linux_handle retrieves completed events using io_getevents with a 500 ms timeout.
For simulated AIO, os_aio_simulated_handler processes the queue, merging adjacent requests when possible (disabled in MySQL 5.7).
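Merging adjacent requests means combining requests whose byte ranges touch, so that one larger IO replaces several small ones. A minimal sketch of that idea, assuming each request is an `(offset, length)` pair on the same file (the function name `merge_adjacent` is hypothetical):

```python
def merge_adjacent(requests):
    """Merge (offset, length) requests whose ranges are contiguous.

    Requests are sorted by offset first; a request is folded into the
    previous one only when it starts exactly where the previous ends.
    """
    merged = []
    for offset, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged
```

For example, three consecutive 16 KiB page requests starting at offset 0 collapse into a single 48 KiB request, while a request further along the file stays separate.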
After a slot completes, fil_node_complete_io decrements node->n_pending . File‑write operations are added to fil_system->unflushed_spaces unless O_DIRECT_NO_FSYNC is used.
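The bookkeeping on completion can be modelled roughly like this. It is a sketch under assumptions: `FileNode`, `begin_io`, and `complete_io` are hypothetical names, and a plain set stands in for fil_system->unflushed_spaces.

```python
class FileNode:
    """Hypothetical stand-in for one data file of a tablespace."""

    def __init__(self, space_id):
        self.space_id = space_id
        self.n_pending = 0  # IOs issued but not yet completed


# Tablespaces that have been written to but not yet fsynced.
unflushed_spaces = set()


def begin_io(node):
    node.n_pending += 1


def complete_io(node, is_write, flush_method="fsync"):
    """Decrement the pending counter; remember writes that need a flush."""
    assert node.n_pending > 0
    node.n_pending -= 1
    if is_write and flush_method != "O_DIRECT_NO_FSYNC":
        unflushed_spaces.add(node.space_id)
```

A later flush pass would walk `unflushed_spaces`, sync each file, and remove the entry.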
Finally, for data-file IO, buf_page_io_complete performs page-corruption checks and change-buffer merges on reads and updates the double-write buffer on writes; log writes invoke log_io_complete and update checkpoint information via log_complete_checkpoint.
IO Concurrency Control
On Linux, pwrite/pread allow concurrent file IO without locks; on Windows, file locks are required. Counters and flags guard concurrent operations such as file extension, table drop, and rename.
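Positional IO is what makes the lock-free case work: pread and pwrite take an explicit offset, so threads sharing one file descriptor never race on a shared file position. A small demonstration using Python's `os.pread` and `os.pwrite` (Unix only; the 16 KiB page size and the helper names are assumptions for illustration):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16 * 1024  # assumed page size for the demo


def write_page(fd, page_no):
    """Write one page filled with a byte derived from its page number."""
    payload = bytes([page_no % 256]) * PAGE_SIZE
    return os.pwrite(fd, payload, page_no * PAGE_SIZE)


def read_page(fd, page_no):
    """Read one page at its own offset, independent of other threads."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def demo(n_pages=8):
    """Write and then read pages concurrently through a single descriptor."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(lambda p: write_page(fd, p), range(n_pages)))
            return list(pool.map(lambda p: read_page(fd, p), range(n_pages)))
```

No lock protects `fd` here; each call carries its own offset, which is the property the text relies on.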
File Prefetch
InnoDB supports random, linear, and logical prefetch. Random prefetch (buf_read_ahead_random) reads a 64-page extent when recent accesses exceed BUF_READ_AHEAD_RANDOM_THRESHOLD. Linear prefetch (buf_read_ahead_linear) works similarly based on innodb_read_ahead_threshold. Logical prefetch (row_read_ahead_logical) scans the clustered index to issue asynchronous reads for leaf pages, useful for fragmented tables.
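The threshold decisions for random and linear prefetch can be sketched as below. This is a deliberately simplified model, not the actual heuristics: the helper names are hypothetical, the strictness of the random comparison follows the wording "exceed", and triggering linear prefetch only on the last page of an extent is an assumption of the sketch.

```python
EXTENT_PAGES = 64  # pages per extent, as stated in the text


def extent_of(page_no):
    """Range of page numbers in the extent containing page_no."""
    low = (page_no // EXTENT_PAGES) * EXTENT_PAGES
    return range(low, low + EXTENT_PAGES)


def should_read_ahead_random(page_no, recently_accessed, threshold):
    """True when enough pages of the extent were recently accessed."""
    hits = sum(1 for p in extent_of(page_no) if p in recently_accessed)
    return hits > threshold


def linear_read_ahead_target(page_no, accessed, threshold):
    """Return the next extent to prefetch, or None.

    Assumption: the check runs only when the last page of an extent is
    touched and at least `threshold` pages of that extent were accessed.
    """
    extent = extent_of(page_no)
    if page_no != extent[-1]:
        return None
    hits = sum(1 for p in extent if p in accessed)
    if hits < threshold:
        return None
    return range(extent[-1] + 1, extent[-1] + 1 + EXTENT_PAGES)
```

Random prefetch pulls in the rest of the current extent, while linear prefetch anticipates the following one.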
Log Write Padding
MySQL 5.7 introduces innodb_log_write_ahead_size to align redo-log writes to the block size of the underlying storage. In log_write_up_to, the tail of a write is padded with zeros up to the next write-ahead boundary, which avoids read-modify-write penalties.
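The padding amount is plain alignment arithmetic. The sketch below shows only that arithmetic, not the conditions under which log_write_up_to decides to pad; `zero_padding_needed` is a hypothetical helper.

```python
def zero_padding_needed(write_offset, data_len, write_ahead_size):
    """Bytes of zero padding so the write ends on a write-ahead boundary."""
    end = write_offset + data_len
    remainder = end % write_ahead_size
    return 0 if remainder == 0 else write_ahead_size - remainder
```

With an 8 KiB write-ahead size, a 512-byte write at offset 0 would be padded by 7680 bytes so that the whole 8 KiB block is written at once and the system never has to read the block back in before modifying it.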
Buffer Pool Memory Management
From MySQL 5.6 to 5.7 the buffer pool allocation changed: each instance can now contain multiple chunks of configurable size (innodb_buffer_pool_chunk_size, default 128 MiB), enabling online resizing. The total pool size is rounded up to a multiple of instances × chunk_size, so excessive instance counts can waste memory.
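The rounding rule is easy to check with arithmetic. A minimal sketch, assuming the pool size is rounded up to the next multiple of instances × chunk size (`effective_pool_size` is a hypothetical helper):

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def effective_pool_size(requested, instances, chunk_size=128 * MiB):
    """Round the requested size up to a multiple of instances * chunk_size."""
    unit = instances * chunk_size
    return -(-requested // unit) * unit  # ceiling division, then scale back
```

With 16 instances and the default chunk size the rounding unit is 2 GiB, so a 9 GiB request becomes 10 GiB. With 64 instances the unit grows to 8 GiB and the same request becomes 16 GiB, which is how a large instance count inflates memory use.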
Each buffer‑pool instance maintains several LRU lists and hash tables to locate pages. Concurrency is controlled with read‑write locks, buf_fix_count , and io_fix flags.
When a page is needed, buf_page_init_for_read allocates a free block, acquires the hash X‑lock, checks for existing entries, inserts the block, sets BUF_IO_READ , releases the lock, and finally adds the block to the LRU.
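The ordering of those steps can be sketched with a toy model. Everything here is a stand-in: `BufferPoolSketch` is hypothetical, a `threading.Lock` plays the role of the hash X-lock, and handing the unused block back to the free list when the page is already present is an assumption of the sketch.

```python
import threading
from collections import OrderedDict


class BufferPoolSketch:
    def __init__(self, n_blocks):
        self.free = list(range(n_blocks))  # free block ids
        self.page_hash = {}                # page_id -> block id
        self.hash_lock = threading.Lock()  # stand-in for the hash X-lock
        self.io_fix = {}                   # block id -> IO state
        self.lru = OrderedDict()           # page_id -> block id, oldest first

    def init_for_read(self, page_id):
        """Prepare a block for reading page_id; None if nothing to do."""
        if not self.free:
            return None                    # caller would have to evict first
        block = self.free.pop()            # 1. allocate a free block
        with self.hash_lock:               # 2. take the hash lock
            if page_id in self.page_hash:  # 3. page already present
                self.free.append(block)
                return None
            self.page_hash[page_id] = block     # 4. insert into the hash
            self.io_fix[block] = "BUF_IO_READ"  # 5. mark the read in flight
        self.lru[page_id] = block          # 6. after unlock, add to the LRU
        return block
```

Setting the IO state before the lock is released means any thread that later finds the page in the hash can tell that its contents are still being read.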
If the free list is exhausted, page eviction occurs via buf_LRU_free_from_unzip_LRU_list or buf_flush_single_page_from_LRU . MySQL 5.7 adds multiple page‑cleaner threads that periodically flush dirty pages based on heuristics.
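The fallback order when no free block is available can be sketched as follows. This is a simplified model, not the real eviction code: `get_free_block` is hypothetical, the LRU is an `OrderedDict` with the oldest page first, and `flush` is a caller-supplied callback that writes a dirty page out.

```python
from collections import OrderedDict


def get_free_block(free_list, lru, dirty, flush):
    """Return a usable block id, evicting from the LRU if necessary."""
    if free_list:
        return free_list.pop()
    # First choice: evict the oldest clean page, which needs no IO.
    for page_id in list(lru):
        if page_id not in dirty:
            return lru.pop(page_id)
    if not lru:
        return None
    # Otherwise flush the oldest dirty page, then reuse its block.
    page_id = next(iter(lru))
    flush(page_id)
    dirty.discard(page_id)
    return lru.pop(page_id)
```

The second branch is the expensive one, since the calling thread pays for a page write; keeping enough clean or free pages available so that user threads rarely reach it is the job of the page-cleaner threads.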
Source: MySQL monthly report (February 2016).