Overview of MySQL and InnoDB Storage Engine Architecture
This article gives an overview of MySQL, covering its configuration-file search order, component architecture, and storage engines such as MyISAM, NDB, and Memory. It then examines InnoDB's internals in depth: memory management, background threads, LRU handling, the redo log buffer, and checkpoint mechanisms.
MySQL Overview
When a MySQL instance starts, it reads configuration files in a defined order; if none are found, default compile‑time parameters are used. The command mysql --help | grep my.cnf shows the locations searched.
Typical configuration file paths, examined in order, are /etc/my.cnf, /etc/mysql/my.cnf, /usr/local/mysql/etc/my.cnf, and ~/my.cnf. If the same option appears in multiple files, the one read last takes precedence.
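As a minimal illustration (the paths and values below are examples, not recommendations), a my.cnf file uses INI-style sections that mysqld and client programs read:

```ini
# Example my.cnf fragment; section names are fixed, values are illustrative.
[mysqld]
datadir = /var/lib/mysql   # server data directory
port    = 3306             # listening port

[client]
port    = 3306             # port used by client programs
```

An option set here in a later-read file (e.g. ~/my.cnf) overrides the same option from an earlier one.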
MySQL consists of several components: connection‑pool, management services and tools, SQL interface, query parser, optimizer, cache, pluggable storage engines, and physical files.
The most distinctive feature of MySQL is its pluggable storage‑engine architecture, where the engine is associated with a table rather than the whole database.
Pluggable Engines
MyISAM
MyISAM does not support transactions, uses table‑level locking, and supports full‑text indexes. Prior to MySQL 5.5.8 it was the default engine (except on Windows). Its cache stores only index files.
MyISAM tables consist of .MYD (data) and .MYI (index) files. The myisampack utility can compress data files using static Huffman coding, producing read‑only tables.
Before MySQL 5.0 the default maximum table size was 4 GB; from 5.0 onward the default limit is 256 TB.
NDB Cluster Engine
NDB implements a shared-nothing cluster architecture, providing high availability. Data resides primarily in memory (since MySQL 5.1, non-index data can be stored on disk), giving extremely fast primary-key lookups. Adding NDB data nodes scales performance nearly linearly.
Joins are performed in the MySQL server layer rather than in the engine, which can cause high network overhead and slower query performance for complex joins.
Memory Engine
Formerly called HEAP, the Memory engine stores table data in RAM; data is lost on server restart. It is suitable for temporary tables and dimension tables in data warehouses. It uses hash indexes by default.
Limitations include table‑level locking, poor concurrency, and lack of support for TEXT and BLOB columns. Variable‑length columns are stored as fixed‑length, wasting memory.
MySQL uses the Memory engine for internal temporary tables; if a temporary table exceeds the Memory engine’s capacity or contains TEXT/BLOB columns, it is automatically converted to a MyISAM table on disk.
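A sketch of a Memory-engine table (table and column names are illustrative); note that the engine uses HASH indexes by default and stores variable-length columns as fixed-length:

```sql
-- Illustrative only: a small dimension table kept in RAM.
CREATE TABLE dim_region (
    region_id   INT NOT NULL,
    region_name VARCHAR(64),   -- stored internally as fixed-length, wasting memory
    PRIMARY KEY (region_id)    -- HASH index by default in the Memory engine
) ENGINE = MEMORY;
```

All rows are lost when the server restarts, so such tables suit only reloadable or temporary data.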
Comparing Storage Engines
The SHOW ENGINES statement or the information_schema.ENGINES table can be used to list supported engines.
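Both forms are shown below; the information_schema query lets you project just the columns of interest:

```sql
-- List all supported storage engines with their capabilities.
SHOW ENGINES;

-- Equivalent query against the data dictionary:
SELECT ENGINE, SUPPORT, TRANSACTIONS, XA, SAVEPOINTS
FROM information_schema.ENGINES;
```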
InnoDB Storage Engine
InnoDB Architecture
InnoDB maintains several memory pools that together form a large buffer pool, responsible for internal data structures, disk‑page caching, redo‑log buffering, and other tasks.
Background Threads
Master Thread
The Master Thread asynchronously flushes dirty pages, merges the insert buffer, recycles UNDO pages, and performs other housekeeping.
IO Thread
IO Threads handle callbacks for asynchronous I/O requests. Since InnoDB 1.0.x, the numbers of read and write IO threads are configurable via innodb_read_io_threads and innodb_write_io_threads.
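A configuration sketch (the values are illustrative; 4 read and 4 write threads are the usual defaults):

```ini
[mysqld]
innodb_read_io_threads  = 4   # threads servicing read AIO requests
innodb_write_io_threads = 4   # threads servicing write AIO requests
```

The state of each IO thread is visible in the FILE I/O section of SHOW ENGINE INNODB STATUS.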
Purge Thread
After a transaction commits, its UNDO log may no longer be needed and can be reclaimed by the Purge Thread. Since InnoDB 1.1, purge work can run in dedicated threads instead of inside the Master Thread, improving CPU utilization.
Enabling a dedicated purge thread can be done with:

[mysqld]
innodb_purge_threads=1

Page Cleaner Thread
Introduced in InnoDB 1.2.x, this thread isolates dirty‑page flushing from the Master Thread, reducing query‑thread blocking.
Memory Management
Buffer Pool
The buffer pool caches disk pages in RAM to bridge the CPU‑disk speed gap. Pages are “fixed” in the pool when read, and subsequent reads hit the pool if the page is present.
Modifications are first made to pages in the buffer pool; dirty pages are later flushed to disk via the checkpoint mechanism. The pool size is configured with innodb_buffer_pool_size.
Pages cached include index, data, UNDO, insert buffer, adaptive hash index, lock information, and dictionary data.
Multiple buffer‑pool instances can be created (default 1) and are configured with innodb_buffer_pool_instances .
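A configuration sketch combining the two settings (sizes are illustrative; tune them to the host's available RAM):

```ini
[mysqld]
innodb_buffer_pool_size      = 15G   # total memory for the buffer pool
innodb_buffer_pool_instances = 4     # split the pool to reduce mutex contention
```

Multiple instances help mainly on large pools with many concurrent threads, since each instance has its own LRU list and mutexes.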
LRU, Free, and Flush Lists
InnoDB uses an LRU algorithm with a midpoint insertion strategy to manage page recency. The innodb_old_blocks_pct variable controls the midpoint position (default 37%).
The innodb_old_blocks_time variable defines how long, in milliseconds, a page must remain in the old sublist after its first access before a later access can move it to the young (new) sublist. Adjusting these two variables can keep large scans from evicting hot pages.
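For example, before running a large one-off scan, an administrator might tighten both settings at runtime (the values below are illustrative, not recommendations):

```sql
-- Pages touched by the scan must survive 1 s in the old sublist
-- before a re-access can promote them to the young sublist.
SET GLOBAL innodb_old_blocks_time = 1000;

-- Shrink the old sublist to 20% of the LRU, protecting more hot pages.
SET GLOBAL innodb_old_blocks_pct = 20;
```

After the scan, both variables can be reset to their defaults (0 and 37, respectively, in book-era versions).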
When the buffer pool is empty, pages are allocated from the Free list; otherwise, the LRU tail is evicted. Dirty pages reside in both the LRU and Flush lists; the Flush list tracks pages that must be written back.
Various SHOW ENGINE INNODB STATUS and information_schema tables (e.g., INNODB_BUFFER_POOL_STATS , INNODB_BUFFER_PAGE_LRU ) expose buffer‑pool statistics.
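For instance, the per-instance hit rate and page counts can be read directly from information_schema:

```sql
-- Buffer pool statistics per pool instance.
SELECT POOL_ID, POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES,
       MODIFIED_DATABASE_PAGES, HIT_RATE
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```

A sustained low HIT_RATE suggests the pool is too small for the working set.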
Compressed Pages and Unzip LRU
Since InnoDB 1.0, pages can be compressed to 1 KB, 2 KB, 4 KB, or 8 KB. Compressed pages are managed in an unzip_LRU list, with allocation performed via a buddy‑system algorithm.
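A sketch of creating a compressed table (table name is illustrative; book-era versions also require innodb_file_per_table=ON and the Barracuda file format):

```sql
-- Illustrative: an InnoDB table compressed to 8 KB pages.
CREATE TABLE t_compressed (
    id      INT PRIMARY KEY,
    payload VARCHAR(255)
) ENGINE = InnoDB
  ROW_FORMAT = COMPRESSED
  KEY_BLOCK_SIZE = 8;
```

Compressed pages of different sizes are then allocated inside the pool via the buddy system and tracked on the unzip_LRU list.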
Redo Log Buffer
The redo-log buffer temporarily holds redo log records before they are flushed to the redo-log files on disk. Its size is controlled by innodb_log_buffer_size (default 8 MB); it is flushed every second by the Master Thread, on transaction commit, and whenever the buffer is more than half full.
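A configuration sketch (values are illustrative; defaults vary by MySQL version):

```ini
[mysqld]
innodb_log_buffer_size         = 8M   # book-era default; rarely needs to be large
innodb_flush_log_at_trx_commit = 1    # write and fsync the log on every commit (safest)
```

Because the buffer is also flushed once per second, it normally only needs to hold about one second's worth of redo records.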
Additional Memory Pool
An extra memory pool supplies memory for internal control structures (e.g., buffer‑control blocks). Large buffer‑pool configurations may require increasing this pool.
Checkpoint Technology
Checkpoints periodically flush dirty pages from the buffer pool to disk, reducing recovery time after a crash. Two checkpoint types exist: Sharp (full flush on shutdown) and Fuzzy (partial flush during normal operation).
Fuzzy checkpoints are triggered by the Master Thread's periodic flushing, by the FLUSH_LRU_LIST checkpoint (keeping enough free pages on the LRU list), by async/sync flush checkpoints (when the redo log approaches capacity), and when the dirty-page percentage exceeds innodb_max_dirty_pages_pct (default 75).
Parameters such as innodb_fast_shutdown , innodb_lru_scan_depth , and innodb_max_dirty_pages_pct control checkpoint behavior.
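A sketch of these settings together (the values shown are the usual defaults, listed for illustration):

```ini
[mysqld]
innodb_max_dirty_pages_pct = 75    # flush more aggressively above this dirty-page ratio
innodb_lru_scan_depth      = 1024  # pages the page cleaner scans per LRU list per pass
innodb_fast_shutdown       = 1     # skip full purge and change-buffer merge at shutdown
```

Setting innodb_fast_shutdown=0 forces a full (Sharp-checkpoint) shutdown, which is slower but leaves no pending purge work for the next startup.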
References
MySQL Technical Insights – InnoDB Storage Engine, 2nd Edition
ZCY Technology
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.