Inside MySQL InnoDB: Unveiling Architecture, Memory Structures, and Transaction Mechanics
This article provides a comprehensive overview of MySQL InnoDB's internal architecture, covering its memory and disk structures, buffer pool, change buffer, adaptive hash index, log buffer, various tablespace types, page layout, undo and redo logs, double‑write buffer, transaction isolation levels, and how the engine satisfies ACID properties.
1 InnoDB Architecture
InnoDB consists of two major parts: in‑memory structures and on‑disk structures. It follows a log‑first strategy, recording changes in a redo log before applying them to data pages, which allows fast sequential I/O but requires careful checkpoint handling to avoid data loss on crashes.
2 InnoDB In‑Memory Structures
The primary in‑memory components are the Buffer Pool, Change Buffer, Adaptive Hash Index, and Log Buffer.
Buffer Pool
The Buffer Pool holds the page cache, change buffer, and data dictionary cache, typically consuming about 80% of the MySQL server's physical memory. Pages are managed with an LRU algorithm and split into a New sublist (default 5/8) and an Old sublist (default 3/8, configurable via
innodb_old_blocks_pct).
Buffer pool usage can be inspected with
show engine innodb status:
<code>----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 102398
Buffer pool size 8191
Free buffers 7893
Database pages 298
Modified db pages 0
......</code>Change Buffer
The Change Buffer (formerly Insert Buffer) caches modifications to non‑unique secondary index pages, reducing random I/O. It can be enabled or disabled with the
innodb_change_bufferingvariable (default
all).
<code>-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 0 buffer(s)</code>Adaptive Hash Index
Adaptive Hash Index (AHI) provides O(1) lookups for frequently accessed pages. It is enabled by default in MySQL 5.7 and can be tuned with
innodb_adaptive_hash_indexand
innodb_adaptive_hash_index_parts.
Log Buffer
The Log Buffer stores redo log records in memory. Its size is controlled by
innodb_log_buffer_size(default 16M). Flushing behavior is governed by
innodb_flush_log_at_trx_commitand
innodb_flush_log_at_timeout.
Value 1: write log buffer to OS cache and flush to disk on every commit (high safety).
Value 0: write to OS cache once per second, then flush (possible >1 s data loss).
Value 2: write to OS cache on every commit, flush once per second (possible >1 s data loss).
3 InnoDB On‑Disk Structures
Tablespaces: system tablespace (ibdata1), temporary tablespace, regular tablespace, undo tablespace, and file‑per‑table tablespace (default in MySQL 5.7).
Redo log: stores flushed Log Buffer data.
Example of creating a test database and table:
<code>mysql> create database test;
mysql> use test;
mysql> create table t (id int auto_increment primary key, ch varchar(5000));
mysql> insert into t (ch) values('abc');
mysql> insert into t (ch) values('defgh');</code>3.1 InnoDB Table Structure
InnoDB uses the Barracuda file format with COMPACT and DYNAMIC row formats. Row format can be set with
ROW_FORMAT. Table status can be inspected with
SHOW TABLE STATUSor querying
INFORMATION_SCHEMA.INNODB_SYS_TABLES.
<code>*************************** 1. row ***************************
Name: t
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 2
Avg_row_length: 8192
Data_length: 16384
Index_length: 0
Auto_increment: 3
Create_time: 2019-01-13 02:24:52
Collation: utf8mb4_general_ci
</code>3.2 Tablespace Overview
File‑per‑table tablespaces store each table's data and indexes in a separate
.ibdfile, simplifying space reclamation and transportable tablespaces.
<code>CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;
CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE ts1;
CREATE TABLE t2 (c2 INT PRIMARY KEY) TABLESPACE ts1;
ALTER TABLE t2 TABLESPACE=innodb_file_per_table;
DROP TABLE t1;
DROP TABLESPACE ts1;</code>3.3 File‑Per‑Table Page Layout
An
.ibdfile starts with a 96 KB header (FSP_HDR) followed by pages of 16 KB (default). Pages include FSP_HDR, IBUF_BITMAP, INODE, and INDEX pages.
Index Page Analysis
Index pages contain a FIL Header, INDEX Header, FSEG Header, system records (infimum and supremum), and user records. Example hexdump of
t.ibdshows checksum, LSN, page type, and space ID.
<code># hexdump -C t.ibd
0000c000 95 45 82 8a 00 00 00 03 ff ff ff ff ff ff ff ff |.E..............|
0000c010 00 00 00 00 00 28 85 7c 45 bf 00 00 00 00 00 00 |.....(.|E.......|
...</code>Page Directory
The page directory stores slot offsets for fast binary search within a page. Example slot table for table
t2:
<code>slot offset type owned key
0 99 infimum 1
1 112 supremum 3</code>4 InnoDB Transaction Isolation Levels
Read Uncommitted – dirty reads.
Read Committed – sees only committed data; may suffer non‑repeatable reads.
Repeatable Read – uses a consistent read view for the whole transaction; protects against non‑repeatable reads, uses next‑key locks to avoid phantom reads.
Serializable – lock‑based full serialization.
5 InnoDB and the ACID Model
Atomicity
Implemented via transaction support,
COMMIT/
ROLLBACK, and the undo log.
Consistency
Ensured by double‑write buffer, crash recovery, and isolation mechanisms.
Isolation
Provided by the various isolation levels and InnoDB's locking protocol.
Durability
Redo log.
Double‑write buffer (configurable via
innodb_doublewrite).
innodb_flush_log_at_trx_commitand
innodb_flush_log_at_timeoutfor log flushing.
sync_binlogfor binary log durability.
OS‑level
fsynccalls.
References
https://blog.jcole.us/innodb/
https://dev.mysql.com/doc/refman/5.7/en/innodb-storage-engine.html
http://ourmysql.com/archives/1228
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.