Databases 32 min read

Inside MySQL InnoDB: Unveiling Architecture, Memory Structures, and Transaction Mechanics

This article provides a comprehensive overview of MySQL InnoDB's internal architecture, covering its memory and disk structures, buffer pool, change buffer, adaptive hash index, log buffer, various tablespace types, page layout, undo and redo logs, double‑write buffer, transaction isolation levels, and how the engine satisfies ACID properties.

Efficient Ops
Efficient Ops
Efficient Ops
Inside MySQL InnoDB: Unveiling Architecture, Memory Structures, and Transaction Mechanics

1 InnoDB Architecture

InnoDB consists of two major parts: in‑memory structures and on‑disk structures. It follows a log‑first strategy, recording changes in a redo log before applying them to data pages, which allows fast sequential I/O but requires careful checkpoint handling to avoid data loss on crashes.

InnoDB architecture diagram
InnoDB architecture diagram

2 InnoDB In‑Memory Structures

The primary in‑memory components are the Buffer Pool, Change Buffer, Adaptive Hash Index, and Log Buffer.

Buffer Pool

The Buffer Pool holds the page cache, change buffer, and data dictionary cache, typically consuming about 80% of the MySQL server's physical memory. Pages are managed with an LRU algorithm and split into a New sublist (default 5/8) and an Old sublist (default 3/8, configurable via

innodb_old_blocks_pct

).

Buffer pool usage can be inspected with

show engine innodb status

:

<code>----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 102398
Buffer pool size   8191
Free buffers       7893
Database pages     298
Modified db pages  0
......</code>

Change Buffer

The Change Buffer (formerly Insert Buffer) caches modifications to non‑unique secondary index pages, reducing random I/O. It can be enabled or disabled with the

innodb_change_buffering

variable (default

all

).

<code>-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 0 buffer(s)</code>

Adaptive Hash Index

Adaptive Hash Index (AHI) provides O(1) lookups for frequently accessed pages. It is enabled by default in MySQL 5.7 and can be tuned with

innodb_adaptive_hash_index

and

innodb_adaptive_hash_index_parts

.

Log Buffer

The Log Buffer stores redo log records in memory. Its size is controlled by

innodb_log_buffer_size

(default 16M). Flushing behavior is governed by

innodb_flush_log_at_trx_commit

and

innodb_flush_log_at_timeout

.

Value 1: write log buffer to OS cache and flush to disk on every commit (high safety).

Value 0: write to OS cache once per second, then flush (possible >1 s data loss).

Value 2: write to OS cache on every commit, flush once per second (possible >1 s data loss).

Flush log at trx commit options
Flush log at trx commit options

3 InnoDB On‑Disk Structures

Tablespaces: system tablespace (ibdata1), temporary tablespace, regular tablespace, undo tablespace, and file‑per‑table tablespace (default in MySQL 5.7).

Redo log: stores flushed Log Buffer data.

Example of creating a test database and table:

<code>mysql> create database test;
mysql> use test;
mysql> create table t (id int auto_increment primary key, ch varchar(5000));
mysql> insert into t (ch) values('abc');
mysql> insert into t (ch) values('defgh');</code>

3.1 InnoDB Table Structure

InnoDB uses the Barracuda file format with COMPACT and DYNAMIC row formats. Row format can be set with

ROW_FORMAT

. Table status can be inspected with

SHOW TABLE STATUS

or querying

INFORMATION_SCHEMA.INNODB_SYS_TABLES

.

<code>*************************** 1. row ***************************
Name: t
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 2
Avg_row_length: 8192
Data_length: 16384
Index_length: 0
Auto_increment: 3
Create_time: 2019-01-13 02:24:52
Collation: utf8mb4_general_ci
</code>

3.2 Tablespace Overview

File‑per‑table tablespaces store each table's data and indexes in a separate

.ibd

file, simplifying space reclamation and transportable tablespaces.

<code>CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;
CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE ts1;
CREATE TABLE t2 (c2 INT PRIMARY KEY) TABLESPACE ts1;
ALTER TABLE t2 TABLESPACE=innodb_file_per_table;
DROP TABLE t1;
DROP TABLESPACE ts1;</code>

3.3 File‑Per‑Table Page Layout

An

.ibd

file starts with a 96 KB header (FSP_HDR) followed by pages of 16 KB (default). Pages include FSP_HDR, IBUF_BITMAP, INODE, and INDEX pages.

IBD file structure
IBD file structure

Index Page Analysis

Index pages contain a FIL Header, INDEX Header, FSEG Header, system records (infimum and supremum), and user records. Example hexdump of

t.ibd

shows checksum, LSN, page type, and space ID.

<code># hexdump -C t.ibd
0000c000  95 45 82 8a 00 00 00 03 ff ff ff ff ff ff ff ff  |.E..............|
0000c010  00 00 00 00 00 28 85 7c 45 bf 00 00 00 00 00 00  |.....(.|E.......|
...</code>

Page Directory

The page directory stores slot offsets for fast binary search within a page. Example slot table for table

t2

:

<code>slot    offset  type          owned  key
0       99      infimum       1
1       112     supremum      3</code>
Page directory structure
Page directory structure

4 InnoDB Transaction Isolation Levels

Read Uncommitted – dirty reads.

Read Committed – sees only committed data; may suffer non‑repeatable reads.

Repeatable Read – uses a consistent read view for the whole transaction; protects against non‑repeatable reads, uses next‑key locks to avoid phantom reads.

Serializable – lock‑based full serialization.

5 InnoDB and the ACID Model

Atomicity

Implemented via transaction support,

COMMIT

/

ROLLBACK

, and the undo log.

Consistency

Ensured by double‑write buffer, crash recovery, and isolation mechanisms.

Isolation

Provided by the various isolation levels and InnoDB's locking protocol.

Durability

Redo log.

Double‑write buffer (configurable via

innodb_doublewrite

).

innodb_flush_log_at_trx_commit

and

innodb_flush_log_at_timeout

for log flushing.

sync_binlog

for binary log durability.

OS‑level

fsync

calls.

References

https://blog.jcole.us/innodb/

https://dev.mysql.com/doc/refman/5.7/en/innodb-storage-engine.html

http://ourmysql.com/archives/1228

transactiondatabasestorage engineInnoDBMySQLACIDBuffer Pool
Efficient Ops
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.