Databases 25 min read

Deep Dive into MySQL InnoDB Tablespaces, Undo Logs, and Doublewrite Buffer

This article explores MySQL InnoDB's memory and on‑disk architecture, detailing tablespace types, table and row formats, primary key strategies, index structures, the doublewrite buffer, redo log, undo log, and temporary tablespaces, complete with diagrams and code examples for practical understanding.

Sanyou's Java Diary
Sanyou's Java Diary
Sanyou's Java Diary
Deep Dive into MySQL InnoDB Tablespaces, Undo Logs, and Doublewrite Buffer

InnoDB Architecture Overview

MySQL InnoDB's storage engine is renowned for high performance, reliability, and full transaction support.

The engine consists of two major parts: the In‑Memory Structure and the On‑Disk Structure .

Architecture diagram 1
Architecture diagram 1

Previously we covered the memory architecture; now we dive into the disk architecture, which integrates complex storage structures and mechanisms.

How does MySQL manage and store tables, indexes, rows, and temporary data?

The answer lies in MySQL's Tablespaces , which we will discuss in detail.

Tablespaces (表空间)

Tablespaces are the top‑level logical structure of InnoDB; all data resides in them.

Log files (Undo Log, Redo Log)

System tablespace file ibdata1 Undo tablespace

File‑Per‑Table tablespace

General tablespace

Temporary tablespace

We will introduce each type.

System Tablespace (System Tablespace)

By default it starts as a 10 MB file named ibdata1. It stores the Change Buffer and may also contain table and index data if tables are created there.

To enlarge it, set innodb_data_file_path with the autoextend attribute and restart the server. innodb_data_file_path=ibdata1:10M:autoextend Consider using independent or general tablespaces to avoid excessive growth.

File‑Per‑Table Tablespaces (独立表空间)

When the innodb_file_per_table option is enabled, each table gets its own .ibd file, allowing independent transaction management.

Internally, a tablespace is organized as segment → extent → page . A page (16 KB) is the basic storage unit.

Each index has two segments: one for leaf nodes and one for non‑leaf nodes.

Tablespace layout 2
Tablespace layout 2

General Tablespaces (通用表空间)

General tablespaces are shared InnoDB spaces created with CREATE TABLESPACE. They can store multiple tables and often use less memory than many independent tablespaces.

CREATE TABLESPACE tablespace_name
    [ADD DATAFILE 'file_name']
    [FILE_BLOCK_SIZE = value]
    [ENGINE [=] engine_name];

Limitations include inability to convert existing tablespaces, no support for temporary general tablespaces, and restrictions in replication environments.

Undo Tablespaces (撤销表空间)

Undo tablespaces store Undo Logs, which are essential for transaction rollback and MVCC.

In MySQL 5.6 and earlier, Undo Logs reside in the system tablespace ( ibdata1), causing the file to grow indefinitely. MySQL 5.7+ introduces independent Undo tablespaces, reducing pressure on the system tablespace.

MySQL 8.0 creates two default Undo tablespace files ( undo_001 and undo_002), each 16 MB, configurable via innodb_undo_tablespaces.

Undo tablespace layout
Undo tablespace layout

Each Undo tablespace contains 128 rollback segments, each managing 1 024 Undo segments composed of 16 KB pages.

Temporary Tablespaces (临时表空间)

There are two kinds: Session Temporary Tablespaces for user‑created temporary tables and optimizer‑generated internal tables, and Global Temporary Tablespace for storing rollback segments of temporary tables.

Session temporary tablespaces are allocated from a pool of .ibt files (default 10 files) under #innodb_temp. The global temporary tablespace uses the file ibtmp1 (default 12 MB) and can auto‑extend.

Temporary tablespace physical structure
Temporary tablespace physical structure

Tables (表)

How to create a table in MySQL?

Use CREATE TABLE with the InnoDB engine:

CREATE TABLE t1 (a INT, b CHAR(20), PRIMARY KEY (a)) ENGINE=InnoDB;

By default, InnoDB creates each table in its own tablespace. To force creation in the system tablespace, disable innodb_file_per_table before creating the table.

Row Formats (行格式)

InnoDB supports four row formats: REDUNDANT, COMPACT, DYNAMIC (default), and COMPRESSED. The latter two allow index key prefixes up to 3072 bytes, while the first two are limited to 767 bytes.

Primary Keys (主键)

Every table should have a primary key. Good primary keys are non‑null, unique, stable, and frequently used in queries. Often a synthetic auto‑increment integer column is used.

Define an auto‑increment column with AUTO_INCREMENT and control its lock behavior via innodb_autoinc_lock_mode (0 = traditional, 1 = consecutive, 2 = interleaved).

Indexes (索引)

InnoDB indexes are B+ trees and come in two types:

Clustered Index : the table’s primary key; leaf nodes store the full row.

Secondary Index : stores the indexed columns plus the primary key value; leaf nodes point to the clustered index.

Clustered index B+ tree
Clustered index B+ tree

Short primary keys reduce secondary index size.

Covering Index Optimization

-- Example table
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  age INT,
  INDEX idx_name_age (name, age)
);

-- Covering index query
SELECT id, name, age FROM users WHERE name = 'Alice';

If all selected columns are in the secondary index, MySQL can satisfy the query using only the index ( Using index in the execution plan), avoiding a table lookup.

Doublewrite Buffer (双写缓冲)

InnoDB pages are 16 KB, while the OS page size is typically 4 KB, requiring four OS writes per page. A crash during these writes can corrupt data, which redo logs cannot fix because they record only changes.

The doublewrite buffer mitigates this by first writing pages to a shared tablespace (the doublewrite buffer) and then to the final .ibd file. If a crash occurs, the buffer provides a clean copy for recovery.

Doublewrite buffer architecture
Doublewrite buffer architecture

The buffer holds 128 pages (2 MB) in memory. Dirty pages are copied to the buffer, flushed to the shared tablespace in sequential writes, and then written to their final locations.

Redo Log (重做日志)

Redo logs ensure durability by recording physical page changes in a sequential log before the pages are written to disk (WAL). The log buffer in memory collects changes and flushes them to the redo log files.

/* Simplified log buffer write */
void log_buffer_write(log_t &log, byte* record, size_t len) {
    mutex_enter(&log.mutex);
    lsn_t start_lsn = log.assign_lsn(len);
    memcpy(log.buf + write_offset, record, len);
    log.update_write_lsn(start_lsn + len);
    os_event_set(log.flusher_event);
}

During recovery, InnoDB replays redo logs to bring the data files up to date.

Undo Logs (撤销日志)

Undo logs provide atomicity and MVCC. They store the before‑image of modified rows, enabling transaction rollback and consistent reads.

Each Undo tablespace contains 128 rollback segments, each managing 1 024 Undo segments composed of 16 KB pages.

Undo log generation flow
Undo log generation flow

The purge thread asynchronously removes obsolete Undo records after no active read view references them.

Design Takeaways

Layered decoupling: Redo handles physical persistence, Undo handles logical rollback, and binlog handles replication.

Space‑for‑time trade‑off: Keeping history in Undo enables lock‑free reads.

Deferred cleanup: The purge thread avoids synchronous cleanup overhead during commit.

InnoDBMySQLIndexesundo logredo logTablespacesDoublewrite Buffer
Sanyou's Java Diary
Written by

Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tire of learning!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.