
How InnoDB Recovers After a Crash: Deep Dive into Redo, Binlog, and Undo Logs

After an unexpected crash, InnoDB recovers in stages: it first replays the redo log starting from the latest checkpoint, then uses the binlog and the undo logs to decide whether each unfinished transaction is committed or rolled back. This article walks through each step, the checkpoint handling, and the optimisations involved.

dbaplus Community

1. Redo‑log based recovery

When InnoDB starts after an unexpected crash, it reads the most recent checkpoint from the log file header, which occupies the first 2048 bytes of ib_logfile0. Two checkpoint slots are kept and written alternately; the newer one is the slot with the larger checkpoint number. From the checkpoint the engine obtains the Log Sequence Number (LSN) and the offset inside the redo‑log file where recovery must begin.

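checkpoint no : sequence number of the checkpoint; of the two alternating checkpoints, the one with the larger number is the newer.

checkpoint lsn : LSN recorded when the checkpoint was written; every change with an LSN at or below this value is guaranteed to be on disk, so recovery only needs the redo written after it.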

checkpoint offset : byte offset in the redo‑log file where the recovery scan starts.
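The selection of the newer checkpoint can be sketched as follows. This is a minimal model, not InnoDB's code: the slot offsets (512 and 1536) and the field layout (three big-endian 64-bit integers for number, LSN and offset) are assumptions based on the MySQL 5.7 log header, and the checksum validation that real recovery performs is omitted. The names Checkpoint, parse_checkpoint and latest_checkpoint are hypothetical.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 2048              # log file header: first 2048 bytes of ib_logfile0
CHECKPOINT_SLOTS = (512, 1536)  # assumed byte offsets of the two checkpoint slots


class Checkpoint(NamedTuple):
    no: int      # checkpoint number
    lsn: int     # checkpoint LSN
    offset: int  # byte offset in the redo log where scanning starts


def parse_checkpoint(header: bytes, slot: int) -> Checkpoint:
    # Assumed layout: three big-endian unsigned 64-bit fields at the slot start.
    no, lsn, offset = struct.unpack_from(">QQQ", header, slot)
    return Checkpoint(no, lsn, offset)


def latest_checkpoint(header: bytes) -> Checkpoint:
    if len(header) < HEADER_SIZE:
        raise ValueError("expected the first 2048 bytes of ib_logfile0")
    # The slot with the larger checkpoint number is the newer one.
    return max((parse_checkpoint(header, s) for s in CHECKPOINT_SLOTS),
               key=lambda cp: cp.no)
```

The function takes the header as bytes rather than opening a file, so it can be tried on a synthetic buffer without touching a real data directory.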

Checkpoint location diagram

The redo‑log is scanned in three passes (MySQL 5.7 and later):

First pass locates the MLOG_CHECKPOINT record. If it is missing, no recovery is needed.

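Second pass parses redo records and inserts them into a hash table recv_sys->addr_hash keyed by (space, offset). If the end of the log is reached before the hash table fills, the stored records are applied directly and the third pass is skipped.

Third pass is needed only when the hash table filled up during the second pass while redo records remained. The scan then continues in a loop: each time the hash table fills, its records are applied and the table is emptied, until all redo records have been applied.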
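A minimal model of the fill-and-apply loop behind the second and third passes: records accumulate in a bounded table, and whenever the table is full its contents are applied and it is emptied before scanning continues. The capacity parameter is a stand-in for the memory limit, and apply_batch and recover_in_batches are hypothetical names, not InnoDB functions.

```python
from collections import defaultdict


def apply_batch(table):
    """Stand-in for applying redo to data pages; returns the record count."""
    return sum(len(recs) for recs in table.values())


def recover_in_batches(records, capacity):
    """records: iterable of (space, page_no, body) tuples in log order."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    table = defaultdict(list)
    stored = 0
    batch_sizes = []
    for space, page_no, body in records:
        if stored == capacity:
            # Table is full: apply what we have, then keep scanning.
            batch_sizes.append(apply_batch(table))
            table.clear()
            stored = 0
        table[(space, page_no)].append(body)
        stored += 1
    if stored:
        # End of log reached: apply the remainder.
        batch_sizes.append(apply_batch(table))
    return batch_sizes
```

When the whole log fits in the table, the function returns a single batch, which corresponds to the case where no further scan is required.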

During parsing, the redo log is read in chunks of 4 × page_size (64 KB with the default 16 KB page size), and each chunk consists of 512‑byte redo blocks. The body of each block (the block without its header and trailer) is copied into recv_sys->buf, where the records are parsed and inserted into the hash table under their (space, offset) key; offset here is the page number within the tablespace. Collisions are resolved with linked lists, allowing multiple bodies for the same key.
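The chunk arithmetic can be checked with a short sketch. The 12-byte header and 4-byte trailer sizes are assumptions taken from the MySQL 5.7 block format, and the sketch ignores the data-length field that real parsing uses for partially filled blocks. block_bodies is a hypothetical helper.

```python
PAGE_SIZE = 16 * 1024      # default InnoDB page size
SCAN_SIZE = 4 * PAGE_SIZE  # bytes read per scan step: 65536
BLOCK_SIZE = 512           # redo log block size
HEADER_SIZE = 12           # assumed block header size
TRAILER_SIZE = 4           # assumed block trailer size


def block_bodies(chunk: bytes):
    """Yield the body of every 512-byte block in a chunk."""
    if len(chunk) % BLOCK_SIZE != 0:
        raise ValueError("chunk must be a whole number of blocks")
    for start in range(0, len(chunk), BLOCK_SIZE):
        block = chunk[start:start + BLOCK_SIZE]
        # Strip the header and trailer, keep only the payload.
        yield block[HEADER_SIZE:BLOCK_SIZE - TRAILER_SIZE]
```

Under these assumptions, one 64 KB read holds 128 blocks, each contributing at most 496 bytes of redo payload.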

After the hash table is built, InnoDB iterates over it, reads the corresponding data pages from the tablespace files, and applies the redo operations, thereby persisting modifications that were only in the log.
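The apply step can be modelled roughly as below. The page-LSN check (skip a record if the page already carries a newer or equal LSN) is the usual way redo is kept idempotent; the exact comparison and page format in InnoDB differ in detail, so treat this as a simplified illustration. Pages are plain dictionaries here, and apply_redo is a hypothetical name.

```python
def apply_redo(addr_hash, pages):
    """addr_hash: {(space, page_no): [(lsn, change), ...]} in log order.
    pages: {(space, page_no): {"lsn": int, "data": dict}}, a stand-in
    for pages read from the tablespace files.
    Returns the number of records actually applied."""
    applied = 0
    for key, records in addr_hash.items():
        page = pages[key]  # models reading the page from disk
        for lsn, change in records:
            if lsn > page["lsn"]:
                # The page has not seen this change yet: apply it.
                page["data"].update(change)
                page["lsn"] = lsn
                applied += 1
    return applied
```

Running the function twice on the same input applies nothing the second time, which is the property that makes it safe to restart recovery after a second crash.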

Optimisation 1 : When a page is fetched into the buffer pool, InnoDB also pre‑fetches the 32 neighbouring pages, based on the assumption that nearby pages are likely to be needed soon.
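As an illustration of the 32-page window, the sketch below assumes the window is aligned to a multiple of 32 pages and contains the requested page. The alignment is an assumption about how the area is chosen, and read_ahead_window is a hypothetical helper.

```python
READ_AHEAD_AREA = 32  # pages fetched together


def read_ahead_window(page_no: int) -> range:
    """Return the aligned 32-page range that contains page_no."""
    low = page_no - page_no % READ_AHEAD_AREA
    return range(low, low + READ_AHEAD_AREA)
```

For page 70 this yields pages 64 through 95, so one request for a single page turns into a read of its whole aligned neighbourhood.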

Optimisation 2 : Prior to MySQL 5.7 the recovery process relied on the data dictionary to map space IDs to .ibd files, requiring all tablespaces to be opened. Starting with 5.7 the redo log contains two new record types— MLOG_FILE_NAME (stores space and file path) and MLOG_CHECKPOINT (marks the end of the file‑name list). This allows recovery to open only the tablespaces referenced in the redo log, eliminating the dictionary dependency. Multiple MLOG_CHECKPOINT records after a checkpoint indicate redo‑log corruption.
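How the file-name records remove the dictionary dependency can be sketched with a toy record stream. Records are modelled as tuples whose first element is the record type; this tuple shape and the function name tablespaces_to_open are illustrative assumptions, not the on-disk format.

```python
def tablespaces_to_open(records):
    """Collect {space_id: path} from MLOG_FILE_NAME records,
    stopping at the first MLOG_CHECKPOINT marker."""
    names = {}
    for record in records:
        kind = record[0]
        if kind == "MLOG_FILE_NAME":
            _, space_id, path = record
            names[space_id] = path
        elif kind == "MLOG_CHECKPOINT":
            break  # end of the file-name list
        # Any other record type is irrelevant to the mapping.
    return names
```

Only the tablespaces present in the returned mapping need to be opened, instead of every .ibd file known to the data dictionary.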

2. Binlog and undo‑log participation

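The second recovery stage handles transactions that were still unfinished inside InnoDB at the time of the crash, including those that had already been written to the binary log but whose commit had not yet been recorded by the storage engine (e.g., a crash occurred after the binlog write but before the InnoDB commit).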

Read the latest binary log file and collect all transaction IDs (XIDs) that appear, building an xid_list.

Scan the undo logs to reconstruct the list of uncommitted transactions, producing an undo_list. InnoDB maintains 128 rollback segments; each segment points to undo‑log pages. By traversing the undo slots the engine builds trx_sys->trx_list, which contains all transactions that have not been committed.

Decision rule: if a transaction’s XID is present in the xid_list extracted from the binlog, the transaction must be committed; otherwise it is rolled back. This guarantees consistency between master and replica after recovery.
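The decision rule reduces to a set-membership test. In this sketch both lists hold plain XIDs, which is a simplification of the real structures, and resolve_transactions is a hypothetical name.

```python
def resolve_transactions(undo_list, xid_list):
    """Split unfinished transactions into (to_commit, to_rollback).

    undo_list: XIDs of transactions reconstructed from the undo logs.
    xid_list:  XIDs found in the latest binary log file.
    """
    binlogged = set(xid_list)
    to_commit = [xid for xid in undo_list if xid in binlogged]
    to_rollback = [xid for xid in undo_list if xid not in binlogged]
    return to_commit, to_rollback
```

For example, with undo_list [11, 12, 13] and xid_list [10, 11, 13], transactions 11 and 13 are committed and 12 is rolled back; XID 10 appears only in the binlog because it had already been committed in the engine.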

3. Potential further optimisations

After the hash table is populated, recovery of independent hash nodes could be parallelised because each node corresponds to a distinct (space, offset) key. Additionally, the pre‑fetch of 32 contiguous pages could be replaced by a red‑black tree ordered by (space, offset), allowing the engine to read only the pages that are actually required.
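The potential saving from ordered access can be estimated with a small comparison. Python has no built-in red-black tree, so a sorted list stands in for the ordered structure; the aligned 32-page window is an assumption, and both function names are hypothetical.

```python
def pages_read_with_windows(keys, area=32):
    """Pages touched if every needed page pulls in its aligned window."""
    touched = set()
    for space, page_no in keys:
        low = page_no - page_no % area
        touched.update((space, p) for p in range(low, low + area))
    return touched


def pages_read_ordered(keys):
    """Pages touched if only the required pages are read, in key order."""
    return sorted(set(keys))
```

With three required pages spread over three different windows, the window approach touches 96 pages while the ordered approach touches 3, at the cost of maintaining the ordered structure.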

