MySQL 8.0 Redo Log Optimization: Design, Thread Model, and Tuning Parameters
This article explains how MySQL 8.0 redesigns the redo log with lock‑free mechanisms, describes the asynchronous thread architecture, details the mtr workflow and synchronization structures, and lists the new tunable parameters that improve write performance and checkpoint efficiency.
Introduction – The redo log was originally created to boost write performance while satisfying the Durability component of ACID. MySQL 8.0 implements a lock‑free redesign that removes the previous redo‑log bottleneck, resulting in a noticeable overall performance gain.
1. MySQL redo_log brief review – In MySQL 5.7 the write path is limited by two mutexes (log_sys_t::mutex and log_sys_t::flush_order_mutex) that serialize access to the log buffer and the flush list, causing severe contention on multi‑CPU, fast‑storage systems.
2. Redo log optimization overview – The current design is fully asynchronous and lock‑free. Four asynchronous worker threads (log_writer, log_flusher, log_write_notifier, log_flush_notifier) plus auxiliary threads (log_checkpointer, log_closer) cooperate to move data from the log buffer to disk.
The six threads are:
log_writer – writes logs from the buffer to disk and advances write_lsn.
log_flusher – performs fsync and advances flushed_to_disk_lsn.
log_write_notifier – wakes user threads waiting for write_lsn (e.g., commit).
log_flush_notifier – wakes threads waiting for flushed_to_disk_lsn.
log_closer – cleans up LSN structures on normal shutdown and periodically removes stale entries.
log_checkpointer – periodically examines all flush lists, selects the oldest LSN, and updates the checkpoint LSN, moving checkpoint work off the main thread.
2.1 Thread‑synchronization data structures – Two lock‑free structures are used: atomic 64‑bit variables (C++11 atomics) and Link_buf, a circular array where each slot holds an LSN length. Slots are accessed with std::atomic_thread_fence to guarantee ordering without locks.
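The idea behind Link_buf can be illustrated with a minimal sketch. SimpleLinkBuf and its members are hypothetical names, not the InnoDB class; the sketch uses acquire/release atomics instead of explicit fences and assumes a single consumer thread advances the tail.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified link buffer; not the InnoDB Link_buf class.
class SimpleLinkBuf {
 public:
  explicit SimpleLinkBuf(std::size_t capacity) : m_links(capacity), m_tail(0) {
    for (auto &slot : m_links) slot.store(0, std::memory_order_relaxed);
  }

  // Record that the range [from, to) is complete. The caller must ensure
  // that to - tail() <= capacity (i.e. it waited for space beforehand).
  void add_link(std::uint64_t from, std::uint64_t to) {
    m_links[from % m_links.size()].store(to - from, std::memory_order_release);
  }

  // Move the tail across every contiguous completed range.
  // Intended for one consumer thread. Returns true if the tail moved.
  bool advance_tail() {
    std::uint64_t pos = m_tail.load(std::memory_order_relaxed);
    const std::uint64_t start = pos;
    for (;;) {
      auto &slot = m_links[pos % m_links.size()];
      const std::uint64_t len = slot.load(std::memory_order_acquire);
      if (len == 0) break;                       // gap: range not finished yet
      slot.store(0, std::memory_order_relaxed);  // free the slot for reuse
      pos += len;
    }
    m_tail.store(pos, std::memory_order_release);
    return pos != start;
  }

  std::uint64_t tail() const { return m_tail.load(std::memory_order_acquire); }

 private:
  std::vector<std::atomic<std::uint64_t>> m_links;  // slot = length of a range
  std::atomic<std::uint64_t> m_tail;                // end of contiguous prefix
};
```

If the range [10, 20) is linked before [0, 10), the tail stays at 0 until the gap is filled and then jumps straight to 20. This is how out-of-order completions turn into one continuous range.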
3. mtr workflow
The mtr (mini‑transaction) commit path uses two Link_buf instances (recent_write and recent_close) to avoid contention. The steps are:
(1) Reserve a range of LSNs in the log buffer:
log_buffer_reserve(*log_sys, len);
(2) Copy each block into the log buffer:
m_impl->m_log.for_each_block(write_log);
(3) Advance recent_write to the end LSN, then check space in recent_close and possibly wait:
log_wait_for_space_in_log_recent_closed(*log_sys, handle.start_lsn);
(4) Add dirty pages to the flush list, write the LSN to recent_close, and finally close the log buffer:
add_dirty_blocks_to_flush_list(handle.start_lsn, handle.end_lsn);
log_buffer_close(*log_sys, handle);
3.2 Analysis – The design eliminates the mutex contention of MySQL 5.7. Multiple mtr threads can write to the log buffer concurrently; recent_write provides a lock‑free tail‑advancing method that lets log_writer write a continuous range of LSNs to disk. recent_close bounds how far out of order dirty pages can be added to the flush lists, so a safe checkpoint LSN can still be derived while out‑of‑order insertion is allowed.
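The reserve and copy steps of the mtr workflow can be sketched as follows. ToyLog, Handle, toy_reserve, toy_write and toy_run are hypothetical names for illustration. The sketch assumes a fixed 1024-byte buffer that is never wrapped and leaves out the recent_write and recent_close bookkeeping.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the log buffer and its LSN counter.
struct ToyLog {
  std::atomic<std::uint64_t> next_lsn{0};
  std::vector<char> buffer = std::vector<char>(1024);
};

struct Handle {
  std::uint64_t start_lsn;
  std::uint64_t end_lsn;
};

// Analogue of the reserve step: claim [start, end) with one atomic add.
inline Handle toy_reserve(ToyLog &log, std::uint64_t len) {
  const std::uint64_t start =
      log.next_lsn.fetch_add(len, std::memory_order_relaxed);
  return Handle{start, start + len};
}

// Analogue of the copy step: reserved ranges are disjoint, so threads can
// copy into the buffer concurrently without taking a lock.
inline void toy_write(ToyLog &log, const Handle &h, const std::string &rec) {
  std::memcpy(log.buffer.data() + h.start_lsn, rec.data(), rec.size());
}

// Runs n_threads writers, each appending per_thread 4-byte records.
// The caller must keep n_threads * per_thread * 4 within the buffer size.
inline std::uint64_t toy_run(ToyLog &log, int n_threads, int per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&log, t, per_thread] {
      const std::string rec(4, static_cast<char>('A' + t));
      for (int i = 0; i < per_thread; ++i) {
        toy_write(log, toy_reserve(log, rec.size()), rec);
      }
    });
  }
  for (auto &w : workers) w.join();
  return log.next_lsn.load();
}
```

Because each thread owns its reserved range exclusively, records never interleave inside the buffer even though no mutex is taken. What remains to be tracked is which ranges are already complete, and that is the job of recent_write.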
4. Redo log thread model
4.1 log_writer – Advances recent_write to the largest continuous LSN (ready_lsn) and writes the range write_lsn … ready_lsn to the OS cache, then updates write_lsn.
/* Advance lsn up to which data is ready in log buffer. */
(void)log_advance_ready_for_write_lsn(log);
ready_lsn = log_buffer_ready_for_write_lsn(log);
/* Write the ready blocks to the log file (OS cache). */
write_blocks(log, write_buf, write_size, real_offset);
/* Publish the new write_lsn; it must move strictly forward. */
const lsn_t new_write_lsn = start_lsn + lsn_advance;
ut_a(new_write_lsn > log.write_lsn.load());
log.write_lsn.store(new_write_lsn);
4.2 log_flusher – Performs fsync on the data written by log_writer and advances flushed_to_disk_lsn. Synchronization with log_writer occurs only via the write_lsn value, eliminating user‑space locks.
4.3 log_notifier – Consists of log_write_notifier (watches write_lsn) and log_flush_notifier (watches flushed_to_disk_lsn). They periodically poll and wake waiting user threads when the relevant LSN advances.
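The hand-off between writer, flusher and notifier can be sketched as single steps over two atomic LSNs. ToyLsns and the toy_* functions are hypothetical names; real I/O, events and threads are left out, and the steps are called in sequence only to make the ordering visible.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two published positions.
struct ToyLsns {
  std::atomic<std::uint64_t> write_lsn{0};
  std::atomic<std::uint64_t> flushed_to_disk_lsn{0};
};

// One log_writer iteration: "write" up to ready_lsn, then publish write_lsn.
inline bool toy_writer_step(ToyLsns &l, std::uint64_t ready_lsn) {
  if (ready_lsn <= l.write_lsn.load(std::memory_order_relaxed)) return false;
  // (a real writer would issue the file write here)
  l.write_lsn.store(ready_lsn, std::memory_order_release);
  return true;
}

// One log_flusher iteration: its only input is write_lsn.
inline bool toy_flusher_step(ToyLsns &l) {
  const std::uint64_t target = l.write_lsn.load(std::memory_order_acquire);
  if (target <= l.flushed_to_disk_lsn.load(std::memory_order_relaxed)) {
    return false;  // nothing new to sync
  }
  // (a real flusher would call fsync here)
  l.flushed_to_disk_lsn.store(target, std::memory_order_release);
  return true;
}

// One log_flush_notifier iteration: count waiters whose LSN is now durable.
inline std::size_t toy_count_wakeable(
    const ToyLsns &l, const std::vector<std::uint64_t> &waiting) {
  const std::uint64_t flushed =
      l.flushed_to_disk_lsn.load(std::memory_order_acquire);
  std::size_t n = 0;
  for (std::uint64_t lsn : waiting) {
    if (lsn <= flushed) ++n;
  }
  return n;
}
```

With waiters at LSNs 100, 250 and 400, a write up to 300 wakes nobody until the flusher step has run. After that, the first two waiters qualify.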
4.4 log_closer – (1) Periodically advances recent_close, freeing slots that have already been flushed, preventing mtr threads from stalling; (2) Performs cleanup on normal shutdown, moving any remaining redo log entries to the flush list.
4.5 New parameters – Two parameters analogous to innodb_log_writer_spin_delay and innodb_log_writer_timeout control the spin‑delay and timeout for the log_closer thread: innodb_log_closer_spin_delay and innodb_log_closer_timeout.
4.6 log_checkpointer – Monitors all flush lists, selects the smallest LSN, compares it with flushed_to_disk_lsn, and sets a new last_checkpoint_lsn. This moves checkpoint work from the main thread to a dedicated thread, reducing recovery time.
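The selection rule described for log_checkpointer can be sketched as a pure function. toy_checkpoint_lsn is a hypothetical name, each inner vector stands for the oldest-modification LSNs in one flush list, and the sketch deliberately omits any safety margin for out-of-order flush-list insertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: choose a new checkpoint LSN from the flush lists.
inline std::uint64_t toy_checkpoint_lsn(
    const std::vector<std::vector<std::uint64_t>> &flush_lists,
    std::uint64_t flushed_to_disk_lsn, std::uint64_t last_checkpoint_lsn) {
  // With no dirty pages, everything already flushed can be checkpointed.
  std::uint64_t candidate = flushed_to_disk_lsn;
  for (const auto &list : flush_lists) {
    if (list.empty()) continue;
    // The oldest dirty page in any list limits the checkpoint.
    candidate =
        std::min(candidate, *std::min_element(list.begin(), list.end()));
  }
  // A checkpoint never moves backwards.
  return std::max(candidate, last_checkpoint_lsn);
}
```

For flush lists {500, 320}, {} and {410} with flushed_to_disk_lsn at 600, the candidate is 320, because redo older than the oldest dirty page is no longer needed for recovery.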
4.7 Parameters
Each log thread has its own spin‑delay and timeout variables (e.g., innodb_log_writer_spin_delay, innodb_log_writer_timeout). The function used for waiting is:
template <typename Condition>
inline static Wait_stats os_event_wait_for(os_event_t &event,
uint64_t spins_limit,
uint64_t timeout,
Condition condition = {})
The wait algorithm first spins for spins_limit CPU pause instructions, then sleeps on an event for up to timeout microseconds, with exponential back‑off capped at 100 ms.
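The spin-then-sleep pattern can be sketched as below. toy_wait_for and ToyWaitStats are hypothetical names. The sketch uses a plain timed sleep as a stand-in for the event wait, and doubling the timeout after every sleep is an assumption made for brevity, not necessarily the schedule InnoDB uses.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

struct ToyWaitStats {
  std::uint64_t spins = 0;  // failed polls in the spin phase
  std::uint64_t waits = 0;  // timed sleeps in the second phase
};

// Hypothetical sketch of a two-phase wait: spin first, then sleep with a
// growing timeout until the condition holds.
template <typename Condition>
ToyWaitStats toy_wait_for(std::uint64_t spins_limit,
                          std::chrono::microseconds timeout,
                          Condition condition) {
  ToyWaitStats stats;
  // Phase 1: busy-poll, cheap when the condition becomes true quickly.
  for (std::uint64_t i = 0; i < spins_limit; ++i) {
    if (condition()) return stats;
    ++stats.spins;
  }
  // Phase 2: sleep between polls; back off exponentially, capped at 100 ms.
  const std::chrono::microseconds cap(100000);
  while (!condition()) {
    std::this_thread::sleep_for(timeout);  // stand-in for the event wait
    ++stats.waits;
    timeout = std::min(timeout * 2, cap);
  }
  return stats;
}
```

A short spin phase favours latency when the awaited LSN is about to advance. The capped back-off keeps an idle thread from burning CPU while still bounding its wake-up delay.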
The table below lists the thread‑specific parameters:
Thread               innodb_log_xxx_spin_delay               innodb_log_xxx_timeout
log_writer           innodb_log_writer_spin_delay            innodb_log_writer_timeout
log_flusher          innodb_log_flusher_spin_delay           innodb_log_flusher_timeout
log_write_notifier   innodb_log_write_notifier_spin_delay    innodb_log_write_notifier_timeout
log_flush_notifier   innodb_log_flush_notifier_spin_delay    innodb_log_flush_notifier_timeout
log_closer           innodb_log_closer_spin_delay            innodb_log_closer_timeout
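Two additional CPU‑usage controls, innodb_log_spin_cpu_abs_lwm and innodb_log_spin_cpu_pct_hwm, determine whether user threads may spin while waiting for redo to be flushed, based on system load: spinning is skipped when CPU usage is below the low‑water mark or above the high‑water mark.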
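The effect of the two CPU-usage thresholds can be sketched as a simple predicate. ToySpinPolicy and toy_spin_allowed are hypothetical names, the default values 80 and 50 are taken from the MySQL documentation, and the exact comparison operators used by the server are an assumption here.

```cpp
#include <cassert>

// Hypothetical policy object mirroring the two server variables.
struct ToySpinPolicy {
  double cpu_abs_lwm = 80.0;  // innodb_log_spin_cpu_abs_lwm (100 = one core)
  double cpu_pct_hwm = 50.0;  // innodb_log_spin_cpu_pct_hwm (% of all cores)
};

// Spin only when the server is busy enough for spinning to pay off, yet
// not so loaded that spinning would steal CPU from useful work.
inline bool toy_spin_allowed(const ToySpinPolicy &p, double abs_cpu,
                             double pct_cpu) {
  return abs_cpu >= p.cpu_abs_lwm && pct_cpu <= p.cpu_pct_hwm;
}
```

With the defaults, an absolute usage of 120 at 30 percent overall load permits spinning, while a nearly idle server (40 absolute) or a saturated one (90 percent) falls back to sleeping waits.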
5. Conclusion – The lock‑free redesign of the redo log replaces the “wait‑for‑lock‑then‑write” model with a “write‑to‑buffer‑then‑monitor‑write‑position” model. User threads no longer block on disk I/O; background threads handle flushing, resulting in higher concurrency and better overall throughput.
Tencent Database Technology
Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.