
Physical Replication in MySQL: Architecture, Log Management, and Performance Evaluation

This article explains the design and implementation of InnoDB‑based physical replication for MySQL, covering background concepts, advantages over native binlog replication, high‑level architecture, log file handling, instance roles, MVCC handling, failover procedures, and performance test results.

Architect

May 19, 2016

Background Knowledge

Physical replication builds directly on InnoDB transaction internals, so the rest of this article assumes familiarity with the following terms: Transaction ID (an auto-incremented identifier assigned to each read-write transaction), Read View (the snapshot used for consistent reads), Redo Log (records physical page changes for crash recovery), Mini Transaction (mtr, the smallest atomic unit of work that generates redo), LSN (log sequence number, a monotonically increasing position in the redo log), Undo Log (stores previous row versions for MVCC), and Binary Log (the logical log used by traditional MySQL replication).

Pros and Cons of Native Replication

Native MySQL replication writes both a redo log and a binary log for each transaction. The binlog brings real benefits: it is readable, it allows flexible topologies, and it supports heterogeneous replicas. The cost is a double fsync per commit, higher I/O pressure, and replication latency, because binlog entries must be shipped to the slaves.

Why Physical Replication

Physical replication removes the need for the binlog and GTID, cutting disk writes to a single fsync per transaction, which improves throughput and latency. The slave can also apply redo logs concurrently, which gives page-level parallelism and stronger data consistency. The trade-offs are that it works only with InnoDB and cannot support multi-master topologies.

High Level Architecture

The architecture mirrors native replication but runs on an independent code path. After START INNODB SLAVE is executed, the slave starts an IO thread, which sends the master a dump request containing master_uuid and start_lsn. The master's log_dump thread then streams ib_logfile data. The slave copies the data into its InnoDB log buffer and writes it to its local ib_logfile. A Log Apply coordinator then parses the logs and distributes them to worker threads according to (space_id, page_no) % (n_workers+1), so all changes to a given page go to the same thread. System tables are applied before user tables to ensure undo logs are processed first.
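The dispatch rule above can be sketched in a few lines. This is a minimal illustration, not the actual coordinator code: the fold that combines space_id and page_no into one integer is an assumption (the article gives only the modulo expression), and the names worker_for and dispatch are hypothetical. What it demonstrates is that every record for a given page lands in the same queue, in log order, so different pages can be applied in parallel without reordering changes to any single page.

```python
from collections import defaultdict


def worker_for(space_id: int, page_no: int, n_workers: int) -> int:
    """Map a page to one of n_workers + 1 apply slots."""
    # Assumed fold: any deterministic combination of the pair would do here.
    fold = (space_id << 32) | page_no
    return fold % (n_workers + 1)


def dispatch(records, n_workers: int):
    """Group (space_id, page_no, payload) records by slot, preserving log order."""
    queues = defaultdict(list)
    for space_id, page_no, payload in records:
        slot = worker_for(space_id, page_no, n_workers)
        queues[slot].append((space_id, page_no, payload))
    return queues
```

The article does not say why the divisor is n_workers + 1 rather than n_workers; the sketch simply follows the expression as given.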

Log File Management

Unlike the circular ib_logfile scheme, physical replication retains old log files for reliability and backup. A new log file is created when the current one fills, with a background allocator preparing the next file and a purge thread renaming obsolete files with a purged prefix or deleting them when the pool is full. An additional ib_checkpoint file stores checkpoint information separately from ib_logfile0.
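The rotate-and-purge behaviour described above can be modelled in memory as follows. This is a sketch under stated assumptions: the numbered ib_logfileN naming, the exact spelling of the purged_ prefix, the pool size, and the class and method names are all hypothetical, and the rule for deciding which file is the oldest one still needed is left to the caller because the article does not specify it.

```python
class LogFileSet:
    """In-memory model of non-circular redo log files (names are assumed)."""

    def __init__(self, purged_pool_size: int):
        self.active = ["ib_logfile0"]  # files still needed, oldest first
        self.purged = []               # renamed files kept in the pool
        self.purged_pool_size = purged_pool_size
        self.next_no = 1

    def rotate(self) -> str:
        """The current file is full: switch to a newly created file."""
        name = f"ib_logfile{self.next_no}"
        self.next_no += 1
        self.active.append(name)
        return name

    def purge_up_to(self, oldest_needed: str) -> list:
        """Retire every file older than oldest_needed; return the deleted names."""
        if oldest_needed not in self.active:
            raise ValueError(f"unknown log file: {oldest_needed}")
        deleted = []
        while self.active[0] != oldest_needed:
            name = self.active.pop(0)
            if len(self.purged) < self.purged_pool_size:
                self.purged.append("purged_" + name)  # rename, keep in pool
            else:
                deleted.append(name)                  # pool full: delete
        return deleted
```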

Instance Roles

Instances are classified as master, slave, or upgradable‑slave (intermediate state during failover). Role information is persisted in innodb_repl.info along with the source instance UUID. The role determines whether the instance can accept writes and how dump requests are validated.

Background Threads

On a slave, certain background threads that could modify data are disabled: the purge thread, master‑side ibuf merge, and dict_stats persistence. The page‑cleaner algorithm is also tuned to minimize impact on log apply.

MySQL Server Layer Data Replication

File Operation Replication

Metadata files (FRM, PAR, DB.OPT, etc.) are replicated by logging three new log types: MLOG_METAFILE_CREATE: [FIL_NAME | CONTENT], MLOG_METAFILE_RENAME: [ORIGINAL_NAME | TARGET_NAME], and MLOG_METAFILE_DELETE: [FIL_NAME]. The server layer always creates new files and deletes old ones rather than modifying files in place.
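To make the three record types concrete, here is a minimal sketch of a slave-side apply routine working on an in-memory dictionary instead of a real file system. The tuple layout of the records and the function name apply_metafile_log are assumptions for illustration; only the log type names and their fields come from the description above.

```python
def apply_metafile_log(files: dict, record: tuple) -> None:
    """Apply one metadata-file log record to a {file_name: content} mapping."""
    kind = record[0]
    if kind == "MLOG_METAFILE_CREATE":
        _, name, content = record
        files[name] = content
    elif kind == "MLOG_METAFILE_RENAME":
        _, original, target = record
        files[target] = files.pop(original)
    elif kind == "MLOG_METAFILE_DELETE":
        _, name = record
        del files[name]
    else:
        raise ValueError(f"unexpected log type: {kind}")
```

Because files are never modified in place, replacing a table definition becomes a create of a temporary file, a delete of the old file, and a rename, for example (with hypothetical file names):

```python
files = {"t1.frm": "v1"}
for rec in [
    ("MLOG_METAFILE_CREATE", "#tmp.frm", "v2"),
    ("MLOG_METAFILE_DELETE", "t1.frm"),
    ("MLOG_METAFILE_RENAME", "#tmp.frm", "t1.frm"),
]:
    apply_metafile_log(files, rec)
```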

DDL Replication

DDL changes are wrapped between MLOG_METACHANGE_START and MLOG_METACHANGE_END logs, with additional logs for file creation, rename, and delete. To handle crashes that may lose the end marker, the master writes a special log after crash recovery to release any held MDL locks on the slave.
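The bracketing of a DDL change, and the reason the post-recovery log is needed, can be illustrated with a small replay function. Two things here are assumptions: that the start record identifies the table whose MDL lock the slave takes, and the name POST_RECOVERY_RELEASE, which is a placeholder because the article does not name the special log.

```python
def replay_ddl_markers(records) -> set:
    """Return the set of tables whose MDL lock is still held after replay."""
    held = set()
    for kind, table in records:
        if kind == "MLOG_METACHANGE_START":
            held.add(table)        # slave takes the MDL lock
        elif kind == "MLOG_METACHANGE_END":
            held.discard(table)    # normal completion releases it
        elif kind == "POST_RECOVERY_RELEASE":  # placeholder name
            held.clear()           # master restarted: release everything
    return held
```

If the master crashes between the start and end markers, the slave would hold the lock indefinitely without the third branch.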

Slave MVCC

View Control

Two new log types, MLOG_TRX_START (records assigned transaction ID) and MLOG_TRX_COMMIT (records commit), allow the slave to reconstruct transaction visibility. After applying a batch of logs, the slave updates trx_sys->max_trx_id and populates the active transaction array.
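A simplified model shows how these two log types are enough to rebuild visibility on the slave. The class and method names are hypothetical, and the visibility rule is the usual InnoDB one reduced to its essentials (a version is visible if its transaction started before the view was opened and was not active at that moment). State is published only at the end of a batch, matching the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    low_limit_id: int   # IDs at or above this are invisible
    active: frozenset   # transactions active when the view was opened

    def is_visible(self, trx_id: int) -> bool:
        return trx_id < self.low_limit_id and trx_id not in self.active


class SlaveTrxSys:
    def __init__(self):
        self.max_trx_id = 0
        self.active = set()

    def apply_batch(self, records) -> None:
        """Replay (log_type, trx_id) records, then publish the new state."""
        active = set(self.active)
        max_trx_id = self.max_trx_id
        for kind, trx_id in records:
            if kind == "MLOG_TRX_START":
                active.add(trx_id)
                max_trx_id = max(max_trx_id, trx_id + 1)
            elif kind == "MLOG_TRX_COMMIT":
                active.discard(trx_id)
        self.active, self.max_trx_id = active, max_trx_id

    def open_read_view(self) -> ReadView:
        return ReadView(self.max_trx_id, frozenset(self.active))
```

A view opened between two batches keeps its snapshot even after later commits are applied, which is what gives slave readers a consistent picture.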

Purge Control

Two approaches are offered: (1) the master writes the oldest safe purge snapshot to redo, and the slave waits for all active views to finish before purging; (2) the slave periodically reports its safe purge point to the master, which then limits its own purge progress. Both have trade‑offs between replication lag and master purge efficiency.
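The two approaches reduce to two small predicates, sketched here under assumptions: purge points and views are represented as plain transaction-ID limits, the function names are hypothetical, and for approach (1) the slave is assumed to wait only for views older than the master's purge point.

```python
def slave_may_purge(master_purge_point: int, slave_view_limits: list) -> bool:
    """Approach (1): purge only once no local view predates the master's point."""
    return all(limit >= master_purge_point for limit in slave_view_limits)


def master_purge_limit(master_oldest_view: int, slave_reports: dict) -> int:
    """Approach (2): the master purges no further than the slowest slave allows."""
    return min([master_oldest_view, *slave_reports.values()])
```

The trade-off is visible in the shapes of the functions: in the first, a long-running view on the slave returns False and stalls apply on that slave; in the second, a low value reported by any one slave pulls down the master's limit.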

B‑Tree Structure Change Replication

When a page‑level B‑tree change spans multiple pages, the master logs the involved index IDs. The slave acquires an exclusive lock on the index, applies the logs, then releases the lock to ensure consistency.

Change Buffer Replication

Slave Change Buffer Merge

To keep a read‑only slave from modifying data, the slave creates a shadow page before applying a change‑buffer entry, applies the merge without generating redo, and discards the shadow page once the page is evicted or applied.

Replication Change Buffer Merge

New log types MLOG_IBUF_MERGE_START and MLOG_IBUF_MERGE_END coordinate merge operations on the slave, ensuring that necessary locks are held and shadow pages are managed correctly.

Failover

Planned Failover

A planned failover runs in four steps: (1) the master is demoted to upgradable-slave by writing MLOG_DEMOTE; (2) the slaves update their role; (3) the chosen replica is promoted with MLOG_PROMOTE; and (4) the slaves switch to the normal slave role.
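The role transitions during a planned failover can be traced with a small state model. It rests on one assumption the article leaves open: that in the second step the slaves also move to upgradable-slave. The function name and the dictionary representation of the cluster are hypothetical.

```python
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"
    UPGRADABLE_SLAVE = "upgradable-slave"


def planned_failover(roles: dict, old_master: str, new_master: str) -> list:
    """Return a snapshot of every instance's role after each of the four steps."""
    if roles[old_master] is not Role.MASTER:
        raise ValueError("old_master is not the current master")
    if roles.get(new_master) is not Role.SLAVE:
        raise ValueError("new_master must be an existing slave")
    roles = dict(roles)
    snapshots = []
    # Step 1: MLOG_DEMOTE turns the master into an upgradable-slave.
    roles[old_master] = Role.UPGRADABLE_SLAVE
    snapshots.append(dict(roles))
    # Step 2 (assumed): slaves that replay MLOG_DEMOTE become upgradable too.
    for name in roles:
        roles[name] = Role.UPGRADABLE_SLAVE
    snapshots.append(dict(roles))
    # Step 3: MLOG_PROMOTE makes the chosen replica the new master.
    roles[new_master] = Role.MASTER
    snapshots.append(dict(roles))
    # Step 4: everyone else settles into the normal slave role.
    for name in roles:
        if name != new_master:
            roles[name] = Role.SLAVE
    snapshots.append(dict(roles))
    return snapshots
```

In this model no snapshot ever contains more than one master, so at no point can two instances accept writes.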

Unplanned Failover

In an unexpected failover, the old master is turned into a slave, the new master’s LSN is captured, and missing redo pages are fetched from the new master and written to the old master via overwrite, followed by truncating excess redo and resuming replication.
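One way to read this procedure is sketched below. It is an interpretation, not the documented algorithm: it assumes the pages to fetch are those touched by redo records beyond the new master's captured LSN, it models redo as a list of (lsn, page_id) pairs, and the function and parameter names are hypothetical.

```python
def rejoin_old_master(old_redo, old_pages, new_master_lsn, fetch_page):
    """Bring an old master back in line with the new one.

    old_redo: list of (lsn, page_id) in LSN order.
    old_pages: {page_id: content}, modified in place.
    fetch_page: callable returning the new master's copy of a page.
    Returns the redo list truncated at new_master_lsn.
    """
    # Redo the new master never received before it was promoted.
    excess = [rec for rec in old_redo if rec[0] > new_master_lsn]
    # Overwrite every page those records touched with the new master's copy.
    for page_id in sorted({page_id for _, page_id in excess}):
        old_pages[page_id] = fetch_page(page_id)
    # Truncate the excess redo; replication can then resume from this point.
    return [rec for rec in old_redo if rec[0] <= new_master_lsn]
```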

Testing and Performance

Three configurations were benchmarked using Sysbench on a 16 GB buffer pool: ALI_RDS_56_redo (physical replication, binlog disabled), ALI_RDS_56 (standard RDS), and upstream MySQL 5.6.29. Results for TPS and response time on update‑non‑index workloads are shown in the accompanying images.


