Writeset‑Based Replication in MySQL: Background, Principles, Source‑Code Analysis, Testing, and Summary

This article explains how MySQL’s logical‑clock replication can be enhanced with writeset‑based dependency tracking, detailing the underlying theory, code implementation, configuration options, performance testing, and practical considerations for improving parallel applier concurrency.

Tencent Database Technology

Sep 14, 2021

After MySQL added logical-clock replication, replication lag between master and slave improved, but the slave's parallelism still depended on how many transactions committed concurrently on the master. With low concurrency on the master, parallel replay stayed limited even when the transactions did not conflict.

To address this, MySQL 8.0.1 introduced writeset‑based replication, which refines transaction dependencies using the actual rows modified by each transaction.

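A logical clock consists of a commit parent and a sequence number. The writeset mechanism examines the hash of each modified row: if a transaction modifies no row touched by the transactions just before it, its commit parent can be moved back to an earlier sequence number, so that the logical-clock intervals overlap and the transactions can be replayed in parallel. If two transactions do modify the same row, the later one keeps a commit parent no earlier than the first one's sequence number and must wait for it.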

The conflict-detection algorithm maintains a global std::map from row-hash values to the sequence number of the last transaction that modified the row. When a transaction commits, its writeset is scanned; a hash that is already in the map indicates a conflict. The largest conflicting sequence number becomes the transaction's new commit parent; if nothing conflicts, the commit parent falls back to the smallest sequence number still covered by the history.
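The scan described above can be sketched as follows. This is a simplified model, not the MySQL source: the type `WritesetHistory` and the member names are hypothetical, row hashes are plain integers, and capacity limits are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified model of the writeset history (not MySQL source).
struct WritesetHistory {
  // Row hash -> sequence number of the last transaction that modified the row.
  std::map<std::uint64_t, std::int64_t> last_writer;
  // Commit parent used when a transaction conflicts with nothing in the map.
  std::int64_t history_start = 0;

  // Returns the writeset-derived commit parent of a committing transaction and
  // records that transaction as the latest writer of every row it touched.
  std::int64_t commit_parent_for(const std::vector<std::uint64_t> &writeset,
                                 std::int64_t sequence_number) {
    assert(sequence_number > history_start);
    std::int64_t parent = history_start;
    for (std::uint64_t row_hash : writeset) {
      auto it = last_writer.find(row_hash);
      if (it != last_writer.end()) {
        // Conflict: keep the newest earlier writer. The guard skips a hash
        // that this same transaction has already recorded.
        if (it->second < sequence_number) parent = std::max(parent, it->second);
        it->second = sequence_number;
      } else {
        last_writer.emplace(row_hash, sequence_number);
      }
    }
    return parent;
  }
};
```

For example, if transactions 1 and 2 touch disjoint rows and transaction 3 touches one row from each, transaction 3 gets commit parent 2, while a later transaction that only touches a row last written by transaction 1 gets commit parent 1.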

Example visualizations (omitted) show how writeset changes the replay order from

<T1,T2>, <T3>, <T4,T5>, <T6>, <T7,T8>

to <T1,T2,T3>, <T4,T5,T6,T7>, <T8>, increasing parallelism.
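To make the regrouping concrete, here is a small sketch that turns logical-clock values into replay batches. It is an illustration under assumptions: `LogicalClock` and `group_into_batches` are hypothetical names, and the batch rule (a transaction joins the current batch only if its commit parent is older than the batch's first sequence number) is a simplification of what a real applier does, since a real applier does not wait for whole batches.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pair of logical-clock values carried by one transaction.
struct LogicalClock {
  std::int64_t last_committed;   // commit parent
  std::int64_t sequence_number;  // position in commit order
};

// Groups transactions, given in binlog order, into batches whose members do
// not depend on each other. Each batch lists sequence numbers.
std::vector<std::vector<std::int64_t>> group_into_batches(
    const std::vector<LogicalClock> &txns) {
  std::vector<std::vector<std::int64_t>> batches;
  std::int64_t batch_first_seq = 0;
  for (const LogicalClock &t : txns) {
    assert(t.last_committed < t.sequence_number);
    // A commit parent inside the current batch means the transaction must
    // wait for that batch, so it opens a new one.
    if (batches.empty() || t.last_committed >= batch_first_seq) {
      batches.emplace_back();
      batch_first_seq = t.sequence_number;
    }
    batches.back().push_back(t.sequence_number);
  }
  return batches;
}
```

With made-up commit-order clocks such as (0,1), (0,2), (2,3), (3,4), (3,5), (5,6), (6,7), (6,8), this yields the five batches of the first ordering. If writeset tracking lowers the commit parents to, say, (0,1), (0,2), (0,3), (1,4), (2,5), (3,6), (1,7), (7,8), the same function yields the three batches of the second ordering. These clock values are invented for illustration and are not taken from the omitted figures.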

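Two configuration knobs restrict reordering when needed. Setting slave_preserve_commit_order=ON makes the slave commit transactions in the master's original commit order. Setting binlog_transaction_dependency_tracking to WRITESET_SESSION prevents transactions from the same client session from being executed concurrently, so each session's own order is kept.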

Writeset extraction itself has been available since MySQL 5.7.6. The transaction writeset is built while row changes are written to the binlog (functions binlog_log_row and add_pke). The process hashes the primary-key and unique-key values of the modified rows, records foreign-key information, and flags transactions that lack hashable keys.
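The hashing step can be pictured with the sketch below. It is only loosely modeled on the description above: `KeyEntry`, `serialize_key`, `hash_key` and `build_writeset` are hypothetical names, the separator byte is an assumption, and `std::hash` stands in for whatever hash function the server is configured to use.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one primary or unique key entry of a changed row.
struct KeyEntry {
  std::string index_name;                  // e.g. "PRIMARY"
  std::string schema;
  std::string table;
  std::vector<std::string> column_values;  // key column values, as text
};

// Serializes the entry. Every field is followed by a separator and its length
// so that field boundaries stay visible in the resulting string.
std::string serialize_key(const KeyEntry &key) {
  const char separator = '\x1f';  // assumed separator byte
  std::string out;
  auto append = [&out, separator](const std::string &field) {
    out += field;
    out += separator;
    out += std::to_string(field.size());
    out += separator;
  };
  append(key.index_name);
  append(key.schema);
  append(key.table);
  for (const std::string &value : key.column_values) append(value);
  return out;
}

// Stand-in hash; the real server uses its own hash function.
std::uint64_t hash_key(const KeyEntry &key) {
  return static_cast<std::uint64_t>(
      std::hash<std::string>{}(serialize_key(key)));
}

// Builds a writeset: one hash per key entry touched by the transaction.
std::vector<std::uint64_t> build_writeset(const std::vector<KeyEntry> &keys) {
  std::vector<std::uint64_t> writeset;
  writeset.reserve(keys.size());
  for (const KeyEntry &key : keys) {
    assert(!key.column_values.empty());  // rows without key values are not hashable
    writeset.push_back(hash_key(key));
  }
  return writeset;
}
```

Two changes to the same key value of the same index produce the same serialized string and therefore the same hash, which is what lets the dependency tracker recognize them as touching the same row.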

Dependency calculation occurs in MYSQL_BIN_LOG::write_transaction, which calls Transaction_dependency_tracker::get_dependency. The tracker selects a strategy based on binlog_transaction_dependency_tracking:

void Transaction_dependency_tracker::get_dependency(THD *thd,
                                                    int64 &sequence_number,
                                                    int64 &commit_parent) {
  sequence_number = commit_parent = 0;
  switch (m_opt_tracking_mode) {
    case DEPENDENCY_TRACKING_COMMIT_ORDER:
      m_commit_order.get_dependency(thd, sequence_number, commit_parent);
      break;
    case DEPENDENCY_TRACKING_WRITESET:
      m_commit_order.get_dependency(thd, sequence_number, commit_parent);
      m_writeset.get_dependency(thd, sequence_number, commit_parent);
      break;
    case DEPENDENCY_TRACKING_WRITESET_SESSION:
      m_commit_order.get_dependency(thd, sequence_number, commit_parent);
      m_writeset.get_dependency(thd, sequence_number, commit_parent);
      m_writeset_session.get_dependency(thd, sequence_number, commit_parent);
      break;
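    default:
      assert(0);  // unknown tracking mode: abort in debug builds
      // Release builds fall back to commit-order tracking.
      m_commit_order.get_dependency(thd, sequence_number, commit_parent);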
  }
}

The writeset tracker (Writeset_trx_dependency_tracker::get_dependency) iterates over the transaction's hash values, updates a history map, respects capacity limits, and computes the final commit_parent from the largest conflicting sequence number, never raising it above the parent that commit-order tracking already produced.

void Writeset_trx_dependency_tracker::get_dependency(THD *thd,
                                                     int64 &sequence_number,
                                                     int64 &commit_parent) {
  Rpl_transaction_write_set_ctx *write_set_ctx =
      thd->get_transaction()->get_transaction_write_set_ctx();
  std::vector<uint64> *writeset = write_set_ctx->get_write_set();
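  // Writesets are usable only if all of the following hold:
  bool can_use_writesets =
      // the transaction has row hashes, lacks hashable keys, or is empty
      // (a non-empty transaction with no writeset is DDL or similar)
      (writeset->size() != 0 || write_set_ctx->get_has_missing_keys() ||
       is_empty_transaction_in_binlog_cache(thd)) &&
      // the session hashes rows with the same algorithm as the history
      (global_system_variables.transaction_write_set_extraction ==
       thd->variables.transaction_write_set_extraction) &&
      // no modified table is tied to a foreign key
      !write_set_ctx->get_has_related_foreign_keys() &&
      // the transaction's own writeset did not hit its size limit
      !write_set_ctx->was_write_set_limit_reached();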
  // ... (history handling omitted for brevity) ...
}

Writeset tracking has several constraints: it cannot be applied to DDL statements; the session's hash algorithm (transaction_write_set_extraction) must match the one used for the history; the modified tables must not be referenced by foreign keys from other tables; the history must stay within binlog_transaction_dependency_history_size; and transactions on tables without a primary or unique key fall back to commit-order tracking.
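The history handling elided from the excerpt can be approximated as below. This is a sketch under assumptions, not the server code: `BoundedWritesetHistory` and its members are hypothetical, the single `usable` flag stands in for all of the constraints listed above, and the sketch assumes that an unusable transaction or an overflowing history clears the map and makes the current sequence number the new lower bound for later commit parents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical bounded writeset history (simplified; not MySQL source).
class BoundedWritesetHistory {
 public:
  explicit BoundedWritesetHistory(std::size_t capacity) : capacity_(capacity) {}

  // Returns the final commit parent. `usable` is false for transactions that
  // cannot use writesets (DDL, foreign keys, mismatched hash algorithm, ...).
  std::int64_t track(const std::vector<std::uint64_t> &writeset, bool usable,
                     std::int64_t sequence_number,
                     std::int64_t commit_order_parent) {
    assert(commit_order_parent < sequence_number);
    std::int64_t parent = commit_order_parent;
    bool exceeds_capacity = false;
    if (usable) {
      exceeds_capacity = history_.size() + writeset.size() > capacity_;
      std::int64_t last_parent = history_start_;
      for (std::uint64_t row_hash : writeset) {
        auto it = history_.find(row_hash);
        if (it != history_.end()) {
          if (it->second > last_parent && it->second < sequence_number)
            last_parent = it->second;
          it->second = sequence_number;
        } else if (!exceeds_capacity) {
          history_.emplace(row_hash, sequence_number);
        }
      }
      // Writesets may only lower the commit-order parent, never raise it.
      parent = std::min(last_parent, commit_order_parent);
    }
    if (exceeds_capacity || !usable) {
      // Start over: later transactions cannot be ordered before this one.
      history_start_ = sequence_number;
      history_.clear();
    }
    return parent;
  }

  std::size_t size() const { return history_.size(); }

 private:
  std::size_t capacity_;
  std::int64_t history_start_ = 0;
  std::map<std::uint64_t, std::int64_t> history_;
};
```

In this model, once the history is cleared at sequence number 3, a following transaction gets commit parent 3 even if it conflicts with nothing, which is why a small history size or frequent DDL reduces the benefit of writeset tracking.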

The writeset_session tracker further restricts transactions from the same session:

void Writeset_session_trx_dependency_tracker::get_dependency(
    THD *thd, int64 &sequence_number, int64 &commit_parent) {
  int64 session_parent = thd->rpl_thd_ctx.dependency_tracker_ctx()
                             .get_last_session_sequence_number();
  if (session_parent != 0 && session_parent < sequence_number)
    commit_parent = std::max(commit_parent, session_parent);
  thd->rpl_thd_ctx.dependency_tracker_ctx().set_last_session_sequence_number(
      sequence_number);
}

Performance testing with sysbench (10 tables, 100,000 rows each, 300 s read-write workload) showed that writeset-based replication significantly outperforms pure commit-order replication, especially when master concurrency is low. The writeset_session mode is slightly slower than pure writeset but still faster than commit-order.

In summary, writeset refines transaction dependencies on top of logical clocks, increasing slave replay parallelism for workloads with limited master concurrency, while incurring extra memory and CPU overhead that may be undesirable on highly concurrent or resource‑constrained instances.
