Databases 18 min read

How MySQL 5.7 Optimizes Metadata Locks and Hash Scan for Faster Replication

This article details the evolution of MySQL's Metadata Lock subsystem, the implementation of lock‑free hash tables and fast‑path locking in MySQL 5.7, the hash_scan algorithm used in replication, as well as performance enhancements in TokuDB 7.5.0 and MariaDB's small‑LIMIT filesort optimization.

21CTO
21CTO
21CTO
How MySQL 5.7 Optimizes Metadata Locks and Hash Scan for Faster Replication

MySQL 5.7 Optimization of Metadata Lock Subsystem

Background

The MDL lock was introduced to fix bug #989, which caused replication breakage when a DROP TABLE statement was logged before a transaction commit. In MySQL 5.5, MDL locks are held for the whole transaction, preventing the DROP until the commit.

Session 1: BEGIN;

Session 1: INSERT INTO t1 VALUES (1);

Session 2: DROP TABLE t1; -- writes to binlog

Session 1: COMMIT; -- writes to binlog

During binlog replay on a replica, the DROP TABLE is executed before the INSERT, causing replication interruption.

MDL also introduced a new hotspot: all MDL lock objects are stored in a single hash. To mitigate contention, MySQL 5.6 added the metadata_locks_hash_instances parameter to partition the hash, but the original hash function caused many keys to map to the same bucket.

Subsequent analysis revealed that the MurmurHash3 algorithm (introduced in MySQL 5.6.15) fixed the hash‑key calculation issue.

MySQL 5.7 Optimizations for MDL

1. Lock‑free hash for MDL objects – The original partitioned hash was replaced by a lock‑free hash (LF_HASH) based on the "Split‑Ordered Lists" algorithm, removing the need for hash partitioning.

2. Fast‑path for DML/SELECT – A lock‑word fast‑path is used for common DML/SELECT operations; if it fails, the slower path is taken. Each MDL lock object maintains a m_fast_path_state flag to indicate its status.

Session 1: BEGIN; Session 1: SELECT FROM sbtest1 WHERE id =1; // fast‑path state = 1048576 Session 2: BEGIN; Session 2: SELECT FROM sbtest1 WHERE id =2; // fast‑path state = 2097152 (fast‑path) Session 3: ALTER TABLE sbtest1 ENGINE=INNODB; // DDL forces slow‑path Session 4: SELECT * FROM sbtest1 WHERE id =3; // checks for obtrusive lock

The MDL subsystem distinguishes between OBTRUSIVE and UNOBTRUSIVE lock types, allowing DML/SELECT to acquire locks with virtually no mutex overhead.

3. Reducing redundant THR_LOCK usage – With MDL in place, the older THR_LOCK mechanism in InnoDB becomes largely unnecessary, requiring only minimal code changes.

4. User locks via GET_LOCK – User‑level locks now use MDL, enabling dead‑lock detection and imposing a 64‑byte name limit.

Hash Scan Implementation in MySQL Replication

Problem description : As the size of table t1 grows, the test rpl_hash_scan.test shows quadratic execution time because each row deletion triggers a full table scan on the replica.

--source include/master-slave.inc
--source include/have_binlog_format_row.inc
connection slave;
set global slave_rows_search_algorithms='TABLE_SCAN';
connection master;
create table t1(id int, name varchar(20);
insert into t1 values(1,'a');
... 
insert into t3 values(1000,'xxx');
delete from t1;
---source include/rpl_end.inc

When slave_rows_search_algorithms is set to HASH_SCAN (available since MySQL 5.6.6), the server caches changed rows in two structures: m_hash (a hash table of row positions) and m_distinct_key_list (index keys, if any).

During Rows_log_event::do_hash_scan_and_update, the event first populates m_hash; if the table has an index, keys are also stored in m_distinct_key_list. The subsequent scan iterates over m_hash (O(1) look‑ups) and applies each row, resulting in only a single full‑table scan regardless of the number of rows.

Rows_log_event::do_apply_event
  Rows_log_event::do_hash_scan_and_update
    Rows_log_event::do_hash_row (add entry info of changed records)
      if (m_key_index < MAX_KEY) use index instead of table scan
        Rows_log_event::add_key_to_distinct_keyset()

A known bug (MySQL bug #72788) caused duplicate deletions because m_distinct_key_list could contain non‑unique keys; it was fixed in revision 8494.

Bug details: http://bugs.mysql.com/bug.php?id=72788

Fix: http://bazaar.launchpad.net/~mysql/mysql-server/5.7/revision/8494

Open questions include the effectiveness of hash_scan on tables without indexes, the maximum number of rows per event (limited to ~9 KB), and the fact that hash_scan only applies to UPDATE/DELETE events, not statement‑based queries.

TokuDB 7.5.0 Version Optimizations

a) Faster shutdown – Parallel compression of internal nodes reduces shutdown time, especially when tokudb_cache_size is large.

b) Internal node read acceleration – Each internal node now persists the sorted OMT offset to disk, eliminating the costly sort step during cache‑miss reconstruction.

c) Sequential write acceleration – A heuristic seqinsert_score detects sequential writes; when the score exceeds 100, writes are directed to the right‑most node, avoiding extra index look‑ups.

MariaDB Filesort with Small LIMIT Optimization

Since MySQL 5.6.2 / MariaDB 10.0.0, the optimizer uses a priority‑queue (heap) of size n for queries of the form ORDER BY … LIMIT n when n is small. This reduces the sorting complexity from m·log(m) to m·log(n), where m is the number of rows after filtering.

Create a heap of n elements.

Iterate over rows, inserting each into the heap.

If a row's sort key is smaller than the heap root, replace the root and re‑heapify.

Continue until all rows are processed.

Extract heap elements in reverse order to produce the final sorted result.

MariaDB 10.0.13 exposes the counter Sort_priority_queue_sorts (global and session) to monitor usage, and the status appears in the slow‑query log when log_slow_verbosity=query_plan is enabled.

# Time: 140714 18:30:39
# User@Host: root[root] @ localhost []
# Thread_id: 3  Schema: test  QC_hit: No
# Query_time: 0.053857  Lock_time: 0.000188  Rows_sent: 11  Rows_examined: 100011
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: Yes
SET timestamp=1405348239;SET timestamp=1405348239;
select * from t1 where col1 between 10 and 20 order by col2 limit 100;

The "Priority_queue: Yes" flag indicates that the query benefited from the small‑LIMIT optimization.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlDatabase Internalsmetadata lockHash Scan
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.