Understanding the InnoDB Lock Module Structure
This article explains the structure and components of InnoDB's lock module: the global lock_sys object, its hash tables, latch mechanisms, waiting-thread slots, and related attributes, with code examples that illustrate how row and table locks are managed.
1. Introduction
In the previous three articles we introduced InnoDB table locks, row locks, and the structures that store them. The table‑lock and row‑lock structures form the foundation of the lock module, much like bricks in a building.
However, just as a building also needs steel and cement, the module provides additional components that organize and manage these structures. All of them live in a single, globally unique object: lock_sys.
2. Lock Module Structure
The lock module type is lock_sys_t . After removing comments and two irrelevant fields, the simplified definition is:
struct lock_sys_t {
locksys::Latches latches;
hash_table_t *rec_hash;
hash_table_t *prdt_hash;
hash_table_t *prdt_page_hash;
Lock_mutex wait_mutex;
srv_slot_t *waiting_threads;
srv_slot_t *last_slot;
bool rollback_complete;
std::chrono::steady_clock::duration n_lock_max_wait_time;
os_event_t timeout_event;
};

Although the structure has only a few fields, the complexity of locking arises from the interplay of many concurrent transactions and lock scenarios, which produces wait chains and, at times, deadlocks.
To avoid diving into that complexity, we first describe each attribute of the lock module.
There are three hash_table_t attributes: rec_hash , prdt_hash , and prdt_page_hash . The latter two are used by predicate locks and are omitted from this discussion.
The n_lock_max_wait_time attribute records the longest row‑lock wait time since MySQL started. It can be queried with:
show status like 'innodb_row_lock_time_max';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| Innodb_row_lock_time_max | 50157 |
+--------------------------+-------+

The rollback_complete flag indicates whether all transactions recovered from the undo log have been fully rolled back during startup. If it is false, InnoDB traverses trx_sys->rw_trx_list to release their locks.
2.1 Who Manages Row‑Lock Structures?
Row‑lock structures are stored in the rec_hash hash table. When a lock_sys object is created, memory for rec_hash is allocated:
void lock_sys_create(ulint n_cells) {
...
  lock_sys = static_cast<lock_sys_t *>(ut::zalloc_withkey(...));
  lock_sys->rec_hash = ut::new_<hash_table_t>(n_cells);
  ...
}

The value of n_cells is derived from the buffer-pool size. For a 128 MiB buffer pool with a 16 KiB page size, srv_lock_table_size works out to 40,960, so rec_hash contains 40,960 slots.
Each slot does not store a row‑lock structure; instead, it acts as a “head” that points to a linked list of row‑lock structures sharing the same hash value.
2.2 Who Protects Table‑Lock and Row‑Lock Structures?
Concurrent transactions may read or write the same hash‑slot list or a table’s lock list. To serialize these accesses, the lock module provides the latches attribute of type locksys::Latches :
class Latches {
private:
...
Unique_sharded_rw_lock global_latch;
Page_shards page_shards;
Table_shards table_shards;
...
};

global_latch is taken in exclusive mode by operations that need a consistent view of the entire lock system at once; most operations instead take it in shared mode together with one shard latch. page_shards protects the hash-slot lists used for row locks, while table_shards protects the lock lists attached to each table.
Both Page_shards and Table_shards contain an array of 512 mutexes:
static constexpr size_t SHARDS_COUNT = 512;
class Page_shards { Padded_mutex mutexes[SHARDS_COUNT]; };
class Table_shards { Padded_mutex mutexes[SHARDS_COUNT]; };

When a transaction needs to insert into or search a row-lock list, it hashes the tablespace ID and page number to obtain an index, then acquires the corresponding mutex in the array.
For table‑level locks, the table ID modulo 512 selects the appropriate mutex.
2.3 What Happens When a Lock Wait Occurs?
The lock module contains three fields related to waiting: wait_mutex , waiting_threads , and last_slot . Their initialization looks like:
void lock_sys_create(ulint n_cells) {
ulint lock_sys_sz = sizeof(*lock_sys) + srv_max_n_threads * sizeof(srv_slot_t);
...
lock_sys->waiting_threads = static_cast<srv_slot_t *>(ptr);
lock_sys->last_slot = lock_sys->waiting_threads;
mutex_create(LATCH_ID_LOCK_SYS_WAIT, &lock_sys->wait_mutex);
...
}

waiting_threads points to a contiguous memory region that can hold srv_max_n_threads (hard-coded to 102,400) srv_slot_t objects. Each slot represents a transaction that is currently waiting for a lock.
last_slot always points to the first free slot after the last used one, allowing the background thread that checks for lock‑wait timeouts to scan only the occupied portion of the array.
The wait_mutex protects concurrent updates to last_slot and the waiting‑slot array.
2.4 Sending a Lock‑Wait Notification
When a transaction enters a wait state, it signals the event stored in the timeout_event attribute. A background thread blocks on this event; when notified, it wakes up, scans the waiting slots for lock-wait timeouts, and checks for deadlocks, resolving them if necessary.
3. Summary
The rec_hash attribute is a hash table that partitions row‑lock structures into many slots, each managing a linked list of locks.
The latches attribute ensures that only one thread at a time can read or write a particular hash‑slot list or a table’s lock list.
waiting_threads points to a memory area divided into 102,400 slots, each of which can hold one transaction waiting for a lock.
last_slot marks the boundary between used and free slots, reducing the amount of work needed to scan for lock‑wait timeouts.
wait_mutex serializes access to last_slot and the waiting‑slot array.
timeout_event notifies the background thread when a new lock wait appears, enabling timely deadlock detection.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise-grade MySQL open-source tools and services, releases a premium open-source component every year on Programmers' Day (October 24), and continuously operates and maintains them.