Why MySQL Locks and Transaction Isolation Cause Data Loss in Concurrent Syncs
A detailed investigation reveals that MySQL's lock matrix and MVCC‑based transaction isolation caused a concurrent synchronization job to lose metric and dimension data, and proposes a distributed‑lock solution together with a long‑term read‑calc‑write refactor to eliminate the issue.
Incident Overview
On 2025‑08‑13 a logical fact table attempted to synchronize its metrics and dimensions to an atomic service. Two nearly simultaneous sync requests caused part of the data to be missing.
Root Cause
Both requests executed the same "delete‑then‑insert" workflow. The second request’s DELETE was blocked by the first request’s transaction lock, but the second request’s subsequent SELECT read a snapshot taken before the first transaction’s delete became visible, so it incorrectly assumed the data was already present.
MySQL InnoDB Lock Mechanism
InnoDB uses a lock matrix rather than a single lock type:
Table‑level shared ( LOCK TABLES … READ) and exclusive ( LOCK TABLES … WRITE) locks.
Row‑level shared lock ( SELECT … FOR SHARE) and exclusive lock ( UPDATE/DELETE/INSERT).
Gap locks that block inserts into a range.
Next‑Key locks (row lock + gap lock) – the default for InnoDB.
The delete statement DELETE FROM unify_metric_impl WHERE logic_table_id IN (…) does not satisfy the left‑most index rule on idx_metric_def_id (metric_def_id, logic_table_id). Consequently MySQL escalates to a table‑level exclusive lock, forcing the second transaction to wait until the first releases the lock.
Transaction Isolation and MVCC
InnoDB’s default isolation level is REPEATABLE READ , which prevents dirty and non‑repeatable reads; phantom reads are avoided by Next‑Key locks. Each row stores three hidden fields ( DB_TRX_ID, DB_ROLL_PTR, DB_DELETED) and a read view ( m_ids, min_trx_id, max_trx_id, creator_trx_id) determines visibility. Snapshot reads see rows whose trx_id ≤ min_trx_id and not listed in m_ids. Redo logs guarantee durability; undo logs enable rollback and MVCC reads.
Practical Code Analysis
@Transactional(rollbackFor = Exception.class)
public Map<String, Object> driveToAtomService(Map logicTableData, String erp) {
// obtain environment
String env = driveLogicTable.getString(DRIVE_LOGIC_TABLE_ENV);
// get or add logic table id
Long logicTableId = getOrAddLogicTableId(...);
// delete old metric implementations (may acquire table‑level X lock)
metricImplMapper.deleteByLogicTableIds(Collections.singletonList(logicId));
// fetch current metric list and existing implementations
List<MetricImplBO> metricList = getMetricImpls(...);
List<MetricImplRelBO> metricImpls = metricImplMapper.getMetricImpls(logicTableId);
Set<Long> metricDefIdSet = metricImpls.stream()
.map(MetricImplRelBO::getMetricDefId).collect(Collectors.toSet());
List<MetricImplBO> addList = metricList.stream()
.filter(s -> !metricDefIdSet.contains(s.getMetricDefId()))
.collect(Collectors.toList());
// insert only new metric implementations
addMetricImpl(addList);
}When the first transaction runs, the delete acquires a table‑level X lock. The second transaction’s SELECT sees the pre‑delete version because its snapshot was taken before the delete became visible, leading to the apparent "non‑empty" result.
Solution Options
Distributed lock per logical table – low implementation effort, resolves the immediate conflict but retains long‑running transactions.
Force current read (SELECT … FOR UPDATE) – also low effort but creates long‑lasting locks; not recommended for high‑concurrency workloads.
Split the workflow into "read‑calc‑write" – read data without locks, compute differences in the application layer, then write only the delta in a short transaction. This requires refactoring but eliminates the root cause and reduces lock contention.
Current Status
The short‑term distributed‑lock approach has been deployed and the incident is resolved. Future iterations will migrate to the read‑calc‑write pattern to shrink transaction size and improve concurrency.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
