Databases 7 min read

How LCL+ Boosts Distributed Database Performance with One‑Victim Deadlock Resolution

The OceanBase team’s LCL+ algorithm, a lock‑chain‑length based distributed deadlock detection and elimination method, eliminates the need for a global wait‑for graph, achieves single‑victim unlocks, and dramatically improves throughput while reducing tail latency in production database clusters.

AntTech
AntTech
AntTech
How LCL+ Boosts Distributed Database Performance with One‑Victim Deadlock Resolution

Paper Announcement

The OceanBase team’s paper titled “A Pure Distributed Deadlock Detection and Elimination Solution Based on ‘Lock Chain Length’ One‑Way Propagation” was accepted by ACM Transactions on Computer Systems (ACM TOCS, 2025), a leading journal for computer systems, distributed systems, and operating systems research.

Background

Deadlock detection and elimination are mature in classic centralized databases but remain nascent in distributed environments, where each node only has a local view and constructing or maintaining a global wait‑for graph (WFG) incurs high cost and increased tail latency.

Limitations of Existing Algorithms

The early M&M (Mitchell‑Merritt) algorithm assumes each transaction waits for only one resource at a time, which is unrealistic for modern workloads that may wait on multiple resources concurrently. The CMH (Chandy‑Misra‑Haas) algorithm, while decentralized, can trigger “multiple‑victim” rollbacks, leading to high engineering overhead and reduced throughput.

LCL+ Solution

To address these challenges, OceanBase introduced LCL+, a set of distributed algorithms characterized by “no global view, lightweight messages, and single‑victim unlock”. It accelerates detection for mixed deadlocks (local + distributed) and has been implemented in the OceanBase production system, showing significant improvements in throughput, tail latency, and scalability.

Technical Approach

Core Idea: Lock Chain Length (LCL) – LCL uses a propagatable local metric, the lock‑chain length value (LCLV), to decompose the global deadlock detection problem into lightweight local interactions.

Each waiting node maintains an integer LCLV that propagates unidirectionally along dependency edges. In a cycle, LCLVs continuously increase; on acyclic paths they converge to an upper bound. This property enables pure‑distributed detection and victim selection using only downstream messages.

Algorithm Stages

Stage 0 – Pre‑proliferation: A fast DFS on local wait chains identifies small local cycles instantly and selects a single victim, while cross‑machine parts are buffered for the periodic LCL process, accelerating mixed deadlock detection.

Stage 1 – Proliferation: LCLVs are incrementally propagated downstream, causing values on cyclic paths to rise and on acyclic paths to converge, distinguishing cycles from non‑cycles.

Stage 2 – Spread: The maximum LCLV within a cycle is synchronized, and a predefined termination priority (e.g., timestamp or transaction priority) selects a unique victim. The victim’s ID is then disseminated to all nodes in the cycle.

Stage 3 – Detection: When a node satisfies the three‑equation condition—being in the same cycle and identified as the victim—it terminates only that transaction, breaking the deadlock with a single victim.

Performance Results

In a six‑node OceanBase cluster, LCL+ outperformed the conventional timeout‑based baseline, increasing system throughput by 150%–540% across various load distributions. When combined with timeout, peak throughput improved up to 820%. LCL+ also achieved markedly lower 99%/99.9% tail latency compared to timeout‑only and traditional distributed detection schemes such as CMH, eliminating the overhead of multi‑victim rollbacks.

Conclusion

By replacing expensive global convergence with a unidirectional, locally measurable LCLV, LCL+ transforms a global deadlock detection challenge into lightweight local interactions. The added Stage 0 pre‑proliferation further accelerates mixed deadlock detection, delivering substantial throughput gains, reduced tail latency, strict single‑victim unlocks, and easier incremental deployment in existing systems.

OceanBaseDeadlock Detectiondatabase algorithms
AntTech
Written by

AntTech

Technology is the core driver of Ant's future creation.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.