How Adaptive K‑Value Backoff Locks Boost RocketMQ Performance by Up to 38%
A recent CCF‑A conference paper shows that an adaptive K‑value backoff lock, derived from queueing theory and implemented in Apache RocketMQ, can replace both the spin lock and the mutex lock, achieving throughput gains of up to 37.58% on x86 CPUs and 32.82% on ARM while using less CPU.
Paper Overview
A paper titled Beyond the Bottleneck: Enhancing High‑Concurrency Systems with Lock Tuning was accepted at FM 2024, a CCF‑A‑ranked conference. The authors (Ji Juntao, Gu Yinyou, Fu Yubao, Lin Qingshan) present a lock‑tuning technique that grew out of work on optimizing RocketMQ performance on Alibaba Cloud CPUs.
Problem Statement
RocketMQ has historically offered two kinds of lock on the message‑sending path: a spin lock and a mutex lock. Which one performs best depends on the CPU, and using the wrong one on a given machine can cause severe performance degradation and wasted resources.
Proposed Adaptive K‑Value Backoff Lock
The authors model spin‑lock behaviour using queueing theory, establishing a relationship between the spin count K and the system load P. The expected lock acquisition time consists of two components:
T_s: expected time spent spinning
T_c: expected time spent in a context switch
By expressing T_s and T_c as functions of K and P and substituting them, they derive a formula for the overall expected lock acquisition time.
The adaptive lock works as follows: a thread tries to acquire the lock by spinning; after K failed attempts it calls Thread.yield(), hinting to the scheduler that the CPU can be given to another thread, and then resumes trying. Under low contention the lock is usually released within K spins, so the thread acquires it without paying for a context switch; under high contention the thread stops burning CPU on futile spinning.
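A minimal Java sketch of the spin‑then‑yield behaviour described above follows. It is not the RocketMQ source: the class name KBackoffSpinLock and the parameter k are hypothetical, the spin step is assumed to be an AtomicBoolean compare‑and‑set, and the Thread.onSpinWait() hint (Java 9+) between attempts is an addition the text does not mention.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of a K-value backoff spin lock: spin up to k times,
 * then Thread.yield(), then spin again. Not RocketMQ's actual class.
 */
final class KBackoffSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final int k; // failed attempts allowed before yielding (assumed name)

    KBackoffSpinLock(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        this.k = k;
    }

    void lock() {
        int spins = 0;
        // CAS false -> true; keep trying until this thread wins.
        while (!locked.compareAndSet(false, true)) {
            if (++spins >= k) {
                // k attempts failed: hint that another thread may use the CPU.
                Thread.yield();
                spins = 0;
            } else {
                Thread.onSpinWait(); // busy-wait hint to the CPU (Java 9+)
            }
        }
    }

    void unlock() {
        locked.set(false); // volatile write publishes the critical section
    }

    public static void main(String[] args) throws InterruptedException {
        KBackoffSpinLock lock = new KBackoffSpinLock(1_000);
        long[] counter = {0};
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                lock.lock();
                try {
                    counter[0]++;
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(task);
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter[0]); // 400000
    }
}
```

Because the spin counter resets after each yield, a waiting thread alternates K attempts with one yield until it wins. In this sketch, K = 1 yields on every miss, while a very large K behaves almost like a pure spin lock; K is the knob that moves the lock between those two extremes.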
Experimental Evaluation
Tests were conducted on both x86 and ARM CPUs using Apache RocketMQ with synchronous disk flushing. The key findings include:
When K = 10^3, the system reaches its peak throughput (TPS) of 155,019.20 on x86, while CPU utilization drops to its minimum.
Performance improvements of 37.58% on x86 and 32.82% on ARM were observed compared with the original lock implementation.
At the optimal K value, CPU usage decreased from over 1000% to around 750% (summed across cores, where 100% equals one fully busy core), a saving of roughly 2.5 cores' worth of CPU time or more.
Additional measurements of broker resource consumption showed that the K value yielding maximum TPS also corresponded to the lowest CPU usage.
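The results above report throughput as a function of K. A crude way to see how a contended lock responds to K on your own hardware is to time a shared counter protected by a K‑backoff lock for K = 1, 10, …, 10^5. This is a hypothetical harness (KSweep, BackoffLock and measure are illustrative names) that times a toy critical section, not RocketMQ message sending, and it skips JIT warm‑up; for trustworthy numbers a benchmarking tool such as JMH is the better choice.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical K sweep over a toy critical section; not the paper's setup. */
final class KSweep {
    /** Compact K-backoff spin lock: spin k times, then Thread.yield(). */
    static final class BackoffLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);
        private final int k;

        BackoffLock(int k) { this.k = k; }

        void lock() {
            int spins = 0;
            while (!locked.compareAndSet(false, true)) {
                if (++spins >= k) {
                    Thread.yield();
                    spins = 0;
                }
            }
        }

        void unlock() { locked.set(false); }
    }

    /** Runs `threads` workers doing `opsPerThread` locked increments; returns ops/s. */
    static double measure(int k, int threads, int opsPerThread) throws InterruptedException {
        BackoffLock lock = new BackoffLock(k);
        long[] counter = {0};
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        long elapsedNanos = System.nanoTime() - start;
        // Sanity check: the lock must not lose updates.
        if (counter[0] != (long) threads * opsPerThread) {
            throw new IllegalStateException("lost update: " + counter[0]);
        }
        return counter[0] / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws InterruptedException {
        // Oversubscribe the CPU so that spinning and yielding both matter.
        int threads = Runtime.getRuntime().availableProcessors() * 2;
        for (int k = 1; k <= 100_000; k *= 10) {
            System.out.printf("K=%-7d %.0f ops/s%n", k, measure(k, threads, 200_000));
        }
    }
}
```

The shape of the resulting curve depends on the CPU, core count and JVM, so it should not be expected to reproduce the RocketMQ figures reported above; the point is only that the best K is found empirically per machine.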
Conclusion
The adaptive K‑value backoff lock provides a single lock, governed by one tunable parameter K, that achieves optimal performance across varying contention levels, reduces CPU waste, and simplifies deployment for high‑concurrency systems such as RocketMQ. The approach was validated on two CPU architectures (x86 and ARM) and on different disk‑flush modes, demonstrating its broad applicability.
Paper Details
Title: Beyond the Bottleneck: Enhancing High‑Concurrency Systems with Lock Tuning
Authors: Ji Juntao, Gu Yinyou, Fu Yubao, Lin Qingshan
Abstract: High‑concurrency systems often hit performance bottlenecks due to intense lock contention, leading to waiting and costly context switches. By refining a lightweight spin lock and introducing a concise parameter‑tuning strategy, the authors achieve up to 37.58% (x86) and 32.82% (ARM) throughput gains in Apache RocketMQ, while maintaining low resource overhead across code versions and I/O flush modes.
Alibaba Cloud Native
