Understanding the Linux Read‑Write Semaphore (rwsem) in Kernel 5.15.81
The read‑write semaphore (rwsem) is a sleeping lock that lets multiple readers hold the lock concurrently, which improves performance for read‑heavy workloads. This article covers its internal state representation, the read and write acquisition paths, optimistic spinning, the handoff mechanism, and the trade‑offs involved, and notes that mobile kernels may need further tuning.
The article explains why Linux introduced the read‑write semaphore (rwsem) as a sleep lock in addition to the traditional mutex, emphasizing that rwsem allows multiple readers to hold the lock simultaneously, which improves concurrency and performance for read‑heavy workloads.
It abstracts the rwsem data structure, showing that each rwsem object records two kinds of information: the lock state (reader counter and writer bit) and the tasks associated with the lock (owner pointers and waiters).
The lock state consists of a reader counter and a single writer bit: the lock is free when the counter is zero and the writer bit is clear. The writer bit mirrors how a mutex records exclusive ownership. Owner tracking differs for readers and writers: a writer stores a direct pointer to its own task, while all readers share a single owner field, which records a task that acquired the read lock at some point and may no longer hold it.
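The state encoding described above can be sketched in user‑space C. This is a minimal illustration, not the kernel's definitions: the `sk_` names are hypothetical, and the bit positions (writer bit at bit 0, flag bits below bit 8, reader count from bit 8 upward, flag bits in the low bits of the owner word) are assumptions modelled loosely on the layout the article describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the count word (assumed for this sketch). */
#define SK_WRITER_LOCKED (UINT64_C(1) << 0)  /* the single writer bit */
#define SK_FLAG_WAITERS  (UINT64_C(1) << 1)  /* wait queue is non-empty */
#define SK_FLAG_HANDOFF  (UINT64_C(1) << 2)  /* lock reserved for top waiter */
#define SK_READER_SHIFT  8
#define SK_READER_BIAS   (UINT64_C(1) << SK_READER_SHIFT)
#define SK_READER_MASK   (~(SK_READER_BIAS - 1))
#define SK_LOCK_MASK     (SK_WRITER_LOCKED | SK_READER_MASK)

static_assert((SK_READER_MASK &
               (SK_WRITER_LOCKED | SK_FLAG_WAITERS | SK_FLAG_HANDOFF)) == 0,
              "flag bits must not overlap the reader count");

/* Hypothetical flags stored in the low bits of the owner word. */
#define SK_OWNER_READER       ((uintptr_t)1 << 0)
#define SK_OWNER_NONSPINNABLE ((uintptr_t)1 << 1)
#define SK_OWNER_FLAGS        (SK_OWNER_READER | SK_OWNER_NONSPINNABLE)

static inline uint64_t sk_reader_count(uint64_t count)
{
    return count >> SK_READER_SHIFT;
}

static inline bool sk_writer_held(uint64_t count)
{
    return (count & SK_WRITER_LOCKED) != 0;
}

/* Free means no readers and no writer; flag bits alone do not hold the lock. */
static inline bool sk_is_free(uint64_t count)
{
    return (count & SK_LOCK_MASK) == 0;
}

/* Pack a task pointer (assumed at least 4-byte aligned) with the reader flag. */
static inline uintptr_t sk_owner_encode(const void *task, bool reader)
{
    return (uintptr_t)task | (reader ? SK_OWNER_READER : 0);
}

static inline const void *sk_owner_task(uintptr_t owner)
{
    return (const void *)(owner & ~SK_OWNER_FLAGS);
}

static inline bool sk_owner_is_reader(uintptr_t owner)
{
    return (owner & SK_OWNER_READER) != 0;
}
```

Keeping the flags and the reader count in one word lets a single atomic operation both change and inspect the whole lock state, which is what the acquisition paths rely on.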
If lock acquisition fails, the kernel can either perform optimistic spinning or enqueue the task in a wait queue. The article lists the two options and summarizes their trade‑offs.
It then details the external API of rwsem, followed by step‑by‑step explanations of how read locks are attempted (down_read_trylock) and acquired (__down_read_common), covering the fast and slow paths, including optimistic stealing, queue insertion, and wake‑up logic.
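The read try‑lock and the read fast path can be sketched with C11 atomics. This is a simplified user‑space model under stated assumptions: the `sk_` names and bit positions are hypothetical, owner bookkeeping is omitted, and where the kernel would enter its slow path (stealing, queueing, sleeping) this sketch simply backs out and reports failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout (assumed for this sketch). */
#define SK_WRITER_LOCKED (UINT64_C(1) << 0)
#define SK_FLAG_WAITERS  (UINT64_C(1) << 1)
#define SK_FLAG_HANDOFF  (UINT64_C(1) << 2)
#define SK_READER_BIAS   (UINT64_C(1) << 8)
/* Any of these bits means a reader may not take the lock directly. */
#define SK_READ_FAILED_MASK \
    (SK_WRITER_LOCKED | SK_FLAG_WAITERS | SK_FLAG_HANDOFF)

static_assert(SK_READ_FAILED_MASK < SK_READER_BIAS,
              "flag bits sit below the reader count");

struct sk_rwsem {
    _Atomic uint64_t count;
};

static void sk_init(struct sk_rwsem *sem)
{
    atomic_init(&sem->count, 0);
}

/* Try-lock: only add a reader if no blocking bit is set; never sleeps. */
static bool sk_down_read_trylock(struct sk_rwsem *sem)
{
    uint64_t old = atomic_load_explicit(&sem->count, memory_order_relaxed);

    while (!(old & SK_READ_FAILED_MASK)) {
        /* On failure 'old' is refreshed and the mask is re-checked. */
        if (atomic_compare_exchange_weak_explicit(
                &sem->count, &old, old + SK_READER_BIAS,
                memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;
}

/* Fast path: add the reader bias unconditionally, then inspect the result. */
static bool sk_read_fastpath(struct sk_rwsem *sem)
{
    uint64_t now = atomic_fetch_add_explicit(&sem->count, SK_READER_BIAS,
                                             memory_order_acquire)
                   + SK_READER_BIAS;

    if (!(now & SK_READ_FAILED_MASK))
        return true;

    /* Simplification: a real slow path would run here instead of backing out. */
    atomic_fetch_sub_explicit(&sem->count, SK_READER_BIAS,
                              memory_order_relaxed);
    return false;
}
```

The two functions differ in intent: the try‑lock never disturbs the count unless it succeeds, while the fast path optimistically adds itself first, which costs one atomic add in the uncontended case.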
The release of a read lock (__up_read) decrements the reader counter, may clear the non‑spinnable flag, and wakes waiting tasks when appropriate.
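A minimal sketch of the read‑release logic follows. It assumes that "when appropriate" means the departing reader was the last lock holder and the waiters flag is set; the `sk_` names, the bit layout, the separate `nonspinnable` field, and the counting wake‑up stub are all hypothetical stand‑ins for the kernel's machinery.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout (assumed for this sketch). */
#define SK_WRITER_LOCKED (UINT64_C(1) << 0)
#define SK_FLAG_WAITERS  (UINT64_C(1) << 1)
#define SK_READER_BIAS   (UINT64_C(1) << 8)
#define SK_READER_MASK   (~(SK_READER_BIAS - 1))
#define SK_LOCK_MASK     (SK_WRITER_LOCKED | SK_READER_MASK)

struct sk_rwsem {
    _Atomic uint64_t count;
    atomic_bool nonspinnable;  /* stand-in for the non-spinnable flag */
    unsigned wakeups;          /* counts calls to the wake-up stub */
};

/* Stub: a real implementation would wake tasks from the wait queue. */
static void sk_wake(struct sk_rwsem *sem)
{
    sem->wakeups++;
}

static void sk_up_read(struct sk_rwsem *sem)
{
    uint64_t old = atomic_fetch_sub_explicit(&sem->count, SK_READER_BIAS,
                                             memory_order_release);
    uint64_t now = old - SK_READER_BIAS;

    assert(old >= SK_READER_BIAS);  /* caller must hold a read lock */

    /* Last holder gone and someone is queued: re-enable spinning and wake. */
    if ((now & (SK_LOCK_MASK | SK_FLAG_WAITERS)) == SK_FLAG_WAITERS) {
        atomic_store(&sem->nonspinnable, false);
        sk_wake(sem);
    }
}
```

Only the reader that drops the count to zero pays for the wake‑up; earlier releases are a single atomic subtraction.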
For write locks, the article covers the try‑lock path, the fast path, and the more complex slow path, describing how waiters are prepared, how handoff flags are used, and how the kernel decides whether to spin, block, or hand the lock to the top waiter.
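The write try‑lock and release can be sketched as follows. This is a user‑space model, not the kernel code: the `sk_` names and bit layout are hypothetical, the slow path is left out entirely, and `sk_up_write` merely reports whether a wake‑up would be needed instead of walking a wait queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout (assumed for this sketch). */
#define SK_WRITER_LOCKED (UINT64_C(1) << 0)
#define SK_FLAG_WAITERS  (UINT64_C(1) << 1)
#define SK_READER_BIAS   (UINT64_C(1) << 8)

struct sk_rwsem {
    _Atomic uint64_t count;
    _Atomic(const void *) owner;  /* writer stores its own task pointer */
};

static void sk_init(struct sk_rwsem *sem)
{
    atomic_init(&sem->count, 0);
    atomic_init(&sem->owner, (const void *)NULL);
}

/* Succeeds only when the count is exactly zero: no holders and no flags. */
static bool sk_down_write_trylock(struct sk_rwsem *sem, const void *task)
{
    uint64_t expected = 0;

    if (atomic_compare_exchange_strong_explicit(
            &sem->count, &expected, SK_WRITER_LOCKED,
            memory_order_acquire, memory_order_relaxed)) {
        atomic_store_explicit(&sem->owner, task, memory_order_relaxed);
        return true;
    }
    return false;
}

/* Releases the write lock; returns true if waiters should be woken. */
static bool sk_up_write(struct sk_rwsem *sem)
{
    uint64_t old;

    atomic_store_explicit(&sem->owner, (const void *)NULL,
                          memory_order_relaxed);
    old = atomic_fetch_sub_explicit(&sem->count, SK_WRITER_LOCKED,
                                    memory_order_release);
    assert(old & SK_WRITER_LOCKED);  /* caller must hold the write lock */
    return (old & SK_FLAG_WAITERS) != 0;
}
```

Because the compare‑and‑swap expects a count of exactly zero, a pending waiters or handoff flag is enough to push a new writer off the fast path, which keeps queued tasks from being bypassed by this route.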
Optimistic spinning for writers is explained, including the conditions under which a writer may spin on the owner task, the thresholds for spinning, and the mechanisms for aborting the spin when the owner changes or the CPU needs to reschedule.
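The spin‑on‑owner loop and its abort conditions can be illustrated with a small user‑space model. Everything here is an assumption made for illustration: the `sk_` types, the `on_cpu` flag, the `need_resched` callback, and the iteration budget (a stand‑in for whatever spinning threshold the kernel applies) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_task {
    atomic_bool on_cpu;  /* true while the task is running on a CPU */
};

struct sk_rwsem {
    _Atomic(struct sk_task *) owner;
};

enum sk_spin_result {
    SK_SPIN_OWNER_CHANGED,    /* lock released or passed on: retry the lock */
    SK_SPIN_OWNER_SLEEPING,   /* owner is off CPU: spinning is pointless */
    SK_SPIN_NEED_RESCHED,     /* this CPU has other work: stop spinning */
    SK_SPIN_BUDGET_EXHAUSTED  /* hypothetical threshold reached */
};

/* Helper stubs standing in for the scheduler's reschedule check. */
static bool sk_never_resched(void)  { return false; }
static bool sk_always_resched(void) { return true; }

/* Spin while 'owner' still owns the lock and keeps running on a CPU. */
static enum sk_spin_result sk_spin_on_owner(struct sk_rwsem *sem,
                                            struct sk_task *owner,
                                            bool (*need_resched)(void),
                                            unsigned budget)
{
    assert(owner != NULL);

    for (unsigned i = 0; i < budget; i++) {
        if (atomic_load_explicit(&sem->owner, memory_order_acquire) != owner)
            return SK_SPIN_OWNER_CHANGED;
        if (!atomic_load_explicit(&owner->on_cpu, memory_order_relaxed))
            return SK_SPIN_OWNER_SLEEPING;
        if (need_resched())
            return SK_SPIN_NEED_RESCHED;
    }
    return SK_SPIN_BUDGET_EXHAUSTED;
}
```

A caller would retry the lock after `SK_SPIN_OWNER_CHANGED` and fall back to queueing and sleeping on any other result. The rationale for spinning at all is that a running owner is likely to release the lock soon, so a short busy wait can be cheaper than a sleep and wake‑up.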
The handoff mechanism is discussed in detail: when the handoff flag is set, when it is cleared, and how it ensures that lock ownership is transferred to the top waiter to avoid starvation.
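The effect of the handoff flag can be sketched as two competing lock attempts: one from a queued writer and one from a task trying to steal the lock without queueing. This is a simplified model under assumptions, with hypothetical `sk_` names and bit layout; in particular, the condition under which a waiter decides to request handoff (for example, having waited too long) is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout (assumed for this sketch). */
#define SK_WRITER_LOCKED (UINT64_C(1) << 0)
#define SK_FLAG_WAITERS  (UINT64_C(1) << 1)
#define SK_FLAG_HANDOFF  (UINT64_C(1) << 2)
#define SK_READER_BIAS   (UINT64_C(1) << 8)
#define SK_READER_MASK   (~(SK_READER_BIAS - 1))
#define SK_LOCK_MASK     (SK_WRITER_LOCKED | SK_READER_MASK)

struct sk_rwsem {
    _Atomic uint64_t count;
};

enum sk_wstate {
    SK_NOT_FIRST,     /* queued, but not at the head of the wait queue */
    SK_FIRST,         /* top waiter */
    SK_FIRST_HANDOFF  /* top waiter that now requests handoff */
};

/* Lock attempt by a queued writer. Returns true if the lock was taken. */
static bool sk_try_write_lock_queued(struct sk_rwsem *sem, enum sk_wstate st,
                                     bool last_waiter)
{
    uint64_t old = atomic_load_explicit(&sem->count, memory_order_relaxed);
    uint64_t new;

    do {
        bool handoff = (old & SK_FLAG_HANDOFF) != 0;

        /* With handoff pending, only the top waiter may proceed. */
        if (handoff && st == SK_NOT_FIRST)
            return false;

        new = old;
        if (old & SK_LOCK_MASK) {
            /* Lock is held: at most set the flag to reserve the next turn. */
            if (handoff || st != SK_FIRST_HANDOFF)
                return false;
            new |= SK_FLAG_HANDOFF;
        } else {
            /* Lock is free: take it and clear the handoff flag. */
            new |= SK_WRITER_LOCKED;
            new &= ~SK_FLAG_HANDOFF;
            if (last_waiter)
                new &= ~SK_FLAG_WAITERS;
        }
    } while (!atomic_compare_exchange_weak_explicit(
                 &sem->count, &old, new,
                 memory_order_acquire, memory_order_relaxed));

    return !(new & SK_FLAG_HANDOFF);
}

/* Steal attempt by a task that is not queued: blocked by the handoff flag. */
static bool sk_try_write_lock_unqueued(struct sk_rwsem *sem)
{
    uint64_t old = atomic_load_explicit(&sem->count, memory_order_relaxed);

    while (!(old & (SK_LOCK_MASK | SK_FLAG_HANDOFF))) {
        if (atomic_compare_exchange_weak_explicit(
                &sem->count, &old, old | SK_WRITER_LOCKED,
                memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;
}
```

Once the flag is set, a free lock can no longer be stolen by spinners or newcomers, so the next release is guaranteed to benefit the top waiter. That guarantee trades some throughput for bounded waiting.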
Finally, the article concludes that the standard Linux rwsem balances fairness, throughput, and latency, but on mobile platforms it may not be optimal for user‑experience‑critical threads. The OPPO kernel team has made further optimizations, which will be shared in future work.
OPPO Kernel Craftsman