
Understanding Synchronization Mechanisms and RCU in the Linux Kernel

Linux kernel synchronization requires protecting shared mutable state from concurrent access using primitives such as spinlocks, mutexes, read‑write locks, or lock‑less techniques like RCU, which copies data and waits for a grace period, each offering distinct performance, latency, and complexity trade‑offs.

OPPO Kernel Craftsman

When reading or writing kernel code, one must assume that any execution flow can be pre‑empted after any instruction and later resume at an indeterminate time. This raises the question of whether the execution environment on which an instruction depends may change during the interruption. If the environment is exclusive, the instruction is safe; if it is shared, unexpected modifications can cause synchronization problems, typically solved with atomic variables or locks.

Most engineers decide to use synchronization based on a simple rule: global variables need locks, local variables do not. While this rule works in many cases, the distinction between "global" and "local" is really a distinction between "shared" and "exclusive" resources, and there are several scenarios where the rule does not hold:

(1) A function may allocate resources on the stack, link them into a global list, and then go to sleep. The stack data lives until the task is awakened, turning an apparently exclusive resource into a shared one that requires synchronization.

(2) Both code and data reside in memory, which is readable and writable. Even though the OS marks code sections as read‑only, developers sometimes need to modify code or other non‑data sections, which can also lead to synchronization issues.

(3) A variable may be globally visible but never accessed concurrently (e.g., per‑task or percpu variables). Synchronization problems require both sharing and simultaneous access; without the latter, no race occurs.

(4) Even on a single‑core CPU, operations such as i++ are not atomic because they consist of a load, modify, and store sequence. If the load is interrupted and another flow modifies the variable, the original flow resumes with stale data, creating a race.

Therefore, synchronization issues arise from shared mutable state combined with concurrent access. The problems manifest as incorrect reads, write‑write conflicts, and various forms of reordering.

Compilers may reorder or cache values assuming a single-threaded execution model. To prevent such aggressive optimizations, kernel code uses volatile, WRITE_ONCE()/READ_ONCE(), or memory barriers. CPUs themselves may also execute instructions out of order, especially on weakly ordered architectures, requiring explicit barriers to enforce ordering.

When a shared resource is accessed concurrently, engineers often add locks. The two classic kernel locks are spinlocks and mutexes. A spinlock busy‑waits while a mutex sleeps when the lock is unavailable. Their differences go beyond acquisition failure handling:

(1) Spinlocks disable preemption but not necessarily interrupts. The plain spin_lock() variant can therefore still be interrupted by hardware interrupt handlers; when a lock is shared with interrupt context, the spin_lock_irqsave() variant must be used so that local interrupts are disabled as well.

(2) Mutexes sleep on contention, so they cannot be used in interrupt context; they do not disable preemption, and a sleeping lock holder can introduce priority-inversion and nested-locking complexities.

Both rely on hardware atomic operations (e.g., cmpxchg) to manipulate a single-word lock variable. The lock itself becomes the contested resource, reducing a complex data race to contention on a single word.

Locks, however, introduce latency, fairness issues, and cache pressure. In high‑contention scenarios, spinlocks waste CPU cycles, while mutexes cause costly wake‑ups and context switches. Consequently, developers look for alternatives.

Alternative solutions include read‑write locks (rwlock, rwsem) for read‑heavy workloads, lock‑less designs that rely solely on atomic primitives, and per‑CPU data structures that eliminate sharing by giving each CPU its own copy of a variable.

One prominent lock‑less technique is RCU (Read‑Copy‑Update). RCU’s basic idea is to copy a shared object, modify the copy, and then replace the original pointer with the new one. Readers continue to see the old version until they finish, after which the old data can be reclaimed.

RCU is tailored for scenarios with many readers and few writers. The kernel implements RCU by using a grace period: after a writer updates a pointer, it must wait until all pre‑existing readers have exited their critical sections before freeing the old data.

In the classic (non-preemptible) implementation, the kernel tracks reader exit by disabling preemption on entry to an RCU read-side critical section and re-enabling it on exit. Since a context switch can only occur when preemption is enabled, the writer simply waits until every CPU has passed through the scheduler at least once; at that point all pre-existing readers must have left their critical sections, and the grace period ends.

In practice, a writer may call synchronize_rcu(), which blocks until every CPU has passed through a context switch. When the last CPU schedules, the grace period is complete, and the writer can safely free the old object.

Key characteristics of Linux’s RCU implementation:

Designed for read‑heavy workloads.

Writer latency due to the grace period.

Readers may see either the old or the new version during the transition.

Only pointers to dynamically allocated objects can be protected.

Read‑side critical sections cannot block, sleep, or be pre‑empted.

The classic implementation relies on disabling/enabling preemption rather than explicit per-reader bookkeeping (preemptible-RCU kernels instead track readers so that read-side sections can be preempted).

While RCU offers excellent read‑side performance and scalability, its write‑side latency and the restriction that protected objects be pointers must be considered when choosing a synchronization primitive.

In summary, synchronization in the Linux kernel involves a spectrum of mechanisms—from simple spinlocks and mutexes to sophisticated lock‑less techniques like RCU—each with trade‑offs in performance, complexity, and applicability.

Tags: performance, concurrency, RCU, synchronization, mutex, Linux kernel, spinlock
Written by

OPPO Kernel Craftsman

Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials
