
Unlocking Linux Performance: How RCU Enables Lock‑Free Reads and Fast Updates

This article explains the Linux kernel's Read‑Copy‑Update (RCU) mechanism, detailing its lock‑free read path, update workflow, grace period handling, performance benefits, limitations, real‑world applications, and essential API functions with practical code examples.

Deepin Linux

Sep 14, 2025

When you monitor processes with ps -ef or watch CPU usage with top, you may wonder why browsers stay responsive and databases continue to serve requests despite many processes competing for CPU.

The answer lies in two core Linux kernel mechanisms: the process scheduler, which allocates CPU time, and the RCU (Read‑Copy‑Update) mechanism, which solves the "fast and safe" problem of concurrent reads and writes on shared data.

1. What is RCU?

RCU (Read‑Copy‑Update) is a synchronization mechanism designed for read‑heavy, write‑light scenarios. Unlike traditional locks, RCU minimizes read‑side overhead by allowing readers to access data without acquiring a lock, achieving lock‑free reads and greatly improving performance under high concurrency.

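Traditional locking primitives (semaphores, spinlocks, read‑write locks) protect shared data by requiring every accessor to acquire a lock first, which introduces two main problems:

Efficiency loss: acquiring a lock needs atomic read‑modify‑write instructions and memory barriers, which stall the CPU pipeline and bounce the lock's cache line between cores. Even with read‑write locks, the write lock is exclusive, so readers are blocked for as long as a writer holds it.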

Scalability issues: as CPU count grows, lock contention worsens, reducing performance.

RCU addresses these issues with two key ideas: copy‑then‑update and delayed reclamation of old data.
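The two ideas can be sketched in userspace with C11 atomics. This is a minimal illustration, not kernel code: struct config, config_get(), config_set_timeout() and config_reclaim() are hypothetical names, a single writer is assumed, and the caller of config_reclaim() is trusted to know that no reader still holds an old version (the job a grace period does in the kernel).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object: readers only ever see a complete version. */
struct config {
    int timeout_ms;
    int retries;
    struct config *next_retired;   /* links old versions awaiting free */
};

static _Atomic(struct config *) current_config;  /* pointer readers load */
static struct config *retired_list;              /* private to the writer */

/* Reader side: one acquire load, no lock. */
static struct config *config_get(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}

/* Writer side (single writer assumed): copy, modify the copy, publish.
 * Returns 0 on success, -1 if allocation fails. */
static int config_set_timeout(int timeout_ms)
{
    struct config *old =
        atomic_load_explicit(&current_config, memory_order_relaxed);
    struct config *copy = malloc(sizeof(*copy));

    assert(timeout_ms >= 0);
    if (!copy)
        return -1;
    if (old)
        *copy = *old;                      /* copy */
    else
        *copy = (struct config){ 0 };
    copy->timeout_ms = timeout_ms;         /* update the private copy */
    copy->next_retired = NULL;

    /* Release store: the copy's contents become visible before the pointer. */
    atomic_store_explicit(&current_config, copy, memory_order_release);

    if (old) {                             /* delay the free, do not do it now */
        old->next_retired = retired_list;
        retired_list = old;
    }
    return 0;
}

/* Free all retired versions. Only safe once no reader can still hold one.
 * Returns the number of versions freed. */
static int config_reclaim(void)
{
    int freed = 0;

    while (retired_list) {
        struct config *victim = retired_list;

        retired_list = victim->next_retired;
        free(victim);
        freed++;
    }
    return freed;
}
```

A reader that called config_get() before an update keeps a valid pointer to the old version until config_reclaim() runs, which is why reclamation has to be delayed rather than done at the moment of the swap.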

2. How RCU Works

2.1 Read Path

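Enter critical section: a reader calls rcu_read_lock(). In kernels built without CONFIG_PREEMPT_RCU this disables preemption (and compiles to nothing when the kernel itself is non‑preemptible); with CONFIG_PREEMPT_RCU it only increments a per‑task nesting counter, and the reader may be preempted. In both cases the reader must not sleep voluntarily inside the section.

Data access: the reader uses rcu_dereference() to fetch the protected pointer. The macro orders the pointer load before any later access through that pointer, so the reader sees the fully initialized object the writer published, never a half‑built one.

Exit critical section: the reader calls rcu_read_unlock(), which undoes whatever rcu_read_lock() did. Pointers obtained inside the section must not be used after this point.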

#include <linux/rculist.h>
#include <linux/sched.h>
#include <linux/module.h>

struct rcu_demo_node {
    int data;
    struct list_head list;
};

static LIST_HEAD(rcu_demo_list);
static DEFINE_SPINLOCK(rcu_demo_lock);

static void rcu_reader_example(void)
{
    struct rcu_demo_node *node;
    rcu_read_lock();
    list_for_each_entry_rcu(node, &rcu_demo_list, list) {
        pr_info("RCU Reader: Read data = %d\n", node->data);
    }
    rcu_read_unlock();
}

static int __init rcu_demo_init(void)
{
    pr_info("RCU demo module loaded\n");
    rcu_reader_example();
    return 0;
}

static void __exit rcu_demo_exit(void)
{
    pr_info("RCU demo module unloaded\n");
}

module_init(rcu_demo_init);
module_exit(rcu_demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("RCU Read Operation Example");

2.2 Write Path

Copy data: the writer allocates a new object and copies the old data into it.

Modify the copy: changes are made on the new object without affecting readers.

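Pointer replacement: rcu_assign_pointer() stores the new pointer with release ordering, so any reader that sees the new pointer also sees the fully initialized copy. Readers that already fetched the old pointer keep using the old object.

Deferred reclamation: the writer waits for a grace period, either by blocking in synchronize_rcu() or by registering a callback with call_rcu(), and only then frees the old object. A grace period ends once every reader that was inside a read‑side critical section when it began has left that section.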

#include <linux/rculist.h>
#include <linux/sched.h>
#include <linux/module.h>
#include <linux/slab.h>

struct rcu_demo_node {
    int data;
    struct list_head list;
    struct rcu_head rcu;
};

static LIST_HEAD(rcu_demo_list);
static DEFINE_SPINLOCK(rcu_demo_lock);

static void rcu_node_free(struct rcu_head *rcu)
{
    struct rcu_demo_node *node = container_of(rcu, struct rcu_demo_node, rcu);
    kfree(node);
    pr_info("RCU Writer: Old node freed\n");
}

static void rcu_writer_example(int old_val, int new_val)
{
    struct rcu_demo_node *old_node, *new_node;
    unsigned long flags;

    /* Allocate before taking the spinlock: GFP_KERNEL may sleep. */
    new_node = kmalloc(sizeof(*new_node), GFP_KERNEL);
    if (!new_node)
        return;

    /* The spinlock serializes writers only; readers never take it. */
    spin_lock_irqsave(&rcu_demo_lock, flags);
    list_for_each_entry(old_node, &rcu_demo_list, list) {
        if (old_node->data == old_val) {
            *new_node = *old_node; /* copy */
            new_node->data = new_val;
            pr_info("RCU Writer: Modified data from %d to %d\n", old_val, new_val);
            list_replace_rcu(&old_node->list, &new_node->list);
            /* Free the old node only after a grace period. */
            call_rcu(&old_node->rcu, rcu_node_free);
            new_node = NULL; /* now owned by the list */
            break;
        }
    }
    spin_unlock_irqrestore(&rcu_demo_lock, flags);

    kfree(new_node); /* no match: drop the unused copy (kfree(NULL) is a no-op) */
}
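
What the writer waits for between unlinking the old node and freeing it can be modeled in userspace. The sketch below is a toy, not the kernel's implementation: each reader slot has a counter that is odd while the reader is inside a critical section, and the writer snapshots those counters when it retires an object. All names (toy_read_lock(), gp_start(), gp_elapsed(), MAX_READERS) are hypothetical, and nested critical sections are not supported.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_READERS 4   /* hypothetical fixed number of reader slots */

/* Per-reader counter: odd while that reader is inside a critical section. */
static atomic_ulong reader_seq[MAX_READERS];

static void toy_read_lock(int id)
{
    assert(id >= 0 && id < MAX_READERS);
    atomic_fetch_add(&reader_seq[id], 1);   /* even -> odd */
}

static void toy_read_unlock(int id)
{
    assert(id >= 0 && id < MAX_READERS);
    atomic_fetch_add(&reader_seq[id], 1);   /* odd -> even */
}

/* Counter values recorded by the writer right after it unpublishes an object. */
struct gp_snapshot {
    unsigned long seq[MAX_READERS];
};

static void gp_start(struct gp_snapshot *snap)
{
    for (int i = 0; i < MAX_READERS; i++)
        snap->seq[i] = atomic_load(&reader_seq[i]);
}

/* True once every reader that was inside a critical section at gp_start()
 * has left it. Readers that entered later are ignored: they started after
 * the old pointer was unpublished, so they cannot be holding it. */
static bool gp_elapsed(const struct gp_snapshot *snap)
{
    for (int i = 0; i < MAX_READERS; i++) {
        bool was_inside = (snap->seq[i] & 1UL) != 0;

        if (was_inside && atomic_load(&reader_seq[i]) == snap->seq[i])
            return false;   /* same critical section is still running */
    }
    return true;
}
```

The point of the model is that a grace period does not wait for all readers to go idle, only for the readers that might have seen the old pointer. A steady stream of new readers therefore cannot postpone reclamation forever.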

3. Advantages of RCU

3.1 Performance Boost

Readers take no lock, so they never contend with each other or with writers and never block waiting for one. In high‑concurrency, read‑heavy workloads such as database queries, this keeps read latency low and predictable.

3.2 Scalability

RCU scales well on multi‑core systems because each CPU can perform lock‑free reads without additional synchronization overhead, unlike spinlocks that suffer from increased contention as core count grows.

3.3 No Deadlock Risk

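Since readers never acquire locks, they cannot take part in a circular lock dependency, which makes RCU attractive for complex kernel subsystems. One hazard remains: calling synchronize_rcu() from inside a read‑side critical section waits for a grace period that can never end, so updaters must only wait for readers from outside such sections.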

4. Limitations of RCU

4.1 Write Overhead

Writers must allocate new memory and copy existing data, consuming extra memory bandwidth and CPU cycles. Frequent writes can negate the read‑side benefits.

4.2 Limited Applicability

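RCU shines in read‑dominant workloads. It is a poor fit for write‑heavy data and for scenarios with strict consistency requirements (e.g., real‑time financial systems): a reader that fetched the old pointer keeps working on the old version until it leaves its critical section, and the old memory cannot be reused until the grace period ends.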

4.3 Implementation Complexity

Correct use requires understanding memory barriers, atomic operations, and grace‑period mechanics, making debugging and maintenance more challenging.

5. Real‑World Use Cases

5.1 Kernel Data Structures

Linked lists and hash tables that store routing tables, file system metadata, or other shared resources benefit from RCU by allowing concurrent reads while updates happen on a separate copy.

5.2 File Systems

Metadata reads are lock‑free, while updates copy the metadata, modify it, and atomically replace the pointer, ensuring readers always see a consistent view.

5.3 Network Stack

Routing table lookups are performed by many CPUs concurrently; updates create a new table copy and swap the pointer, keeping packet forwarding fast and consistent.
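
The copy‑and‑swap pattern described here can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel's routing code: struct route, struct route_table, route_lookup() and route_add() are hypothetical, there is a single writer, and the caller is responsible for freeing the retired table only after all readers of it have finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct route {
    uint32_t prefix;    /* network address, host byte order */
    uint8_t  len;       /* prefix length in bits, 0..32 */
    int      next_hop;  /* hypothetical interface index */
};

struct route_table {
    size_t n;
    struct route routes[];   /* flexible array member */
};

static _Atomic(struct route_table *) active_table;

static uint32_t prefix_mask(uint8_t len)
{
    return len == 0 ? 0 : (uint32_t)(~(uint32_t)0 << (32 - len));
}

/* Reader: longest-prefix match on whichever table version is current.
 * Returns the next hop, or -1 if no route matches. */
static int route_lookup(uint32_t addr)
{
    struct route_table *t =
        atomic_load_explicit(&active_table, memory_order_acquire);
    int best_hop = -1;
    int best_len = -1;

    if (!t)
        return -1;
    for (size_t i = 0; i < t->n; i++) {
        const struct route *r = &t->routes[i];

        if ((addr & prefix_mask(r->len)) == r->prefix && r->len > best_len) {
            best_len = r->len;
            best_hop = r->next_hop;
        }
    }
    return best_hop;
}

/* Writer (single writer assumed): build a copy with one extra route and
 * publish it. On success returns 0 and stores the old table (possibly NULL)
 * in *retired for the caller to free later; returns -1 if allocation fails. */
static int route_add(struct route r, struct route_table **retired)
{
    struct route_table *old =
        atomic_load_explicit(&active_table, memory_order_relaxed);
    size_t n = old ? old->n : 0;
    struct route_table *copy;

    assert(r.len <= 32);
    copy = malloc(sizeof(*copy) + (n + 1) * sizeof(copy->routes[0]));
    if (!copy)
        return -1;
    if (n)
        memcpy(copy->routes, old->routes, n * sizeof(copy->routes[0]));
    r.prefix &= prefix_mask(r.len);        /* clear any host bits */
    copy->routes[n] = r;
    copy->n = n + 1;

    atomic_store_explicit(&active_table, copy, memory_order_release);
    *retired = old;
    return 0;
}
```

Each lookup works on exactly one table version from start to finish, so it never sees a half‑applied update. The cost is a full copy per change, which is acceptable only because route changes are rare compared with lookups.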

6. Using RCU in Code

6.1 Key API

rcu_read_lock() – marks the start of a read‑side critical section.

rcu_read_unlock() – ends the read‑side critical section.

synchronize_rcu() – blocks until all pre‑existing read‑side critical sections have finished. It may sleep, so it cannot be called from atomic context.

call_rcu() – registers a callback to run after a grace period and returns immediately.

rcu_assign_pointer() – publishes a new value for an RCU‑protected pointer with release ordering.

rcu_dereference() – safely reads an RCU‑protected pointer inside a read‑side critical section.

rcu_read_lock();
rcu_read_unlock();
synchronize_rcu();
call_rcu(&head, func);
rcu_assign_pointer(ptr, new);
rcu_dereference(ptr);
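
The call_rcu() callback receives a pointer to the embedded rcu_head, not to the enclosing object, which is why callbacks use container_of() before freeing. The userspace sketch below imitates that contract. It is an assumption‑laden toy: toy_rcu_head, toy_call_rcu(), toy_run_callbacks() and struct item are hypothetical stand‑ins, and the caller of toy_run_callbacks() is trusted to know that a grace period has elapsed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct rcu_head. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

/* Same idea as the kernel's container_of(): member pointer -> object. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct toy_rcu_head *pending;   /* callbacks awaiting a grace period */

/* Analogue of call_rcu(): queue the callback and return immediately. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *))
{
    assert(head != NULL && func != NULL);
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Run every queued callback. The caller vouches that a grace period has
 * elapsed. Returns the number of callbacks invoked. */
static int toy_run_callbacks(void)
{
    int ran = 0;

    while (pending) {
        struct toy_rcu_head *head = pending;

        pending = head->next;   /* unlink first: the callback frees head */
        head->func(head);
        ran++;
    }
    return ran;
}

struct item {
    int value;
    struct toy_rcu_head rcu;    /* not the first member: &it->rcu != it */
};

static int items_freed;

static void item_free_cb(struct toy_rcu_head *head)
{
    /* Recover the start of the allocation before freeing it. */
    struct item *it = toy_container_of(head, struct item, rcu);

    free(it);
    items_freed++;
}
```

Because the head sits at a non‑zero offset inside struct item, handing the head pointer straight to free() would pass an address that malloc() never returned. In the kernel, kfree_rcu() exists for the common case where the callback would do nothing but free the object.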

6.2 Example: RCU‑protected Linked List

#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/rcupdate.h>
#include <linux/rculist.h>

struct my_node {
    int data;
    struct list_head list;
    struct rcu_head rcu;
};

static LIST_HEAD(my_list);

void add_node(int new_data)
{
    struct my_node *new_node = kmalloc(sizeof(*new_node), GFP_KERNEL);
    if (!new_node)
        return;
    new_node->data = new_data;
    list_add_rcu(&new_node->list, &my_list);
}

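void remove_node(struct my_node *node)
{
    list_del_rcu(&node->list);
    /*
     * kfree() must receive the start of the object, not &node->rcu, so it
     * cannot simply be cast into an RCU callback. kfree_rcu() applies the
     * member offset and frees the node after a grace period.
     */
    kfree_rcu(node, rcu);
}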

void traverse_list(void)
{
    struct my_node *entry;
    rcu_read_lock();
    list_for_each_entry_rcu(entry, &my_list, list)
        printk(KERN_INFO "Node data: %d\n", entry->data);
    rcu_read_unlock();
}

static int __init my_module_init(void)
{
    add_node(10);
    add_node(20);
    traverse_list();
    {
        struct my_node *node_to_remove = list_entry(my_list.next, struct my_node, list);
        remove_node(node_to_remove);
    }
    traverse_list();
    return 0;
}

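static void __exit my_module_exit(void)
{
    struct my_node *entry, *tmp;

    /* Wait for any read-side critical sections that are still running. */
    synchronize_rcu();
    /*
     * The only readers are this module's own functions, and none can be
     * running now, so the plain _safe iterator and an immediate kfree()
     * are sufficient. Mainline has no list_for_each_entry_safe_rcu().
     */
    list_for_each_entry_safe(entry, tmp, &my_list, list) {
        list_del(&entry->list);
        kfree(entry);
    }
}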

module_init(my_module_init);
module_exit(my_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("RCU Linked List Example");
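
The ordering rules behind list_add_rcu() and list_del_rcu() can be shown on a singly linked list in userspace. The sketch below is a simplified analogue with a single writer and hypothetical names (struct lnode, toy_list_add(), toy_list_del(), toy_list_next()); the kernel's doubly linked implementation differs in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct lnode {
    int data;
    _Atomic(struct lnode *) next;
};

static _Atomic(struct lnode *) toy_head;

/* Reader helpers: acquire loads, so a visible node is a complete node. */
static struct lnode *toy_list_first(void)
{
    return atomic_load_explicit(&toy_head, memory_order_acquire);
}

static struct lnode *toy_list_next(struct lnode *n)
{
    assert(n != NULL);
    return atomic_load_explicit(&n->next, memory_order_acquire);
}

/* Reader: sum the elements reachable from a given position. */
static int toy_list_sum(struct lnode *from)
{
    int sum = 0;

    for (struct lnode *n = from; n; n = toy_list_next(n))
        sum += n->data;
    return sum;
}

/* Like list_add_rcu(): initialize the node fully, then publish it at the
 * head with a release store. Returns the new node, or NULL on failure. */
static struct lnode *toy_list_add(int data)
{
    struct lnode *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    n->data = data;
    atomic_init(&n->next,
                atomic_load_explicit(&toy_head, memory_order_relaxed));
    atomic_store_explicit(&toy_head, n, memory_order_release);
    return n;
}

/* Like list_del_rcu(): unlink the first node holding data, but leave its
 * own next pointer intact so a reader standing on it can keep walking.
 * Returns the unlinked node for deferred freeing, or NULL if not found. */
static struct lnode *toy_list_del(int data)
{
    _Atomic(struct lnode *) *link = &toy_head;
    struct lnode *cur;

    while ((cur = atomic_load_explicit(link, memory_order_relaxed)) != NULL) {
        if (cur->data == data) {
            struct lnode *after =
                atomic_load_explicit(&cur->next, memory_order_relaxed);

            atomic_store_explicit(link, after, memory_order_release);
            return cur;
        }
        link = &cur->next;
    }
    return NULL;
}
```

A reader positioned on the removed node still reaches the rest of the list because the node's next pointer is untouched. Only the free has to wait until such readers are gone.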

The image below illustrates the RCU workflow.
