Why Does Linux Use Preemptible Kernels? A Deep Dive into Kernel Preemption Mechanics

This article explains the technical details of Linux kernel preemption, covering the difference between preemptible and non‑preemptible kernels, the role of the reschedule flag and preempt count, scheduling checkpoints and preempt points, low‑latency handling in non‑preemptible kernels, and the voluntary preemption model.

Liangxu Linux

Nov 18, 2021

1. Introduction and Environment

The discussion assumes an ARM64 processor running Linux 5.11 on Ubuntu 20.04.1, with source code examined using vim, ctags, and cscope. It aims to clarify what kernel preemption is, how it relates to a preemptive kernel, and the purpose of the preempt count.

2. Preemptible vs. Non‑Preemptible Kernels

Running uname -a shows the PREEMPT flag, indicating a preemptible kernel. In Linux terminology, a kernel that supports preemption is called a preemptible kernel, while one that does not is a non‑preemptible kernel. This article focuses on the CFS (Completely Fair Scheduler) class.

# uname -a
Linux (none) 5.11.0-g08a3831f3ae1 SMP PREEMPT Fri Apr 30 17:41:53 CST 2021 aarch64 GNU/Linux
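
The same check can be made from a program. The following is a minimal userspace sketch, assuming the kernel advertises its preemption model in the version field returned by uname(2), as in the output above. The helper names version_has_preempt and running_kernel_is_preemptible are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: true if a uname version string carries a PREEMPT tag. */
static bool version_has_preempt(const char *version)
{
    assert(version != NULL);
    /* Substring match, so tags such as PREEMPT_RT also match. */
    return strstr(version, "PREEMPT") != NULL;
}

/* Hypothetical helper: query the running kernel; false if uname(2) fails. */
static bool running_kernel_is_preemptible(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;
    return version_has_preempt(u.version);
}
```

Calling running_kernel_is_preemptible() on the system shown above would be expected to return true, since its version string contains "SMP PREEMPT". The string test is only a heuristic; the kernel configuration remains the authoritative source.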

The kernel configuration file kernel/Kconfig.preempt defines several options:

config PREEMPT_NONE
    bool "No Forced Preemption (Server)"
    help
        This is the traditional Linux preemption model, geared towards throughput.

config PREEMPT
    bool "Preemptible Kernel (Low‑Latency Desktop)"
    depends on !ARCH_NO_PREEMPT
    select PREEMPTION
    select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
    select PREEMPT_DYNAMIC if HAVE_PREEMPT_DYNAMIC
    help
        This option reduces latency by making most kernel code preemptible.

Other options refine this choice: PREEMPT_VOLUNTARY adds explicit preemption points to an otherwise non‑preemptible kernel, and PREEMPT_RT targets real‑time workloads.

3. Reschedule Flag and Preempt Count

Some kernel paths (e.g., atomic context) cannot schedule. When a higher‑priority task is woken while the CPU is in such a path, the kernel sets a reschedule flag (TIF_NEED_RESCHED) in the running task's thread_info flags. The scheduler acts on this flag once execution returns to a preemptible context.

#define TIF_NEED_RESCHED 1  /* rescheduling necessary */

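On arm64, the preempt count is stored in the same thread_info structure, inside a union that lets the kernel read the nesting count and a need‑reschedule indicator as a single 64‑bit value:

struct thread_info {            /* abridged */
    unsigned long flags;        /* TIF_* flags, including TIF_NEED_RESCHED */
    union {
        u64 preempt_count;      /* 0 => preemptible */
        struct {
            u32 need_resched;   /* order shown is big-endian; swapped on little-endian */
            u32 count;          /* nesting depth of preempt_disable() */
        } preempt;
    };
};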

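A context switch may happen only when a reschedule is pending and preempt.count == 0. On arm64 the need_resched field is stored inverted (0 means a reschedule is pending), so both conditions hold exactly when the 64‑bit preempt_count reads as zero.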
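
The benefit of the union is easier to see in a small model. The sketch below is a userspace illustration, not kernel code: it assumes the arm64 convention that the need_resched half is stored inverted (0 means a reschedule is pending), and the struct and function names (preempt_model, model_init, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the union; little-endian field order is assumed here. */
struct preempt_model {
    union {
        uint64_t preempt_count;     /* both halves read as one word */
        struct {
            uint32_t count;         /* nesting depth of preempt_disable() */
            uint32_t need_resched;  /* inverted: 0 means reschedule pending */
        } preempt;
    };
};

static_assert(sizeof(struct preempt_model) == sizeof(uint64_t),
              "both views must cover the same 8 bytes");

/* Start preemptible with nothing pending. */
static void model_init(struct preempt_model *ti)
{
    ti->preempt.count = 0;
    ti->preempt.need_resched = 1;
}

static void model_set_need_resched(struct preempt_model *ti)
{
    ti->preempt.need_resched = 0;
}

static void model_clear_need_resched(struct preempt_model *ti)
{
    ti->preempt.need_resched = 1;
}

/* One comparison covers "count == 0" and "reschedule pending". */
static bool model_should_preempt(const struct preempt_model *ti)
{
    return ti->preempt_count == 0;
}
```

Because the test is against zero, it gives the same answer whichever half of the word each field occupies, which is why the field order can follow the CPU's endianness.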

4. Scheduling Points

4.1 Checkpoints (setting the flag)

Timer tick: On each tick, scheduler_tick() reaches the CFS tick path (task_tick_fair() and check_preempt_tick()). If the current task's execution time exceeds its ideal runtime or a higher‑priority task is ready, resched_curr() sets the flag.

// From check_preempt_tick() in kernel/sched/fair.c (abridged)
if (delta_exec > ideal_runtime) {
    resched_curr(rq_of(cfs_rq));
}

Wake‑up preemption: During fork or normal wake‑up paths, check_preempt_curr() may call resched_curr() when the newly woken task's virtual runtime is sufficiently smaller than the current task's.
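
Both checkpoints follow the same pattern: compare two numbers and, if the current task has had enough, mark it for rescheduling without switching yet. The sketch below models that pattern in userspace under simplifying assumptions. The task_model struct, the model_* functions, and the gran parameter (standing in for the wake-up granularity) are hypothetical, and the real CFS code has further conditions that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct task_model {
    uint64_t sum_exec_runtime;       /* total time executed, in ns */
    uint64_t prev_sum_exec_runtime;  /* total when the task was last picked */
    uint64_t vruntime;               /* virtual runtime, in ns */
    bool need_resched;               /* stands in for TIF_NEED_RESCHED */
};

/* A checkpoint only sets the flag; no context switch happens here. */
static void model_resched_curr(struct task_model *curr)
{
    curr->need_resched = true;
}

/* Timer-tick checkpoint: has the current task used up its slice? */
static void model_check_preempt_tick(struct task_model *curr,
                                     uint64_t ideal_runtime)
{
    uint64_t delta_exec;

    assert(curr->sum_exec_runtime >= curr->prev_sum_exec_runtime);
    delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
    if (delta_exec > ideal_runtime)
        model_resched_curr(curr);
}

/* Wake-up checkpoint: is the woken task far enough behind in vruntime? */
static void model_check_preempt_wakeup(struct task_model *curr,
                                       const struct task_model *woken,
                                       uint64_t gran)
{
    /* Signed difference so that a wrapped counter still compares sanely. */
    int64_t vdiff = (int64_t)(curr->vruntime - woken->vruntime);

    if (vdiff > (int64_t)gran)
        model_resched_curr(curr);
}
```

In this model a task that has run exactly its ideal runtime is left alone; only exceeding it sets the flag. The actual switch is left to a later preempt point.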

4.2 Preempt Points (actually invoking the scheduler)

Interrupt return: After handling an interrupt taken in kernel mode, the kernel checks preempt_count. If it is zero, arm64_preempt_schedule_irq() calls preempt_schedule_irq(), which invokes __schedule(true) to perform preemptive scheduling.

preempt_enable: When a critical section ends, preempt_enable() decrements the preempt count; if it reaches zero and a reschedule is pending, __preempt_schedule() triggers the scheduler.

local_bh_enable: Re‑enabling soft‑irqs may also invoke preempt_check_resched(), leading to a schedule if needed.

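// arm64 entry.S snippet (interrupt return path)
ldr x24, [tsk, #TSK_TI_PREEMPT]   // load the 64-bit preempt_count
cbnz x24, 1f                      // nonzero: preemption disabled or nothing pending
bl arm64_preempt_schedule_irq     // zero: preempt the interrupted task
1: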
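
The interaction between the counter, the flag, and the preempt points can be sketched as a small state machine. This is a userspace model under simplifying assumptions, not the kernel implementation: cpu_model and the model_* functions are hypothetical names, and the scheduler is reduced to a call counter.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_model {
    int preempt_count;    /* nesting depth of model_preempt_disable() */
    bool need_resched;    /* set earlier by a checkpoint */
    int schedule_calls;   /* how often the scheduler was entered */
};

/* Stand-in for the scheduler: count the call and consume the flag. */
static void model_schedule(struct cpu_model *cpu)
{
    cpu->schedule_calls++;
    cpu->need_resched = false;
}

static void model_preempt_disable(struct cpu_model *cpu)
{
    cpu->preempt_count++;
}

/* Preempt point: leaving the outermost critical section. */
static void model_preempt_enable(struct cpu_model *cpu)
{
    assert(cpu->preempt_count > 0);   /* unbalanced enable is a bug */
    if (--cpu->preempt_count == 0 && cpu->need_resched)
        model_schedule(cpu);
}

/* Preempt point: returning from an interrupt into kernel code. */
static void model_irq_return(struct cpu_model *cpu)
{
    if (cpu->preempt_count == 0 && cpu->need_resched)
        model_schedule(cpu);
}
```

With two nested model_preempt_disable() calls and the flag set in between, neither model_irq_return() nor the first model_preempt_enable() enters the scheduler; only the second enable, which brings the count back to zero, does.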

5. Low‑Latency Handling in Non‑Preemptible Kernels

In kernels without preemption, long‑running paths (e.g., filesystem or memory reclaim) use the cond_resched() macro to voluntarily check whether a reschedule is required.

// mm/vmscan.c example
while (!list_empty(page_list)) {
    ...
    cond_resched();
    ...
}

The macro expands to a call to _cond_resched(), which checks should_resched(0) and, if true, invokes preempt_schedule_common().

#define cond_resched() ({                     \
    ___might_sleep(__FILE__, __LINE__, 0);    \
    _cond_resched();                          \
})
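
The effect on a long loop can be modeled in userspace. The sketch below is an illustration under stated assumptions: resched_model, model_cond_resched, and model_process_items are hypothetical names, the timer tick is simulated by setting the flag at a chosen iteration, and the scheduler is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct resched_model {
    bool need_resched;    /* set by a (simulated) checkpoint */
    int schedule_calls;   /* how often the loop yielded */
};

/* Yield only if a reschedule is pending; returns 1 if it yielded, else 0. */
static int model_cond_resched(struct resched_model *m)
{
    if (!m->need_resched)
        return 0;
    m->need_resched = false;
    m->schedule_calls++;
    return 1;
}

/* Long-running loop that offers to yield once per item. */
static size_t model_process_items(struct resched_model *m, size_t n,
                                  size_t tick_at)
{
    size_t done = 0;

    assert(m != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i == tick_at)
            m->need_resched = true;   /* simulated timer tick */
        done++;                       /* the actual work would go here */
        model_cond_resched(m);
    }
    return done;
}
```

The loop still processes every item; the check only bounds how long a pending reschedule can wait, to roughly the cost of one iteration.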

6. Voluntary Kernel Preemption (CONFIG_PREEMPT_VOLUNTARY)

When CONFIG_PREEMPT_VOLUNTARY=y, the kernel adds explicit preemption points. The macro might_resched() maps to _cond_resched() and is only effective under this configuration.

#ifdef CONFIG_PREEMPT_VOLUNTARY
extern int _cond_resched(void);
#define might_resched() _cond_resched()
#else
#define might_resched() do { } while (0)
#endif

Most long‑running kernel paths already call cond_resched() directly, so might_resched() is rarely invoked by name; it is reached mainly through the might_sleep() annotation.
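
In the kernel, might_resched() is typically reached through might_sleep(), an annotation placed in functions that may sleep. The sketch below models how a compile-time switch turns that annotation into a preemption point. It is a userspace illustration only: VOL_PREEMPT_VOLUNTARY is a hypothetical stand-in for CONFIG_PREEMPT_VOLUNTARY, and all vol_* names are invented for this example.

```c
#include <assert.h>

/* Hypothetical switch; build with -DVOL_PREEMPT_VOLUNTARY=0 to disable. */
#ifndef VOL_PREEMPT_VOLUNTARY
#define VOL_PREEMPT_VOLUNTARY 1
#endif

static int vol_need_resched;     /* set by a checkpoint */
static int vol_schedule_calls;   /* how often the scheduler was entered */
static int vol_atomic_depth;     /* > 0 while sleeping is forbidden */

/* Yield if a reschedule is pending; returns 1 if it yielded, else 0. */
static int vol_cond_resched(void)
{
    if (!vol_need_resched)
        return 0;
    vol_need_resched = 0;
    vol_schedule_calls++;
    return 1;
}

#if VOL_PREEMPT_VOLUNTARY
#define vol_might_resched() vol_cond_resched()
#else
#define vol_might_resched() do { } while (0)
#endif

/* Annotation for code that may sleep; doubles as a preemption point. */
static void vol_might_sleep(void)
{
    assert(vol_atomic_depth == 0);   /* plays the role of the debug check */
    vol_might_resched();
}
```

With the switch set to 1, a pending reschedule is serviced at every vol_might_sleep() call; with 0, the same call sites compile down to the debug check alone.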

7. Summary

Preemptible kernels suit interactive devices (handhelds, desktops) where low latency matters, while non‑preemptible kernels target server workloads that prioritize throughput. Scheduling decisions are split into “checkpoints” that set the reschedule flag and “preempt points” that actually invoke the scheduler. Non‑preemptible kernels bound their latency with cond_resched() calls in long‑running paths, and the voluntary preemption model adds explicit preemption points for finer‑grained latency control.
