Why Does Linux Use Preemptible Kernels? A Deep Dive into Kernel Preemption Mechanics
This article explains the technical details of Linux kernel preemption, covering the difference between preemptible and non‑preemptible kernels, the role of the reschedule flag and preempt count, scheduling checkpoints and preempt points, low‑latency handling in non‑preemptible kernels, and the voluntary preemption model.
1. Introduction and Environment
The discussion assumes an ARM64 processor running Linux 5.11 on Ubuntu 20.04.1, with source code examined using vim, ctags, and cscope. It aims to clarify what kernel preemption is, how it relates to a preemptive kernel, and the purpose of the preempt count.
2. Preemptible vs. Non‑Preemptible Kernels
Running uname -a shows the PREEMPT flag, indicating a preemptible kernel. In Linux terminology, a kernel that supports preemption is called a preemptible kernel, while one that does not is a non‑preemptible kernel. The article focuses on the CFS (Completely Fair Scheduler) class.
# uname -a
Linux (none) 5.11.0-g08a3831f3ae1 SMP PREEMPT Fri Apr 30 17:41:53 CST 2021 aarch64 GNU/Linux
The kernel configuration file kernel/Kconfig.preempt defines several options:
config PREEMPT_NONE
    bool "No Forced Preemption (Server)"
    help
      This is the traditional Linux preemption model, geared towards throughput.

config PREEMPT
    bool "Preemptible Kernel (Low-Latency Desktop)"
    depends on !ARCH_NO_PREEMPT
    select PREEMPTION
    select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
    select PREEMPT_DYNAMIC if HAVE_PREEMPT_DYNAMIC
    help
      This option reduces latency by making most kernel code preemptible.
Other options such as PREEMPT_VOLUNTARY and PREEMPT_RT add extra preemption points or real‑time capabilities.
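The uname check at the start of this section can also be done from a program. The following is a minimal userspace sketch, not kernel code: it assumes the PREEMPT token appears in the version field returned by the POSIX uname() call, as it does in the output above. The function names are hypothetical, and a plain substring match will also accept variants such as PREEMPT_RT.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Return 1 if a uname "version" string carries the PREEMPT token, else 0. */
int version_has_preempt(const char *version)
{
    assert(version != NULL);
    return strstr(version, "PREEMPT") != NULL;
}

/*
 * Query the running kernel.
 * Returns 1 or 0 as above, or -1 if uname() itself fails.
 */
int running_kernel_has_preempt(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return version_has_preempt(u.version);
}
```

Passing the version part of the output shown earlier ("SMP PREEMPT Fri Apr 30 ...") to version_has_preempt() returns 1, while a string without the token returns 0.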
3. Reschedule Flag and Preempt Count
Some kernel paths (e.g., atomic context) cannot schedule. When a task that should preempt the current one becomes runnable, the kernel sets a reschedule flag (TIF_NEED_RESCHED) in the current task’s thread_info flags. The scheduler acts on this flag at the next preempt point, once execution is back in a preemptible context.
#define TIF_NEED_RESCHED 1 /* rescheduling necessary */
On arm64 the preempt count lives in the same thread_info structure, inside a union that overlays it with a second need‑resched word, so the kernel can test both with a single 64‑bit load:
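struct thread_info {
    unsigned long flags;        /* TIF_* bits, including TIF_NEED_RESCHED */
    union {
        u64 preempt_count;      /* combined view of both words below */
        struct {
            /* order shown is the big-endian one; little-endian builds swap the two fields */
            u32 need_resched;   /* arm64 stores this inverted: 0 means a reschedule is pending */
            u32 count;          /* preempt_disable() nesting depth */
        } preempt;
    };
};
When a reschedule is pending and preempt.count == 0, the kernel may perform a context switch. Because arm64 stores need_resched inverted (0 means pending), both conditions collapse into the single test preempt_count == 0.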
4. Scheduling Points
4.1 Checkpoints (setting the flag)
Timer tick: In scheduler_tick(), if the current task’s execution time exceeds its ideal runtime or a higher‑priority task is ready, resched_curr() sets the flag.
Wake‑up preemption: During fork or normal wake‑up paths, check_preempt_curr() may call resched_curr() when the newly woken task’s virtual runtime is sufficiently smaller than the current task’s.
// Example from kernel/sched/fair.c, check_preempt_tick()
if (delta_exec > ideal_runtime) {
    resched_curr(rq_of(cfs_rq));
}
4.2 Preempt Points (actually invoking the scheduler)
Interrupt return: After handling an interrupt taken in kernel mode, the exit path checks preempt_count. If it is zero, arm64_preempt_schedule_irq() calls preempt_schedule_irq(), which runs __schedule(true) to perform preemptive scheduling.
preempt_enable: When a critical section ends, preempt_enable() decrements the preempt count; if the count reaches zero and a reschedule is pending, __preempt_schedule() invokes the scheduler.
local_bh_enable: Re‑enabling softirqs also calls preempt_check_resched(), which schedules if a reschedule is pending.
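The way a pending reschedule is held back until the outermost critical section ends can be modelled in a few lines. This is a userspace sketch under assumptions, not the kernel implementation: the flag is kept as a plain (non-inverted) integer for readability, "scheduling" is reduced to a counter, and all model_* names are hypothetical.

```c
#include <assert.h>

struct model_cpu {
    unsigned int preempt_count;   /* > 0 while preemption is disabled */
    int need_resched;             /* set by a checkpoint */
    unsigned int schedule_calls;  /* how often the scheduler was entered */
};

/* Stand-in for __schedule(): consume the request and count the call. */
static void model_schedule(struct model_cpu *cpu)
{
    cpu->need_resched = 0;
    cpu->schedule_calls++;
}

void model_preempt_disable(struct model_cpu *cpu)
{
    cpu->preempt_count++;
}

/* Preempt point 1: leaving the outermost critical section. */
void model_preempt_enable(struct model_cpu *cpu)
{
    assert(cpu->preempt_count > 0);   /* unbalanced enable is a bug */
    if (--cpu->preempt_count == 0 && cpu->need_resched)
        model_schedule(cpu);
}

/* Preempt point 2: returning from an interrupt to kernel code. */
void model_irq_return(struct model_cpu *cpu)
{
    if (cpu->preempt_count == 0 && cpu->need_resched)
        model_schedule(cpu);
}
```

If the flag is set while two nested model_preempt_disable() calls are active, neither model_irq_return() nor the first model_preempt_enable() enters the scheduler. Only the second model_preempt_enable(), which brings the count back to zero, does.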
// arm64 entry.S snippet
    ldr     x24, [tsk, #TSK_TI_PREEMPT]    // load the preempt count
    cbnz    x24, 1f                        // nonzero: skip preemption
    bl      arm64_preempt_schedule_irq
1:
5. Low‑Latency Handling in Non‑Preemptible Kernels
In kernels without preemption, long‑running paths (e.g., filesystem or memory reclaim) use the cond_resched() macro to voluntarily check whether a reschedule is required.
// mm/vmscan.c example
while (!list_empty(page_list)) {
    ...
    cond_resched();
    ...
}
The macro expands to a call to _cond_resched(), which checks should_resched(0) and, if true, invokes preempt_schedule_common().
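The check that _cond_resched() performs, and its effect on a long loop, can be approximated in userspace. The sketch below rests on assumptions: POSIX sched_yield() stands in for preempt_schedule_common(), the tick checkpoint is faked by setting the flag every few iterations, and the model_* names and the flag_every parameter are hypothetical.

```c
#include <assert.h>
#include <sched.h>

struct model_task {
    unsigned int preempt_count;  /* must be 0 for a voluntary yield */
    int need_resched;            /* set by a (simulated) checkpoint */
    unsigned int yields;         /* number of voluntary yields taken */
};

/* Yield only if a reschedule is pending and preemption is not disabled. */
int model_cond_resched(struct model_task *t)
{
    if (t->need_resched && t->preempt_count == 0) {
        t->need_resched = 0;
        t->yields++;
        sched_yield();           /* stand-in for the real scheduler call */
        return 1;
    }
    return 0;
}

/* Long-running loop that offers the CPU once per item. */
unsigned int model_process_items(struct model_task *t,
                                 unsigned int nr_items,
                                 unsigned int flag_every)
{
    unsigned int done = 0;

    assert(flag_every > 0);
    for (unsigned int i = 0; i < nr_items; i++) {
        done++;                          /* the "work" for this item */
        if ((i + 1) % flag_every == 0)
            t->need_resched = 1;         /* pretend a tick fired */
        model_cond_resched(t);
    }
    return done;
}
```

Processing 10 items with the flag raised every 4th item yields twice in this model. With preempt_count held at 1, the same loop never yields and the flag stays set, which is why placing such calls inside long loops bounds latency only in code that is allowed to sleep.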
#define cond_resched() ({ ___might_sleep(__FILE__, __LINE__, 0); _cond_resched(); })
6. Voluntary Kernel Preemption (CONFIG_PREEMPT_VOLUNTARY)
When CONFIG_PREEMPT_VOLUNTARY=y, the kernel adds explicit preemption points. The macro might_resched() maps to _cond_resched() and is only effective under this configuration.
#ifdef CONFIG_PREEMPT_VOLUNTARY
extern int _cond_resched(void);
#define might_resched() _cond_resched()
#else
#define might_resched() do { } while (0)
#endif
A search of the source tree shows that most heavy kernel paths already call cond_resched() directly, so might_resched() is rarely invoked by name; it is reached mainly through might_sleep().
7. Summary
Preemptible kernels suit interactive devices (handhelds, desktops) where low latency matters, while non‑preemptible kernels target server workloads that prioritize throughput. Scheduling decisions are split into “checkpoints” that set the reschedule flag and “preempt points” that actually invoke the scheduler. Non‑preemptible kernels reduce latency by calling cond_resched() in long‑running paths, and the voluntary preemption model adds explicit preemption points for finer‑grained latency control.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, and more.
