Fundamentals 59 min read

Deep Dive into Linux Process Scheduling Mechanisms

This article explains the concepts, reasons, and implementation details of Linux process scheduling, covering cooperative and preemptive scheduling, run‑queue structures, wake‑up paths, scheduling ticks, context switching, priority handling, and the various scheduling classes used in the kernel.

Deepin Linux

Nov 18, 2024

Deep Dive into Linux Process Scheduling Mechanisms

1. Introduction

In the Linux operating system, processes are like citizens of a bustling city and the scheduler acts as the traffic controller, allocating CPU time to ensure efficient, stable, and fair execution of all tasks.

2. What Is Process Scheduling?

Scheduling is the kernel’s mechanism for managing the CPU, deciding which runnable process (or thread) gets the processor next. It can be performed directly on threads (one‑level scheduling) or first on processes and then on threads inside them (two‑level scheduling). Linux adopts one‑level scheduling to maximize concurrency on multi‑core CPUs.

3. Why Scheduling Is Needed

Without a scheduler a system could run only one program at a time; CPU would be idle during I/O waits and long‑running CPU‑bound processes could starve others. Cooperative multitasking required programs to call sched_yield() voluntarily, but this proved insufficient, leading to pre‑emptive multitasking where the kernel can interrupt a running task.

4. How Scheduling Is Triggered

Scheduling can be triggered actively (a process voluntarily yields or blocks) or passively via timer interrupts, wake‑up events, priority changes, or when returning from system calls, interrupts, or soft‑IRQ handling.

5. Scheduler Framework

Each CPU has its own run‑queue ( struct rq) that holds runnable tasks. The key fields are shown below:

struct rq {<br/>    raw_spinlock_t __lock;<br/>    unsigned int nr_running;<br/>    struct cfs_rq cfs;<br/>    struct rt_rq rt;<br/>    struct dl_rq dl;<br/>    struct task_struct *curr;<br/>    struct task_struct *idle;<br/>    int cpu;<br/>    int online;<br/>};

The kernel defines a per‑CPU variable runqueues ( DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);) that stores these structures.

6. Process Wake‑Up Paths

When a new task is created the kernel calls wake_up_new_task() to place it on a run‑queue and possibly pre‑empt the current task:

void wake_up_new_task(struct task_struct *p) {<br/>    raw_spin_lock_irqsave(&p->pi_lock, flags);<br/>    WRITE_ONCE(p->__state, TASK_RUNNING);<br/>    p->recent_used_cpu = task_cpu(p);<br/>    __set_task_cpu(p, select_task_rq(p, task_cpu(p), WF_FORK));<br/>    rq = __task_rq_lock(p, &rf);<br/>    activate_task(rq, p, ENQUEUE_NOCLOCK);<br/>    check_preempt_curr(rq, p, WF_FORK);<br/>    task_rq_unlock(rq, p, &rf);<br/>}

For a blocked task the core routine is try_to_wake_up(), which selects a target CPU, possibly migrates the task, and enqueues it:

static int try_to_wake_up(struct task_struct *p, unsigned int state, int flags) {<br/>    raw_spin_lock_irqsave(&p->pi_lock, irq_flags);<br/>    if (!ttwu_state_match(p, state, &success)) goto out;<br/>    cpu = select_task_rq(p, p->wake_cpu, flags | WF_TTWU);<br/>    if (task_cpu(p) != cpu) set_task_cpu(p, cpu);<br/>    ttwu_queue(p, cpu, flags);<br/>out: raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);<br/>    return success;<br/>}

7. Scheduler Tick

The periodic timer interrupt (default 250 Hz) invokes scheduler_tick(), which calls the current task’s task_tick() method and may start load‑balancing on SMP systems:

void scheduler_tick(void) {<br/>    int cpu = smp_processor_id();<br/>    struct rq *rq = cpu_rq(cpu);<br/>    struct task_struct *curr = rq->curr;<br/>    curr->sched_class->task_tick(rq, curr, 0);<br/>#ifdef CONFIG_SMP<br/>    rq->idle_balance = idle_cpu(cpu);<br/>    trigger_load_balance(rq);<br/>#endif<br/>}

8. Scheduling Points

The kernel checks the need‑to‑reschedule flag ( _TIF_NEED_RESCHED) at several exit points: after a system call returns to user space, after an interrupt returns to user or kernel space, after exiting a pre‑empt disabled critical section, and at explicit cond_resched() calls in long‑running kernel code.

9. Context Switch

The core switch routine is context_switch(). It first switches the memory map with switch_mm_irqs_off() (which writes a new CR3 value on x86) and then swaps kernel stacks via the assembly macro switch_to:

static __always_inline struct rq *context_switch(struct rq *rq,<br/>    struct task_struct *prev, struct task_struct *next, struct rq_flags *rf) {<br/>    switch_mm_irqs_off(prev->active_mm, next->mm, next);<br/>    switch_to(prev, next, prev);<br/>    return finish_task_switch(prev);<br/>}

On x86 the assembly saves callee‑saved registers, swaps the stack pointer stored in task_struct.thread.sp, restores the registers of the next task, and returns to its execution point.

10. Priority and Scheduling Classes

Linux defines five scheduling classes: stop, deadline, realtime, fair (CFS), and idle. Real‑time classes ( SCHED_FIFO, SCHED_RR) have static priorities 1‑99; the fair class uses the Completely Fair Scheduler (CFS) with dynamic priorities derived from static priority (nice) and runtime statistics. The task_struct fields involved are:

struct task_struct {<br/>    int prio;<br/>    int static_prio;<br/>    int normal_prio;<br/>    unsigned int rt_priority;<br/>};

Realtime tasks are always selected before fair tasks; within a class the kernel calls pick_next_task() to choose the highest‑weight runnable task.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Linux CFS

Written by

Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.