
How Linux Schedules Processes and Threads: From Tasks to CFS and Real‑Time

This article explains Linux's scheduling subsystem in depth: what a process is, its memory layout and state machine, context switches, priorities and timeslices, the modular scheduler framework and its scheduler classes (CFS, real-time, and deadline), group scheduling, signal handling, and the differences between kernel and user threads. It is intended as a comprehensive guide for developers and system engineers.

Liangxu Linux

Process Overview

In Linux a running program is abstracted as a process, the smallest unit of resource allocation (virtual memory, file handles, signals, etc.). Inside the kernel, processes and threads are both represented by the same structure, task_struct, and each instance is called a task.

Process Memory Layout (x86_64)

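The virtual address space is split into user space, used while the CPU runs in Ring 3 (user mode), and kernel space, accessible only in Ring 0 (kernel mode). On x86_64 with 4-level paging, virtual addresses are 48 bits wide, giving 256 TiB in total: the lower 128 TiB is user space and the upper 128 TiB is kernel space. (5-level paging, where enabled, widens addresses to 57 bits.) From high to low addresses, user space contains:

Stack: grows downwards from a high address; the default limit is typically 8 MiB (configurable with ulimit -s).

Memory-mapped region: holds file-backed and anonymous mappings created with mmap, including shared libraries. glibc's malloc serves requests larger than M_MMAP_THRESHOLD (128 KiB by default) with a private mmap mapping instead of the heap, and shared mappings (MAP_SHARED) can also be used for IPC.

Heap: grows upwards; its end (the program break) is moved with brk and sbrk.

BSS / DATA / rodata: hold uninitialized (zero-filled), initialized, and read-only global and static data, respectively.

Text: contains the executable code.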
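To see this layout on a live system, one can read /proc/self/maps, where the kernel lists every mapping of the calling process. The Python sketch below assumes a Linux system with /proc mounted; parse_maps and find are hypothetical helper names, not a library API. It prints each region and checks the 48-bit arithmetic from above.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps lines: start-end perms offset dev inode [pathname]."""
    regions = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        # Anonymous mappings have no pathname column.
        name = fields[5].strip() if len(fields) == 6 else "[anon]"
        regions.append((start, end, fields[1], name))
    return regions


def find(regions, name):
    """Return the first region whose pathname is exactly `name`, or None."""
    return next((r for r in regions if r[3] == name), None)


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        regions = parse_maps(f.read())
    for start, end, perms, name in regions:
        print(f"{start:012x}-{end:012x} {perms} {name}")

    heap, stack = find(regions, "[heap]"), find(regions, "[stack]")
    if heap and stack:
        # The heap sits just above the program image; the stack sits near the top of user space.
        print("heap below stack:", heap[0] < stack[0])

    # 48-bit virtual addresses: 2**48 bytes = 256 TiB, split into two 128 TiB halves.
    print(2**48 // 2**40, "TiB total,", 2**47 // 2**40, "TiB per half")
```

On a typical run the [heap] region appears at a low address just above the executable's own mappings, shared libraries and anonymous mmap regions sit higher, and [stack] appears near the top of user space; the exact addresses change on every run because of address space layout randomization (ASLR).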

Process Scheduling

State Machine

A process moves through several states during its lifetime:

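R (TASK_RUNNING): running on a CPU, or runnable and waiting in a runqueue to be selected.

S (TASK_INTERRUPTIBLE): sleeping while waiting for an event; woken when the event occurs or when a signal arrives.

T (TASK_STOPPED / TASK_TRACED): stopped by a signal such as SIGSTOP, or stopped under a tracer such as a debugger (ps shows traced tasks as t).

Z (EXIT_ZOMBIE): zombie; the task has exited, but its parent has not yet collected its exit status with wait().

D (TASK_UNINTERRUPTIBLE): sleeping in a state that signals cannot interrupt, typically while waiting for I/O inside the kernel; the task runs again only when the awaited event occurs.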
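The kernel exposes each task's current state as a single letter in the third field of /proc/<pid>/stat, which is also where ps gets its STAT column. The sketch below assumes Linux with /proc and uses hypothetical helper names; it reads that letter and deliberately creates a short-lived zombie by forking a child that exits before the parent calls waitpid().

```python
import os
import time


def parse_state(stat_line):
    """Return the state letter from a /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may itself contain
    spaces or ')', so split after the *last* ')' instead of on whitespace.
    """
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def state_of(pid):
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())


def zombie_demo(timeout=2.0):
    """Fork a child that exits at once and report its state before reaping it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                      # child: exit immediately
    deadline = time.monotonic() + timeout
    state = state_of(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)                 # wait until the child has actually exited
        state = state_of(pid)
    os.waitpid(pid, 0)                   # parent reaps the child; the zombie disappears
    return state


if __name__ == "__main__":
    print("self:", state_of(os.getpid()))    # R: we are on a CPU while reading
    print("unreaped child:", zombie_demo())  # Z until waitpid() runs
```

The zombie exists only between the child's exit and the parent's waitpid(); after reaping, /proc/<pid> is gone.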

Context Switch

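A context switch hands the CPU from one task to another in three steps:

Save the current task's CPU registers and state to memory.

Load the next task's saved state from memory, also switching page tables when the next task belongs to a different process.

Jump to the restored program counter to resume execution.

Context switches happen only in kernel mode, inside the scheduler. They are relatively expensive: the direct cost is typically a few microseconds, and the indirect cost of cold caches and TLB misses afterwards can be larger.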
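The kernel counts context switches per task and distinguishes voluntary ones (the task blocked, for example in a sleep or on I/O) from involuntary ones (it was preempted). A small sketch, assuming Linux and Python's standard resource module, shows that each blocking sleep registers as a voluntary switch; switch_counts and count_sleep_switches are hypothetical helper names.

```python
import resource
import time


def switch_counts():
    """Return (voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def count_sleep_switches(n=50, seconds=0.001):
    """Sleep n times and return how many voluntary switches that added."""
    before, _ = switch_counts()
    for _ in range(n):
        time.sleep(seconds)  # blocks in the kernel, so the scheduler switches away
    after, _ = switch_counts()
    return after - before


if __name__ == "__main__":
    print("voluntary switches from 50 sleeps:", count_sleep_switches())
    print("totals (voluntary, involuntary):", switch_counts())
```

The same per-task counters are visible as voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/<pid>/status.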

Priority and Timeslice

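Linux distinguishes static and dynamic priorities. For normal tasks the static priority comes from the nice value, which ranges from -20 (highest priority) to +19 (lowest), default 0. Internally the kernel uses a single 0-139 scale on which a lower number means a higher priority: 0-99 are real-time priorities and 100-139 are normal priorities, with nice n mapping to 120 + n. (From user space, real-time priorities are set as 1-99, where a larger number is more urgent.) The scheduler gives each runnable task a timeslice (quantum): short timeslices increase switching overhead, while long timeslices hurt interactivity. CFS does not hand out fixed quanta; it derives each task's share of the scheduling period from its nice-based weight.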
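The mapping between these scales is plain arithmetic. The sketch below mirrors the kernel's conventions (a normal task's internal priority is 120 + nice; a real-time task's internal priority is 99 minus its user-visible priority) and reproduces the CFS weight table (sched_prio_to_weight in the kernel source), in which each nice step changes a task's weight by about 1.25x. The function names are hypothetical; only os.sched_get_priority_min/max and os.getpriority are real Python APIs.

```python
import os

MAX_RT_PRIO = 100  # internal prio 0..99 = real-time, 100..139 = normal

# CFS load weight for nice -20..+19 (sched_prio_to_weight in the kernel source).
NICE_TO_WEIGHT = [
    88761, 71755, 56483, 46273, 36291,   # nice -20..-16
    29154, 23254, 18705, 14949, 11916,   # nice -15..-11
    9548, 7620, 6100, 4904, 3906,        # nice -10..-6
    3121, 2501, 1991, 1586, 1277,        # nice -5..-1
    1024, 820, 655, 526, 423,            # nice 0..4
    335, 272, 215, 172, 137,             # nice 5..9
    110, 87, 70, 56, 45,                 # nice 10..14
    36, 29, 23, 18, 15,                  # nice 15..19
]


def nice_to_prio(nice):
    """Internal static priority of a normal task: 120 + nice."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in [-20, 19]")
    return MAX_RT_PRIO + 20 + nice


def rt_to_prio(rt_priority):
    """Internal priority of a SCHED_FIFO/SCHED_RR task: 99 - rt_priority."""
    if not 1 <= rt_priority <= 99:
        raise ValueError("rt_priority must be in [1, 99]")
    return MAX_RT_PRIO - 1 - rt_priority


def cpu_shares(nices):
    """Fraction of the CPU each always-runnable task gets: weight / total weight."""
    weights = [NICE_TO_WEIGHT[n + 20] for n in nices]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print("SCHED_FIFO range:", os.sched_get_priority_min(os.SCHED_FIFO),
          "to", os.sched_get_priority_max(os.SCHED_FIFO))
    print("our nice value:", os.getpriority(os.PRIO_PROCESS, 0))
    print("nice 0 vs nice 1:", [round(s, 3) for s in cpu_shares([0, 1])])
```

For two CPU-bound tasks at nice 0 and nice 1, cpu_shares gives roughly 0.555 and 0.445: one nice level is worth about a 10% swing in CPU time.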

Scheduler Framework

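The scheduler is modular: a core layer implements the generic logic and delegates policy decisions to scheduler classes. Each CPU has its own runqueue and two entry points into the scheduler: the main scheduler, invoked when a task blocks, yields, or must be preempted, and the periodic (tick) scheduler, invoked on every timer tick to update runtime statistics and flag the current task for preemption when needed. The core layer also balances load by migrating tasks between CPU runqueues.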

Scheduler Classes

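CFS (Completely Fair Scheduler): the default class for normal tasks (SCHED_NORMAL, SCHED_BATCH, SCHED_IDLE). It keeps runnable tasks in a red-black tree ordered by vruntime, always runs the task with the smallest vruntime, and aims to run every task at least once per configurable latency period (sched_latency_ns). Since Linux 6.6 the fair class selects tasks with the EEVDF algorithm instead.

Real-Time Scheduler: implements the SCHED_FIFO and SCHED_RR policies with static priorities 1-99. A SCHED_FIFO task runs until it blocks, yields, or is preempted by a higher-priority task; SCHED_RR adds a timeslice so that tasks of equal priority take turns.

Deadline Scheduler: SCHED_DEADLINE combines Earliest Deadline First (EDF) with the Constant Bandwidth Server (CBS); each task declares a runtime, a deadline, and a period.

Stop-sched-class: the highest-priority class, used for stop-machine operations, task migration, and CPU hot-plug.

Idle-sched-class: runs the per-CPU idle task when nothing else is runnable; on x86 it typically executes the hlt instruction or enters a deeper idle state.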
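The core CFS idea (always run the task with the smallest vruntime, and advance vruntime more slowly for heavier tasks) fits in a few lines. This is a toy model under stated assumptions: it switches on every 1 ms tick, starts all tasks at vruntime 0, ignores sleeping and wakeup preemption, and uses Python's heapq as the ordered structure where the kernel uses a red-black tree. simulate_cfs is a hypothetical name; the two weights come from the kernel's nice-to-weight table.

```python
import heapq

NICE_0_LOAD = 1024
WEIGHT = {0: 1024, 5: 335}  # subset of the kernel's nice-to-weight table


def simulate_cfs(tasks, ticks, tick_ns=1_000_000):
    """Toy CFS. `tasks` maps name -> nice; returns ns of CPU time per task.

    Every tick the task with the smallest vruntime runs, then its vruntime
    advances by delta * NICE_0_LOAD / weight: heavier (lower-nice) tasks age
    more slowly and are therefore picked more often.
    """
    runqueue = [(0.0, name) for name in tasks]   # (vruntime, name); ties broken by name
    heapq.heapify(runqueue)
    runtime = dict.fromkeys(tasks, 0)
    for _ in range(ticks):
        vruntime, name = heapq.heappop(runqueue)        # leftmost entity
        runtime[name] += tick_ns
        vruntime += tick_ns * NICE_0_LOAD / WEIGHT[tasks[name]]
        heapq.heappush(runqueue, (vruntime, name))      # re-insert with its new key
    return runtime


if __name__ == "__main__":
    result = simulate_cfs({"a": 0, "b": 5}, 10_000)
    print(result, "ratio:", round(result["a"] / result["b"], 2))
```

With one task at nice 0 and one at nice 5, the ratio of their CPU time converges to 1024/335, about 3:1, which is exactly the ratio of their weights.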

Scheduling Order

The core scheduler asks the classes in a fixed priority order and runs the first runnable task it finds: stop-task → deadline → real-time → fair (CFS) → idle.
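The order can be modelled as a loop over classes, each with its own runqueue and pick-next logic, mirroring how the core scheduler walks its list of scheduler classes. This is a hypothetical model: SchedClass, make_classes and pick_next_task are illustrative names, and a plain FIFO queue stands in for each class's real policy (EDF for deadline, priority queues for real-time, the vruntime tree for fair).

```python
from collections import deque


class SchedClass:
    """Stand-in for a scheduler class: its own queue plus a pick-next hook."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def enqueue(self, task):
        self.queue.append(task)

    def pick_next(self):
        return self.queue.popleft() if self.queue else None


class IdleClass(SchedClass):
    def pick_next(self):
        return "idle"  # the per-CPU idle task is always runnable


def make_classes():
    """Classes in fixed priority order, highest first."""
    return [SchedClass("stop"), SchedClass("deadline"), SchedClass("rt"),
            SchedClass("fair"), IdleClass("idle")]


def pick_next_task(classes):
    """Ask each class in turn; the first one with a runnable task wins."""
    for cls in classes:
        task = cls.pick_next()
        if task is not None:
            return cls.name, task
    raise RuntimeError("unreachable: the idle class always has a task")


if __name__ == "__main__":
    classes = make_classes()
    by_name = {c.name: c for c in classes}
    by_name["fair"].enqueue("bash")
    by_name["rt"].enqueue("audio")
    for _ in range(3):
        print(pick_next_task(classes))
```

With one fair and one real-time task queued, three calls return the real-time task, then the fair task, then the idle task: a lower class runs only when every class above it is empty.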

Group Scheduling

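Linux uses control groups (cgroups) to allocate resources to sets of tasks. For CPU scheduling, task_group objects form a tree: each group has its own runqueue on every CPU, plus a scheduling entity that represents the group in its parent's runqueue. To pick the next task, the scheduler starts at the root group and repeatedly selects an entity, descending into a group's runqueue until it reaches an individual task.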
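The recursive selection can be sketched as a tree of entities in which a group's children form its runqueue. This is a simplified, hypothetical model (Entity and pick_task are illustrative names): each level compares only the vruntimes of its own entities, because vruntimes in different runqueues are not comparable, and empty groups are assumed not to be enqueued.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A scheduling entity: a task (no children) or a group with its own runqueue."""
    name: str
    vruntime: float
    children: list = field(default_factory=list)


def pick_task(root):
    """Descend from the root group, taking the smallest-vruntime entity at each
    level, until the chosen entity is a task rather than a group."""
    entity = root
    path = [root.name]
    while entity.children:
        entity = min(entity.children, key=lambda e: e.vruntime)
        path.append(entity.name)
    return entity.name, path


if __name__ == "__main__":
    root = Entity("root", 0, [
        Entity("groupA", 10, [Entity("a1", 5), Entity("a2", 3)]),
        Entity("groupB", 20, [Entity("b1", 1)]),
    ])
    print(pick_task(root))
```

Here b1 has the smallest vruntime of any task, yet a2 is chosen: group A is behind group B at the top level, so the scheduler descends into A first.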

Real‑Time Group Scheduling

The group's priority is the highest priority among its member real‑time tasks.

Fair Group Scheduling (CFS)

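Group scheduling provides fairness across users: if each user is placed in its own group, CPU time is first divided among the groups and only then among the tasks inside each group, so a user running many CPU-bound tasks cannot starve a user with a few interactive ones.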
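A quick calculation shows the effect. Assuming every task is CPU-bound, all weights are equal, and every group has at least one runnable task, the hypothetical helper below compares each task's share of one CPU with and without per-user groups.

```python
def per_task_share(tasks_per_group, group_sched=True):
    """Map each group to the CPU fraction one of its tasks receives."""
    if group_sched:
        # Split the CPU evenly among groups first, then among each group's tasks.
        per_group = 1 / len(tasks_per_group)
        return {g: per_group / n for g, n in tasks_per_group.items()}
    # Without groups every task competes directly with every other task.
    total = sum(tasks_per_group.values())
    return {g: 1 / total for g in tasks_per_group}


if __name__ == "__main__":
    users = {"alice": 1, "bob": 9}  # alice: one interactive task; bob: nine CPU hogs
    print("with groups:   ", per_task_share(users))
    print("without groups:", per_task_share(users, group_sched=False))
```

Alice's single task receives 50% of the CPU with group scheduling but only 10% without it, while each of Bob's tasks drops from 10% to about 5.6%.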

Signal Handling

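Signals are software interrupts used to notify a process of asynchronous events. A process can catch a signal with a handler, ignore it, or leave the default action in place (usually termination); SIGKILL and SIGSTOP are the exceptions and can be neither caught nor ignored. Common signals include SIGSTOP, SIGKILL, and SIGSEGV.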
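The three dispositions map directly onto signal.signal in Python's standard library. The sketch below assumes a Linux or other POSIX system and runs in the main thread (where Python requires handlers to be installed); it catches SIGUSR1, ignores SIGUSR2, restores the default for SIGTERM, and shows that the kernel rejects a handler for SIGKILL. install_handlers and can_catch are hypothetical helper names.

```python
import os
import signal


def install_handlers(received):
    """Catch SIGUSR1, ignore SIGUSR2, keep the default action for SIGTERM."""
    def on_usr1(signum, frame):
        received.append(signal.Signals(signum).name)

    signal.signal(signal.SIGUSR1, on_usr1)          # catch
    signal.signal(signal.SIGUSR2, signal.SIG_IGN)   # ignore
    signal.signal(signal.SIGTERM, signal.SIG_DFL)   # default: terminate


def can_catch(sig):
    """Try to install a no-op handler; sigaction() fails for SIGKILL and SIGSTOP."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:
        return False
    if previous is not None:
        signal.signal(sig, previous)  # restore the earlier disposition
    return True


if __name__ == "__main__":
    received = []
    install_handlers(received)
    os.kill(os.getpid(), signal.SIGUSR1)   # handler runs
    os.kill(os.getpid(), signal.SIGUSR2)   # discarded because it is ignored
    print("received:", received)
    print("SIGKILL catchable:", can_catch(signal.SIGKILL))
```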

Signal categories: termination, exception, system‑call errors, user‑generated, terminal‑related, tracing, etc.

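Guidelines for multithreaded programs: SIGKILL and SIGSTOP cannot be blocked, so never rely on masking them. Block the signals you intend to receive with sigwait() in every thread, most easily by blocking them in the main thread before creating other threads, which inherit its mask. Remember that kill() sends a signal to the whole process (any thread that does not block it may handle it), whereas pthread_kill() targets a single thread.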
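The usual pattern (block the signals everywhere, then receive them synchronously in one dedicated thread) can be sketched with Python's signal.pthread_sigmask and signal.sigwait, which wrap the POSIX calls of the same names. It assumes Linux, that it runs in the main thread before any other threads exist, and that nothing else in the process unblocks these signals; wait_for_signals is a hypothetical name.

```python
import os
import signal
import threading


def wait_for_signals(sigset, timeout=5.0):
    """Block `sigset`, start a waiter thread (which inherits the mask), send each
    signal to the whole process with kill(), and return what sigwait() received."""
    signal.pthread_sigmask(signal.SIG_BLOCK, sigset)  # must precede thread creation
    received = []

    def waiter():
        for _ in range(len(sigset)):
            received.append(signal.Signals(signal.sigwait(sigset)).name)

    thread = threading.Thread(target=waiter, daemon=True)
    thread.start()
    for sig in sigset:
        os.kill(os.getpid(), sig)  # process-directed; stays pending until sigwait() takes it
    thread.join(timeout)
    signal.pthread_sigmask(signal.SIG_UNBLOCK, sigset)
    return sorted(received)


if __name__ == "__main__":
    print(wait_for_signals({signal.SIGUSR1, signal.SIGUSR2}))
```

Because every thread blocks the signals, none is delivered asynchronously; each stays pending until the waiter thread dequeues it, so no handler ever runs in an arbitrary thread.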

Threads

A thread is the smallest unit of CPU scheduling. In Linux threads are implemented as lightweight processes (LWP) that share most resources of the parent process.

Kernel Threads

Kernel threads are created by the kernel and run only in kernel space. They have no user address space of their own and handle background work such as writing back dirty pages, reclaiming memory and managing swap, and running deferred softirq work such as timers. Examples include kworker, kswapd, ksoftirqd, etc.
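One way to tell kernel threads apart on a running system is the PF_KTHREAD bit (0x00200000 in the kernel headers) in the flags field, field 9 of /proc/<pid>/stat. The sketch below assumes Linux with /proc; inside a container with its own PID namespace, kernel threads are typically not visible at all. stat_fields and is_kernel_thread are hypothetical helper names.

```python
import os

PF_KTHREAD = 0x00200000  # task flag the kernel sets on kernel threads


def stat_fields(pid):
    """Fields of /proc/<pid>/stat that follow the parenthesised command name."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data[data.rindex(")") + 1:].split()


def is_kernel_thread(pid):
    # After the command name: [0] state, [1] ppid, ... [6] flags (field 9 overall).
    return bool(int(stat_fields(pid)[6]) & PF_KTHREAD)


if __name__ == "__main__":
    pids = sorted(int(e) for e in os.listdir("/proc") if e.isdigit())
    for pid in pids:
        try:
            if is_kernel_thread(pid):
                with open(f"/proc/{pid}/comm") as f:
                    print(pid, f.read().strip())  # e.g. kthreadd, kworker/..., kswapd0
        except (FileNotFoundError, ProcessLookupError):
            pass  # the task exited while we were scanning
```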

User Threads

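User threads are created with the clone() system call. Flags such as CLONE_VM (share the address space), CLONE_FILES, CLONE_SIGHAND, and CLONE_THREAD make the new task share resources with its creator while remaining a separately scheduled task. The historic LinuxThreads library already used a 1:1 model (one kernel task per user thread), but each thread had its own PID and signal handling did not conform to POSIX. Modern Linux uses NPTL (Native POSIX Thread Library), also 1:1, in which the threads of a process form a single thread group sharing one PID; it offers better performance and POSIX compliance.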
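Under NPTL's 1:1 model, every thread is a separate kernel task with its own thread ID (TID), while all threads report the same process ID. The sketch below checks this from Python, assuming Linux and Python 3.8+ for threading.get_native_id(), which returns the kernel TID; thread_ids is a hypothetical helper, and a barrier keeps all threads alive at once so their TIDs are guaranteed to differ.

```python
import os
import threading


def thread_ids(n=3):
    """Start n threads and record (pid, tid) as seen from inside each one."""
    seen = []
    lock = threading.Lock()
    barrier = threading.Barrier(n)  # all n threads exist simultaneously

    def worker():
        barrier.wait()
        with lock:
            seen.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print("main thread: pid", os.getpid(), "tid", threading.get_native_id())
    for pid, tid in thread_ids():
        print("worker:      pid", pid, "tid", tid)
```

The main thread's TID equals the PID (it is the thread-group leader), every worker has a different TID, and each TID also appears as a directory under /proc/<pid>/task.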

Coroutines

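Coroutines are user-level execution contexts whose switches happen entirely in user space, without entering the kernel. This makes switching much cheaper than between OS threads, but scheduling is cooperative: a coroutine runs until it yields, and a blocking system call stalls every coroutine on that thread. Stackful coroutine libraries also need architecture-specific code to save and restore registers, which makes portability across architectures a consideration.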
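Python generators give a compact, stackless illustration of cooperative scheduling: each yield hands control back to a scheduler loop in the same OS thread, with no system call involved. The round-robin scheduler below is a hypothetical sketch (worker and run are illustrative names), not how any particular coroutine library is implemented.

```python
from collections import deque


def worker(name, steps, log):
    """A coroutine that does `steps` units of work, yielding after each one."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # voluntary switch point: control returns to the scheduler


def run(coroutines):
    """Round-robin over ready coroutines until all finish, entirely in user space."""
    ready = deque(coroutines)
    while ready:
        co = ready.popleft()
        try:
            next(co)          # resume until its next yield
            ready.append(co)  # still alive: back of the queue
        except StopIteration:
            pass              # finished: drop it


if __name__ == "__main__":
    log = []
    run([worker("a", 2, log), worker("b", 3, log)])
    print(log)
```

The log interleaves the two workers as a0, b0, a1, b1, b2. If a worker called time.sleep() instead of yielding, the whole loop would stall, because the scheduler regains control only when a coroutine yields.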

Overall, Linux's scheduling subsystem provides a rich set of mechanisms to balance fairness, efficiency, responsiveness, and throughput for both interactive and real-time workloads: static and dynamic priorities, multiple scheduler classes, group scheduling, and fine-grained control through sysctl parameters. For example, /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us cap the share of CPU time that real-time tasks may consume.
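The two real-time sysctls define a budget: within every sched_rt_period_us microseconds, real-time tasks may run for at most sched_rt_runtime_us, and a runtime of -1 disables the limit. A small sketch, assuming Linux with /proc/sys readable; rt_bandwidth and read_sysctl are hypothetical names.

```python
def rt_bandwidth(period_us, runtime_us):
    """Fraction of each period that real-time tasks may use (-1 = unlimited)."""
    if runtime_us == -1:
        return 1.0
    return runtime_us / period_us


def read_sysctl(name):
    with open(f"/proc/sys/kernel/{name}") as f:
        return int(f.read())


if __name__ == "__main__":
    period = read_sysctl("sched_rt_period_us")
    runtime = read_sysctl("sched_rt_runtime_us")
    print(f"RT tasks may use {rt_bandwidth(period, runtime):.0%} of every {period} us")
```

With the usual defaults of 1,000,000 and 950,000 this reports 95%, leaving 5% of every second for other tasks even if a SCHED_FIFO task spins in a loop.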

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, and tools, plus Git, databases, Raspberry Pi, and more.
