Master Linux Process Scheduling: From CFS to Real-Time and Deadline Strategies
This article explains how the Linux kernel scheduler allocates CPU time among user processes in multi‑core environments, covering the main scheduler types—CFS, RT, and Deadline—their algorithms, priority schemes, and practical configuration using ps, nice, and chrt commands.
Process Scheduling
Process scheduling describes how the Linux kernel scheduler assigns CPU time to multiple user processes, enabling fair competition and efficient CPU resource distribution.
In early single‑core systems the scheduler relied on simple time‑slice round‑robin and priority algorithms. Modern multi‑core systems must also consider load balancing, cache affinity, and inter‑core contention, so this article focuses on multi‑core scheduling.
Linux implements several scheduler families:
CFS (Completely Fair Scheduler)
RT (Real‑time Scheduler)
DS (Deadline Scheduler)
Each scheduler operates on the ready queue of every CPU core and uses its own algorithm and priority policy.
Administrators mainly work with user processes, so most of the configurable scheduling options discussed here target them.
The kernel classifies processes into two major groups, each with its own priority range and scheduling algorithm:
Real‑time processes : priorities 0‑99.
Normal processes : priorities 100‑139.
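Real-time priorities are static: the kernel never adjusts them on its own, although a privileged user can change them explicitly (for example with chrt). Normal processes have their priority adjusted through the nice value.
The nice value ranges from -20 to 19 and maps onto the 40 normal-process priority levels (priority = 120 + nice, so a lower nice value means a higher priority). Unprivileged users can normally only raise the nice value of their own processes, which lowers their priority, while root can set any value from -20 to 19.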
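The mapping between nice values and the 100-139 priority range can be written down in a few lines. This is a minimal sketch: the function names are made up for illustration, and the offset of 120 for nice 0 is taken from the ranges given above.

```python
# Illustrative helpers (hypothetical names): convert between nice values
# and the kernel-internal priority range used for normal processes.
NICE_MIN, NICE_MAX = -20, 19
DEFAULT_PRIO = 120  # internal priority of a process running at nice 0


def nice_to_priority(nice):
    """Map a nice value (-20..19) to an internal priority (100..139)."""
    if not NICE_MIN <= nice <= NICE_MAX:
        raise ValueError("nice must be between -20 and 19")
    return DEFAULT_PRIO + nice


def priority_to_nice(priority):
    """Map an internal priority (100..139) back to a nice value."""
    if not 100 <= priority <= 139:
        raise ValueError("normal-process priority must be between 100 and 139")
    return priority - DEFAULT_PRIO


if __name__ == "__main__":
    for nice in (-20, -10, 0, 19):
        print(nice, "->", nice_to_priority(nice))
```

Note that tools may display the same priority on a shifted scale. In the ps -le output later in this article, the PRI column shows 80 for nice 0 and 70 for nice -10.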
CFS – Completely Fair Scheduler
CFS is the default scheduler for general-purpose workloads. Rather than handing out fixed time slices, it tracks a virtual runtime (vruntime in the kernel, abbreviated VRT here) for each process: the CPU time the process has consumed, weighted by its nice value. The process that has received the least weighted CPU time runs next, so over time every process obtains a fair share of the CPU.
CFS uses one red-black tree per CPU. Each node represents a runnable process, and its key is that process's VRT, reflecting accumulated CPU time. The smaller the VRT, the sooner the process runs.
When a new normal process is created, it does not start at VRT = 0. Its VRT is initialized relative to the smallest VRT currently in the tree, so it is scheduled soon without starving the processes that are already waiting.
When the scheduler picks the next task, it selects the node with the smallest VRT (the leftmost node) and runs that process. The process's VRT grows while it runs, which lowers its priority for the next selection.
CFS provides fair CPU distribution at the cost of O(log n) tree insertions and removals on each CPU. Picking the next task is cheap because the leftmost node is cached.
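The selection loop described above can be sketched as a small simulation. This is not the kernel's implementation: a binary heap from Python's heapq stands in for the red-black tree, all tasks start with a VRT of 0, and the weight formula (about 25% per nice step around a nice-0 weight of 1024) is an approximation of the kernel's precomputed table. The names simulate_cfs and nice_to_weight are hypothetical.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight assigned to a task running at nice 0


def nice_to_weight(nice):
    """Approximate weight: each nice step changes the weight by about 25%."""
    return NICE_0_WEIGHT / (1.25 ** nice)


def simulate_cfs(tasks, ticks, tick_ns=1_000_000):
    """Simulate one CPU. `tasks` maps task name -> nice value.

    Returns a dict of task name -> nanoseconds of CPU time received.
    """
    # Heap entries are (vruntime, tie_breaker, name); the smallest
    # vruntime is always at the top, like the leftmost tree node.
    heap = [(0.0, i, name) for i, name in enumerate(tasks)]
    heapq.heapify(heap)
    cpu_time = {name: 0 for name in tasks}

    for _ in range(ticks):
        vruntime, order, name = heapq.heappop(heap)  # pick smallest VRT
        cpu_time[name] += tick_ns                    # run it for one tick
        # Virtual time advances more slowly for heavier (lower-nice) tasks.
        vruntime += tick_ns * NICE_0_WEIGHT / nice_to_weight(tasks[name])
        heapq.heappush(heap, (vruntime, order, name))
    return cpu_time


if __name__ == "__main__":
    print(simulate_cfs({"editor": 0, "backup": 5}, ticks=10_000))
```

Under these assumptions, two tasks with the same nice value split the CPU evenly, while a nice-0 task receives roughly three times the CPU time of a nice-5 task (1.25^5 is about 3.05).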
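SCHED_NORMAL (standard normal-process algorithm, exposed to user space as SCHED_OTHER)
Dynamic priority : a process's nice value determines its weight; lower nice → larger share of CPU time.
Time sharing : processes with equal nice values receive equal shares of the CPU. There is no fixed time slice; the length of each turn depends on how many processes are runnable.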
SCHED_BATCH (batch‑oriented algorithm)
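SCHED_BATCH targets CPU-intensive, non-interactive background jobs. It is scheduled by CFS according to the nice value, just like SCHED_NORMAL, but the scheduler always assumes the task is CPU-bound and applies a small penalty to its wakeup behavior, so the task is mildly disfavored in scheduling decisions. Work that should run only when the system is otherwise idle belongs to the separate SCHED_IDLE policy.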
RT – Real‑time Scheduler
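RT uses a fixed-priority scheme: the highest-priority runnable real-time task always gets the CPU, and any runnable real-time task runs ahead of all normal (CFS) tasks. It is suited for latency-sensitive workloads.
User-visible RT priorities range from 1 (lowest) to 99 (highest); priority 0 is used by the non-real-time policies. Internally the kernel inverts this scale, which is why real-time tasks fall in the kernel's 0-99 range, where a smaller number means a higher priority. RT scheduling is preemptive: a higher-priority task that becomes runnable immediately preempts a running lower-priority task.
RT keeps its own run queue on each CPU, with one list of tasks per priority level. Real-time tasks use one of two policies, SCHED_FIFO or SCHED_RR, which differ only in how tasks of equal priority share the CPU.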
SCHED_FIFO (first‑in‑first‑out)
A process runs until it blocks, yields, or is preempted by a higher-priority task. There is no time slice, so a long-running task can starve tasks of equal or lower priority.
SCHED_RR (round‑robin)
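Similar to FIFO, but each process receives a fixed time slice; when the slice expires, the process is placed at the end of the queue for its priority. This prevents starvation among tasks of the same priority, although lower-priority tasks can still starve.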
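The difference between the two policies is easiest to see in a toy model. The sketch below is an illustration, not kernel code: it keeps one queue per priority level, always serves the highest non-empty priority, and rotates SCHED_RR tasks when their slice runs out. All tasks are assumed to be runnable from the start and to need at least one tick of work. The function name simulate_rt and the two-tick slice are hypothetical choices.

```python
from collections import deque


def simulate_rt(tasks, rr_slice=2):
    """Simulate fixed-priority scheduling on one CPU.

    `tasks` is a list of (name, priority, policy, work_ticks) tuples, where
    priority is 1..99 (higher wins) and policy is "FIFO" or "RR".
    Returns the task name that occupied the CPU on each tick.
    """
    queues = {}  # priority -> deque of [name, policy, remaining, slice_left]
    for name, priority, policy, work in tasks:
        queues.setdefault(priority, deque()).append(
            [name, policy, work, rr_slice])

    trace = []
    while queues:
        top = max(queues)      # highest priority with runnable tasks
        queue = queues[top]
        task = queue[0]        # head of that priority's queue runs
        trace.append(task[0])
        task[2] -= 1           # one tick of work done
        task[3] -= 1           # one tick of the time slice used
        if task[2] == 0:
            queue.popleft()    # task finished
        elif task[1] == "RR" and task[3] == 0:
            task[3] = rr_slice
            queue.rotate(-1)   # slice expired: move head to the tail
        if not queue:
            del queues[top]
    return trace


if __name__ == "__main__":
    print(simulate_rt([("a", 10, "RR", 3), ("b", 10, "RR", 3),
                       ("hi", 50, "FIFO", 2)]))
```

In this model the priority-50 task finishes before either priority-10 task runs, and the two SCHED_RR tasks then alternate in two-tick turns. Changing their policy to "FIFO" makes the first one run to completion before the second starts.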
DS – Deadline Scheduler
DS schedules based on explicit deadlines. Like CFS it uses a red‑black tree, but the key is the deadline rather than VRT.
Each node stores a deadline value.
Processes with nearer deadlines receive higher priority.
When the scheduler picks the next task, it selects the process whose deadline is closest to the current time (earliest deadline first).
SCHED_DEADLINE (deadline‑based algorithm)
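SCHED_DEADLINE is the only policy of the DS scheduler and is used for real-time tasks with explicit timing requirements. Each task declares a runtime, a deadline, and a period, and runnable deadline tasks take precedence over both RT and normal tasks.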
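Earliest-deadline-first selection can be illustrated with a short sketch. It is a simplification under stated assumptions: all jobs are ready at time 0, each job runs to completion once selected, and a heap stands in for the red-black tree. The function name edf_order and the job tuples are hypothetical, and the kernel's runtime budgeting and admission control are not modeled.

```python
import heapq


def edf_order(jobs):
    """Run jobs in earliest-deadline-first order on one CPU.

    `jobs` is a list of (name, absolute_deadline, runtime) tuples, with
    deadline and runtime expressed in the same time unit.
    Returns (schedule, missed): the execution order and the names of
    jobs that finished after their deadline.
    """
    # Keyed by deadline, so the nearest deadline is always popped first.
    heap = [(deadline, name, runtime) for name, deadline, runtime in jobs]
    heapq.heapify(heap)

    now = 0
    schedule, missed = [], []
    while heap:
        deadline, name, runtime = heapq.heappop(heap)
        now += runtime            # the job runs to completion
        schedule.append(name)
        if now > deadline:
            missed.append(name)   # finished too late
    return schedule, missed


if __name__ == "__main__":
    print(edf_order([("video", 10, 3), ("audio", 4, 2), ("log", 20, 5)]))
```

With the sample jobs, the job with the deadline at 4 runs first, then the one due at 10, then the one due at 20, and none of them misses its deadline.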
Configuring Process Scheduling Policies
ps command
The ps utility displays process status information.
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.0 78088 9188 ? Ss 04:26 0:03 /sbin/init maybe-ubiquity
... (output truncated)
Additional ps options reveal priority and scheduling-class details. Per-process CPU usage can be inspected with pidstat (part of the sysstat package):
$ pidstat -p 12285
02:53:02 PM UID PID %usr %system %guest %CPU CPU Command
02:53:02 PM 0 12285 0.00 0.00 0.00 0.00 5 python
PID – process ID
%usr – user‑mode CPU percentage
%system – kernel‑mode CPU percentage
%CPU – overall CPU usage
CPU – CPU core on which the process runs
Command – command that started the process
nice command
nice starts a command with an adjusted nice value (a negative adjustment requires root):
$ nice -n -5 service httpd start
To change the nice value of an existing process, use renice:
$ ps -le | grep nova-compute
4 S 1000 9301 1 2 80 0 - 530107 ep_pol ? 00:02:50 nova-compute
$ renice -10 9301
9301 (process ID) old priority 0, new priority -10
$ ps -le | grep nova-compute
4 S 1000 9301 1 2 70 -10 - 530107 ep_pol ? 00:02:54 nova-compute
chrt command
chrt changes a process's scheduling policy and priority.
$ chrt --help
Show or change the real-time scheduling attributes of a process.
Set policy:
chrt [options] <priority> <command> [<arg>...]
chrt [options] --pid <priority> <pid>
Get policy:
chrt [options] -p <pid>
Policy options:
-b, --batch set policy to SCHED_BATCH
-d, --deadline set policy to SCHED_DEADLINE
-f, --fifo set policy to SCHED_FIFO
-i, --idle set policy to SCHED_IDLE
-o, --other set policy to SCHED_OTHER
-r, --rr set policy to SCHED_RR (default)
Example: start a shell under round-robin scheduling with priority 10, then check its policy:
$ chrt -r 10 bash
$ chrt -p $$
pid 13360's current scheduling policy: SCHED_RR
pid 13360's current scheduling priority: 10
Example: adjust a real-time process's priority:
$ chrt -p 31
pid 31's current scheduling policy: SCHED_FIFO
pid 31's current scheduling priority: 99
$ chrt -f -p 50 31
$ chrt -p 31
pid 31's current scheduling policy: SCHED_FIFO
pid 31's current scheduling priority: 50
Key fields displayed by ps for scheduling information include:
wchan : kernel function where a sleeping process is blocked ("-" if running).
nwchan : address of the sleeping kernel function ("-" if running).
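The attributes that chrt reports can also be read from a program. The following sketch uses the scheduling functions in Python's standard os module, which are available on Linux; the helper names describe and priority_range are hypothetical. It only reads attributes, since changing a process to a real-time policy normally requires root.

```python
import os

# Map policy constants to names, skipping any the platform does not define.
POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_BATCH", "SCHED_IDLE",
                 "SCHED_FIFO", "SCHED_RR")
    if hasattr(os, name)
}


def priority_range(policy):
    """Return the (min, max) static priority the kernel reports for a policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


def describe(pid=0):
    """Return (policy_name, priority) for a process; pid 0 means the caller."""
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    return POLICY_NAMES.get(policy, "unknown({})".format(policy)), priority


if __name__ == "__main__":
    print("current process:", describe())
    for policy, name in sorted(POLICY_NAMES.items()):
        print(name, "priority range:", priority_range(policy))
```

On a mainline Linux kernel the reported range for SCHED_FIFO and SCHED_RR is 1 to 99, while the non-real-time policies only accept priority 0, which matches the ranges shown by chrt.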
