Master Linux Scheduling: From CFS to CPU Affinity and systemd
This article explains Linux scheduling strategies, including real‑time and non‑real‑time policies, the CFS and Deadline schedulers, command‑line and systemd methods for setting policies, and techniques for CPU affinity using pinning, NUMA awareness, and cgroup cpuset controls.
1. Scheduling Strategies
Scheduling Process
A single CPU can execute only one process at a time; Linux achieves multitasking by interleaving processes on the same CPU.
The kernel scheduler decides which process runs at any moment. Its design goals are:
Decide quickly which process runs next
Distribute CPU time fairly, giving higher‑priority processes more time and letting them pre‑empt lower‑priority ones
Keep interactive applications responsive
Remain predictable and scalable under varied workloads
Process Priority
In Linux, the scheduler controls execution order based on each thread or process's scheduling policy and priority, which are divided into real‑time and non‑real‑time strategies.
Scheduling Policies
RHEL provides six scheduling policies: two real‑time, three non‑real‑time, and SCHED_DEADLINE, which is covered in the Deadline Scheduling section.
Real‑time policies
SCHED_FIFO: First‑in‑first‑out without time slicing; a task runs until it blocks (for example on I/O), yields, or is pre‑empted by a higher‑priority task.
SCHED_RR: Like SCHED_FIFO, but tasks of equal priority share the CPU in round‑robin time slices.
Non‑real‑time policies
SCHED_NORMAL (also called SCHED_OTHER): Default for most Linux processes.
SCHED_BATCH: Suited to non‑interactive, CPU‑bound batch workloads.
SCHED_IDLE: For very low‑priority background work that should run only when nothing else needs the CPU.
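The policies listed above can be inspected from a program. The following is a minimal sketch, assuming a Linux host and Python's standard os module; the helper names priority_range and current_policy_name are illustrative, not part of any library.

```python
import os

# Policy names from the list above mapped to the constants Python exposes on Linux.
POLICIES = {
    "SCHED_FIFO": os.SCHED_FIFO,
    "SCHED_RR": os.SCHED_RR,
    "SCHED_OTHER": os.SCHED_OTHER,   # called SCHED_NORMAL in kernel sources
    "SCHED_BATCH": os.SCHED_BATCH,
    "SCHED_IDLE": os.SCHED_IDLE,
}


def priority_range(policy):
    """Return the (min, max) static priority the kernel accepts for a policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


def current_policy_name(pid=0):
    """Return the policy name of a process; pid 0 means the calling process."""
    policy = os.sched_getscheduler(pid)
    for name, value in POLICIES.items():
        if value == policy:
            return name
    return f"unknown ({policy})"


if __name__ == "__main__":
    print("current policy:", current_policy_name())
    for name, value in POLICIES.items():
        print(name, priority_range(value))
```

On Linux the real‑time policies report a priority range of 1 to 99, while the non‑real‑time policies only accept 0; their relative share of the CPU is governed by the nice value instead.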
CFS Scheduling
The Completely Fair Scheduler (CFS) became the default scheduler for non‑real‑time tasks in kernel 2.6.23. It keeps runnable processes in a red‑black tree ordered by virtual runtime; the process with the smallest virtual runtime (the one that has received the least CPU time so far) is picked next, and its virtual runtime increases while it runs.
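In mainline CFS, each task accumulates virtual runtime while it runs, and the scheduler always picks the runnable task with the smallest value. The toy simulation below illustrates that selection rule only. It is a sketch under stated assumptions: a binary heap stands in for the kernel's red‑black tree, time slices are fixed, and the function name simulate and the task names are made up for illustration.

```python
import heapq

NICE_0_WEIGHT = 1024  # load weight the kernel assigns to a nice-0 task


def simulate(weights, slice_ns, steps):
    """Run `steps` scheduling decisions and return the CPU time each task received.

    weights: dict of task name -> load weight (higher weight = larger CPU share)
    """
    # Heap entries are (vruntime, name); every task starts with vruntime 0.
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    cpu_time = {name: 0 for name in weights}

    for _ in range(steps):
        vruntime, name = heapq.heappop(queue)   # smallest vruntime runs next
        cpu_time[name] += slice_ns              # it runs for one fixed slice
        # Virtual runtime advances more slowly for heavier tasks,
        # so they are picked more often.
        vruntime += slice_ns * NICE_0_WEIGHT / weights[name]
        heapq.heappush(queue, (vruntime, name))
    return cpu_time


if __name__ == "__main__":
    print(simulate({"heavy": 2048, "light": 1024}, slice_ns=1_000_000, steps=30))
```

With one task weighted twice as heavily as the other, the heavier task ends up with roughly twice the CPU time, which is the proportional fairness the virtual‑runtime ordering is meant to provide.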
Deadline Scheduling
RHEL 8 introduces SCHED_DEADLINE for real‑time systems, guaranteeing task execution using three parameters: period, deadline, and runtime (worst‑case execution time).
Period – the interval at which a job repeats (e.g., 16 ms for 60 fps video).
Deadline – the latest time by which the job must finish.
Runtime – the maximum CPU time the job may consume.
All values are expressed in nanoseconds. For example, a task that needs 5 ms of CPU time every 16.6 ms and must finish within 10 ms of the start of each period would use runtime 5000000, deadline 10000000, and period 16600000.
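The arithmetic behind these three parameters can be sketched as follows. This helper only converts and checks values; it does not call the kernel. The function names are hypothetical, and the ordering rule it enforces (runtime ≤ deadline ≤ period) is the constraint documented for SCHED_DEADLINE.

```python
def ms_to_ns(ms):
    """Convert milliseconds to whole nanoseconds."""
    return round(ms * 1_000_000)


def deadline_params(runtime_ms, deadline_ms, period_ms):
    """Validate SCHED_DEADLINE parameters and return them in nanoseconds.

    The kernel requires runtime <= deadline <= period.
    """
    runtime, deadline, period = (
        ms_to_ns(v) for v in (runtime_ms, deadline_ms, period_ms)
    )
    if not 0 < runtime <= deadline <= period:
        raise ValueError("need 0 < runtime <= deadline <= period")
    return {
        "runtime_ns": runtime,
        "deadline_ns": deadline,
        "period_ns": period,
        "utilization": runtime / period,  # fraction of one CPU reserved
    }


if __name__ == "__main__":
    # The 5 ms / 10 ms / 16.6 ms example from the text.
    print(deadline_params(5, 10, 16.6))
```

The utilization figure (about 30% of one CPU for this example) is what the kernel's admission control adds up across all deadline tasks before accepting a new one.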
Changing Scheduling Options via Command Line
The chrt command displays (with -p) or sets a process's scheduling policy and priority; if no policy option is given when setting, chrt defaults to SCHED_RR. The policy options are:
-b Specify SCHED_BATCH
-f Specify SCHED_FIFO
-i Specify SCHED_IDLE
-o Specify SCHED_NORMAL (OTHER)
-r Specify SCHED_RR
-d Specify SCHED_DEADLINE
Example: set a new process’s policy and priority.
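A minimal sketch of such an invocation, built in Python so that the option table above stays explicit. It relies on the chrt form "chrt [option] priority command"; the helper chrt_argv and the sleep 60 workload are hypothetical placeholders. SCHED_DEADLINE is left out because it needs additional runtime, deadline, and period options.

```python
import shlex

# Option letters from the table above (SCHED_DEADLINE intentionally omitted).
CHRT_FLAGS = {
    "batch": "-b",
    "fifo": "-f",
    "idle": "-i",
    "other": "-o",
    "rr": "-r",
}


def chrt_argv(policy, priority, command):
    """Build the argument list for `chrt <flag> <priority> <command...>`."""
    flag = CHRT_FLAGS[policy]
    realtime = policy in ("fifo", "rr")
    if realtime and not 1 <= priority <= 99:
        raise ValueError("real-time priority must be between 1 and 99")
    if not realtime and priority != 0:
        raise ValueError("non-real-time policies take priority 0")
    return ["chrt", flag, str(priority), *command]


if __name__ == "__main__":
    # Hypothetical workload: `sleep 60` stands in for a real program.
    print(shlex.join(chrt_argv("fifo", 10, ["sleep", "60"])))
```

The resulting list can be passed to subprocess.run; starting a process under a real‑time policy normally requires root or the CAP_SYS_NICE capability.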
Changing Scheduling Options via systemd
In a service unit’s [Service] section, set:
CPUSchedulingPolicy – one of other, batch, idle, fifo, rr (deadline not supported).
CPUSchedulingPriority – for real‑time policies, range 1 (lowest) to 99 (highest).
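As an illustration, the two directives can be rendered into a [Service] fragment with the same constraints checked up front. This is a sketch, not systemd tooling: the function name service_scheduling_section is made up, and where the text is saved (for example as a drop‑in for a particular unit) is left to the reader.

```python
REALTIME = {"fifo", "rr"}
POLICIES = {"other", "batch", "idle"} | REALTIME


def service_scheduling_section(policy, priority=None):
    """Return [Service] lines that set the CPU scheduling policy of a unit."""
    if policy not in POLICIES:
        raise ValueError(f"unsupported policy: {policy}")
    lines = ["[Service]", f"CPUSchedulingPolicy={policy}"]
    if policy in REALTIME:
        # Real-time policies need an explicit priority in the 1..99 range.
        if priority is None or not 1 <= priority <= 99:
            raise ValueError("fifo and rr need a priority between 1 and 99")
        lines.append(f"CPUSchedulingPriority={priority}")
    elif priority is not None:
        raise ValueError("priority only applies to fifo and rr")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(service_scheduling_section("rr", 20), end="")
```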
2. CPU Affinity
Pinning Processes
By default the scheduler may run a process on any CPU. Binding a process to specific CPUs keeps its working set in those CPUs' caches, which can improve performance for cache‑sensitive workloads.
systemd services can set CPUAffinity in the [Service] section, providing a space‑separated list of CPU indexes (e.g., "0 1").
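Outside systemd, a process can also pin itself through the sched_setaffinity system call. The sketch below assumes Linux and uses Python's os.sched_setaffinity and os.sched_getaffinity; the wrapper pin_to_cpus is a hypothetical helper.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict a process to the given CPU indexes; pid 0 is the caller.

    Returns the affinity mask in effect after the call.
    """
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)   # CPUs this process may use right now
    first = min(allowed)
    print("before:", sorted(allowed))
    print("after: ", sorted(pin_to_cpus({first})))
    pin_to_cpus(allowed)                # restore the original mask
```

Choosing the CPU from the current mask, rather than hard‑coding index 0, keeps the example working inside containers or cpusets where CPU 0 may not be available.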
The tuna command can display CPU bindings. Install it with:
yum install tuna
Run tuna with -t to select threads and -P (uppercase) to show thread information such as scheduling policy, priority, and CPU binding.
Managing CPU Affinity with cgroups
NUMA (Non‑Uniform Memory Access) systems partition CPUs and memory into nodes that communicate over an interconnect such as Intel QPI.
Each node has its own CPUs and local memory, so keeping a process and its memory on the same node reduces memory latency.
The cpuset cgroup controller can bind tasks to specific cores. Its configuration files reside under /sys/fs/cgroup and can be edited manually.
cpuset.cpus – comma‑separated list of CPUs a cgroup may use; "-" denotes a range (e.g., 0-3,8).
cpuset.mems – list of NUMA memory nodes a cgroup may use.
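Both files use the same compact list notation, with commas between entries and "-" for ranges. A small sketch of converting between that notation and explicit CPU indexes follows; the function names are illustrative, and no particular cgroup path is assumed.

```python
def parse_cpu_list(text):
    """Expand a cpuset-style list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue                      # an empty file means nothing is set
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def format_cpu_list(cpus):
    """Collapse CPU indexes back into the compact range notation."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu           # extend the current run
        else:
            ranges.append([cpu, cpu])     # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    print(parse_cpu_list("0-3,8"))
    print(format_cpu_list([0, 1, 2, 3, 8]))
```

The same parser can be applied to the contents of a cgroup's cpuset.cpus or cpuset.mems file once it has been read as text.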
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
