Master Linux Scheduling: From CFS to CPU Affinity and systemd

This article explains Linux scheduling strategies, including real‑time and non‑real‑time policies, the CFS and Deadline schedulers, command‑line and systemd methods for setting policies, and techniques for CPU affinity using pinning, NUMA awareness, and cgroup cpuset controls.

Raymond Ops
1. Scheduling Strategies

Scheduling Process

A single logical CPU can execute only one task at a time; Linux achieves multitasking by rapidly switching between tasks on the same CPU.

The kernel scheduler decides which process runs at any moment. Its design goals are:

Quickly decide the next process to run

Fair CPU time distribution, with higher‑priority processes receiving more time and pre‑empting lower‑priority ones

Responsive interactive applications

Predictable and scalable under varied workloads

Process Priority

In Linux, the scheduler controls execution order based on each thread or process's scheduling policy and priority, which are divided into real‑time and non‑real‑time strategies.

Scheduling Policies

RHEL provides six scheduling policies, categorized as real‑time and non‑real‑time.

[Figure: Scheduling policies diagram]

Real‑time policies

SCHED_FIFO: First‑in‑first‑out without time slicing; a task runs until it blocks (for example on I/O), yields voluntarily, or is pre‑empted by a higher‑priority task.

SCHED_RR: Uses round‑robin time slices for tasks of equal priority.

Non‑real‑time policies

SCHED_NORMAL (OTHER): Default for most Linux processes.

SCHED_BATCH: Suited for batch‑type workloads.

SCHED_IDLE: Benefits low‑priority background applications.

CFS Scheduling

Since kernel 2.6.23, the Completely Fair Scheduler (CFS) has been the default for non‑real‑time tasks. It keeps runnable processes in a red‑black tree ordered by virtual runtime; the process with the smallest virtual runtime (the one that has received the least weighted CPU time so far) runs next, and its virtual runtime increases while it runs, more slowly for higher‑priority processes.
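The selection rule can be illustrated with a toy model. This is a minimal sketch, not the kernel's implementation: a heap stands in for the red‑black tree, the task names and the weight 2048 are hypothetical, and 1024 is the weight the kernel assigns to a nice‑0 task. The task with the smallest virtual runtime always runs next, and its virtual runtime then grows in inverse proportion to its weight.

```python
import heapq

NICE_0_WEIGHT = 1024  # kernel weight for a nice-0 task


def simulate(tasks, slice_ns, steps):
    """Toy CFS model. tasks maps name -> weight; returns the run order."""
    # Min-heap keyed on (vruntime, name) stands in for the red-black tree.
    heap = [(0, name) for name in sorted(tasks)]
    heapq.heapify(heap)
    order = []
    for _ in range(steps):
        vruntime, name = heapq.heappop(heap)  # smallest vruntime runs next
        order.append(name)
        # vruntime grows while running; heavier tasks accumulate it more slowly.
        vruntime += slice_ns * NICE_0_WEIGHT // tasks[name]
        heapq.heappush(heap, (vruntime, name))
    return order


# Hypothetical tasks: "b" has twice the weight of "a".
order = simulate({"a": 1024, "b": 2048}, slice_ns=1000, steps=6)
print(order)  # ['a', 'b', 'b', 'a', 'b', 'b']
```

With twice the weight, task "b" is picked twice as often as "a", which is the sense in which higher‑priority processes receive more CPU time.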

Deadline Scheduling

RHEL 8 supports SCHED_DEADLINE (merged into the mainline kernel in version 3.14) for real‑time systems. It guarantees task execution using three parameters: period, deadline, and runtime (worst‑case execution time).

Period – the interval at which a job repeats (e.g., about 16.6 ms for 60 fps video).

Deadline – the latest time by which the job must finish.

Runtime – the maximum CPU time the job may consume.

[Figure: Deadline scheduling parameters]

All values are expressed in nanoseconds; for example, a task that must receive 5 ms of CPU time every 16.6 ms can be guaranteed to finish within a 10 ms deadline.
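The 5 ms / 10 ms / 16.6 ms example can be checked with a short sketch. The DeadlineParams class and its method names are hypothetical helpers, not part of any library; the constraint runtime <= deadline <= period is the one the kernel enforces, and the long options are those of chrt from util-linux, which expects a priority of 0 for this policy.

```python
from dataclasses import dataclass

NS_PER_MS = 1_000_000


@dataclass(frozen=True)
class DeadlineParams:
    """Hypothetical helper holding SCHED_DEADLINE values in nanoseconds."""
    runtime_ns: int
    deadline_ns: int
    period_ns: int

    def validate(self):
        # The kernel rejects parameters that violate this ordering.
        if not (0 < self.runtime_ns <= self.deadline_ns <= self.period_ns):
            raise ValueError("need 0 < runtime <= deadline <= period")

    def utilization(self):
        # Fraction of one CPU reserved for the task.
        return self.runtime_ns / self.period_ns

    def chrt_argv(self, command):
        # Build a chrt command line; SCHED_DEADLINE uses priority 0.
        self.validate()
        return ["chrt", "--deadline",
                "--sched-runtime", str(self.runtime_ns),
                "--sched-deadline", str(self.deadline_ns),
                "--sched-period", str(self.period_ns),
                "0", *command]


# 5 ms of CPU time every 16.6 ms, finishing within 10 ms.
params = DeadlineParams(5 * NS_PER_MS, 10 * NS_PER_MS, 16_600_000)
print(round(params.utilization(), 3))  # 0.301
print(" ".join(params.chrt_argv(["sleep", "1"])))
```

Running the resulting command requires root or an equivalent capability; the sketch only builds the argument list.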

Changing Scheduling Options via Command Line

The chrt command displays or sets a process’s scheduling policy and priority; with -p it operates on an existing PID. If no policy option is given when setting, chrt defaults to SCHED_RR.

-b  Specify SCHED_BATCH
-f  Specify SCHED_FIFO
-i  Specify SCHED_IDLE
-o  Specify SCHED_NORMAL (OTHER)
-r  Specify SCHED_RR
-d  Specify SCHED_DEADLINE

Example: set a new process’s policy and priority.

[Figure: chrt example]
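As a rough stand‑in for such an example, the sketch below builds a chrt command line from the option letters listed above and reads back the current process's policy through Python's os module, much as chrt -p does. The helper names are hypothetical, SCHED_DEADLINE is left out because it needs extra parameters, and the 1 to 99 range for real‑time priorities is assumed to match the Linux defaults.

```python
import os

# Option letters as listed for chrt above (deadline omitted on purpose).
CHRT_FLAGS = {"batch": "-b", "fifo": "-f", "idle": "-i", "other": "-o", "rr": "-r"}
REALTIME = {"fifo", "rr"}


def chrt_argv(policy, priority, command):
    """Build an argv list that starts `command` under the given policy."""
    if policy not in CHRT_FLAGS:
        raise ValueError(f"unknown policy: {policy}")
    if policy in REALTIME:
        if not 1 <= priority <= 99:
            raise ValueError("real-time priority must be in 1..99")
    elif priority != 0:
        raise ValueError("non-real-time policies use priority 0")
    return ["chrt", CHRT_FLAGS[policy], str(priority), *command]


def current_policy(pid=0):
    """Return the policy name of a process (Linux only; 0 means this process)."""
    names = {os.SCHED_OTHER: "other", os.SCHED_BATCH: "batch",
             os.SCHED_IDLE: "idle", os.SCHED_FIFO: "fifo", os.SCHED_RR: "rr"}
    return names.get(os.sched_getscheduler(pid), "unknown")


print(chrt_argv("fifo", 10, ["sleep", "60"]))
```

The list can be passed to subprocess.run; real‑time policies normally require root or CAP_SYS_NICE, so the call may fail with a permission error for ordinary users.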

Changing Scheduling Options via systemd

In a service unit’s [Service] section, set:

CPUSchedulingPolicy – one of other, batch, idle, fifo, rr (deadline not supported).

CPUSchedulingPriority – for real‑time policies, range 1 (lowest) to 99 (highest).

[Figure: systemd scheduling configuration]
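The two directives can be rendered and validated with a small helper. This is a sketch under stated assumptions: service_scheduling is a hypothetical function, and it only enforces the rules given above (five accepted policies, priority 1 to 99 for fifo and rr).

```python
REALTIME = ("fifo", "rr")
POLICIES = ("other", "batch", "idle") + REALTIME


def service_scheduling(policy, priority=None):
    """Render [Service] lines for a unit file or drop-in."""
    if policy not in POLICIES:
        raise ValueError(f"unsupported policy: {policy}")
    lines = ["[Service]", f"CPUSchedulingPolicy={policy}"]
    if policy in REALTIME:
        # Real-time policies need an explicit priority in 1..99.
        if priority is None or not 1 <= priority <= 99:
            raise ValueError("fifo/rr need a priority in 1..99")
        lines.append(f"CPUSchedulingPriority={priority}")
    elif priority is not None:
        raise ValueError("priority applies only to fifo and rr")
    return "\n".join(lines) + "\n"


print(service_scheduling("fifo", 10), end="")
# [Service]
# CPUSchedulingPolicy=fifo
# CPUSchedulingPriority=10
```

The output could be saved as a drop‑in file for the service, followed by systemctl daemon-reload and a restart of the unit so the new policy takes effect.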

2. CPU Affinity

Pinning Processes

By default the scheduler may run a process on any CPU. Binding a process to specific CPUs keeps its working set in the same caches, which can improve performance for cache‑sensitive workloads.

[Figure: CPU pinning illustration]

systemd services can set CPUAffinity in the [Service] section, providing a space‑separated list of CPU indexes (e.g., "0 1").
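Outside systemd, a process can also be pinned programmatically. The sketch below uses os.sched_getaffinity and os.sched_setaffinity from Python's standard library (Linux only); pin_to_cpus is a hypothetical wrapper, and restricting the request to the currently allowed CPUs is a simplification chosen here, not a kernel requirement.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Pin a process (0 means the caller) to the given CPU indexes."""
    allowed = os.sched_getaffinity(pid)
    # Simplification: only keep CPUs the process may already use.
    target = set(cpus) & allowed
    if not target:
        raise ValueError("none of the requested CPUs are currently allowed")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


original = os.sched_getaffinity(0)   # remember the starting mask
first_cpu = min(original)
pinned = pin_to_cpus({first_cpu})    # bind this process to a single CPU
print(sorted(pinned))
os.sched_setaffinity(0, original)    # restore the original mask
```

The same effect for an already running service would normally be achieved through CPUAffinity in the unit file rather than from inside the process.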

The tuna command can display CPU bindings. Install it with:

yum install tuna

Run tuna with -t to select threads and -P (uppercase) to print thread information such as scheduling policy, priority, and CPU binding.
[Figure: tuna output]

Managing CPU Affinity with cgroups

NUMA (Non‑Uniform Memory Access) partitions CPUs and memory into nodes linked by an interconnect such as Intel QPI. Each node has its own CPUs and local memory, so keeping a process and its memory on the same node reduces memory latency.

[Figure: NUMA architecture]

The cpuset cgroup controller can bind tasks to specific cores. Its configuration files reside under /sys/fs/cgroup and can be edited manually.

cpuset.cpus – comma‑separated list of CPUs a cgroup may use; "-" denotes a range (for example, 0-3,8).

cpuset.mems – list of NUMA memory nodes a cgroup may use.
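The list format used by these files is easy to get wrong by hand, so a small parser and formatter can help when scripting against them. This is a minimal sketch: parse_cpu_list and format_cpu_list are hypothetical helper names, and the code only handles plain indexes and ranges.

```python
def parse_cpu_list(text):
    """Turn a cpuset list such as "0-3,8" into a set of CPU indexes."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty files and stray commas
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus):
    """Turn a set of CPU indexes back into the compact range notation."""
    ordered = sorted(cpus)
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next index is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


print(sorted(parse_cpu_list("0-2,8\n")))  # [0, 1, 2, 8]
print(format_cpu_list({0, 1, 2, 8}))      # 0-2,8
```

The formatted string is what would be written to a cgroup's cpuset.cpus file; doing so requires root and the cpuset controller to be enabled for that cgroup.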
