Boost Linux Performance: Master CPU Affinity and Scheduling
This article explains the fundamentals of Linux CPU affinity, compares soft and hard binding, shows how to use taskset, sched_setaffinity, numactl and cgroup cpuset, and provides real‑world case studies for servers, game frameworks, Nginx and Hadoop clusters.
1. Overview of Linux CPU Affinity
CPU affinity binds a process or thread to specific logical CPUs, reducing migration, cache misses, and context‑switch overhead. It is especially beneficial on NUMA systems where remote memory latency is higher.
2. Kernel Mechanism
Each task is represented by a task_struct that contains a cpus_allowed bitmask (one bit per logical CPU). The system call
int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
modifies this mask. The call chain is:
sys_sched_setaffinity()
 └─> sched_setaffinity()
      └─> set_cpus_allowed()
           └─> migrate_task()
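The mask set this way can be read back from user space with sched_getaffinity; the minimal sketch below (not from the original article) prints which logical CPUs the calling process is currently allowed to run on:
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    /* pid 0 means the calling process; the kernel copies its cpus_allowed into mask */
    if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_getaffinity");
        exit(EXIT_FAILURE);
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf("allowed to run on CPU %d\n", cpu);
    return 0;
}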
3. Practical Tools
3.1 taskset
Show current affinity:
taskset -p <PID>
Start a program on CPUs 0 and 1:
taskset -c 0,1 ./my_program
Change the affinity of a running process:
taskset -p -c 2,3 <PID>
3.2 Direct use of sched_setaffinity (C)
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
int main(){
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(2, &cpuset); // bind to CPU 2
if (sched_setaffinity(0, sizeof(cpuset), &cpuset) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}
return 0;
}
The same API can be used with pthread_setaffinity_np for individual threads.
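As a hedged sketch of that per-thread variant (the worker function and the choice of CPU 2 are illustrative, not from the article), a newly created thread can be pinned like this:
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void *worker(void *arg) {
    /* placeholder workload so the thread is alive while it is being pinned */
    for (volatile long i = 0; i < 100000000L; i++)
        ;
    return NULL;
}

int main(void) {
    pthread_t tid;
    cpu_set_t cpuset;
    pthread_create(&tid, NULL, worker, NULL);

    CPU_ZERO(&cpuset);
    CPU_SET(2, &cpuset);                      /* pin the worker thread to logical CPU 2 */
    int err = pthread_setaffinity_np(tid, sizeof(cpuset), &cpuset);
    if (err != 0)                             /* pthread functions return the error number directly */
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));

    pthread_join(tid, NULL);
    return 0;
}
Build with gcc -pthread; the mask can be read back with pthread_getaffinity_np.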
3.3 numactl (NUMA‑aware binding)
numactl -N 0 -C 0-3 ./memory_intensive_app
Option -N selects the NUMA node to run on and -C selects specific CPUs within that node; because Linux allocates memory on the local node by default, this keeps both CPU and memory accesses local.
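The same binding can be requested from inside a program via libnuma. The sketch below is only an illustration under stated assumptions (node 0 and the 64 MiB allocation are arbitrary, and the libnuma development package must be installed); link it with -lnuma:
#define _GNU_SOURCE
#include <numa.h>      /* libnuma */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() == -1) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    /* run the calling thread on the CPUs of NUMA node 0 */
    if (numa_run_on_node(0) != 0)
        perror("numa_run_on_node");

    /* allocate 64 MiB directly on node 0 so the memory stays local */
    size_t len = 64UL << 20;
    void *buf = numa_alloc_onnode(len, 0);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }

    /* ... memory-intensive work on buf ... */

    numa_free(buf, len);
    return 0;
}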
3.4 cgroup cpuset subsystem
Mount the cpuset controller:
mkdir -p /sys/fs/cgroup/cpuset
mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset
Create a cgroup and define its allowed CPUs and memory nodes:
mkdir /sys/fs/cgroup/cpuset/my_cgroup
echo 0-1 > /sys/fs/cgroup/cpuset/my_cgroup/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/my_cgroup/cpuset.mems
Add a process to the cgroup:
echo 1234 > /sys/fs/cgroup/cpuset/my_cgroup/tasks
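A service can also place itself into this cpuset at startup. The sketch below simply writes the caller's PID to the tasks file of the cgroup created above (the helper name is illustrative, and the program normally needs sufficient privileges):
#include <stdio.h>
#include <unistd.h>

/* move the calling process into the given cpuset cgroup */
static int join_cpuset(const char *tasks_path) {
    FILE *f = fopen(tasks_path, "w");
    if (f == NULL) {
        perror("fopen");
        return -1;
    }
    fprintf(f, "%d\n", getpid());
    if (fclose(f) != 0) {          /* the write is flushed here and may be rejected by the kernel */
        perror("fclose");
        return -1;
    }
    return 0;
}

int main(void) {
    if (join_cpuset("/sys/fs/cgroup/cpuset/my_cgroup/tasks") == 0)
        printf("now restricted to the CPUs and memory nodes of my_cgroup\n");
    return 0;
}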
4. Real‑World Use Cases
4.1 Server Performance (AMD EPYC 7763, 128 cores)
Using lscpu --topology, numactl --hardware, and taskset to bind critical services to specific cores and NUMA nodes eliminated frequent migrations and improved cache locality. A dynamic script (auto_affinity.sh, shown below) adjusted affinity based on CPU load, achieving a three‑fold performance increase.
#!/bin/bash
# auto_affinity.sh – widen or narrow a process's CPU affinity based on current load
get_cpu_usage(){ top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1; }
adjust_affinity(){
    local pid=$1
    local usage
    usage=$(get_cpu_usage)
    if (( $(echo "$usage > 80" | bc -l) )); then
        # heavy load: spread the process over CPUs 0-15
        taskset -cp 0-15 "$pid"
    else
        # light load: keep it on CPUs 0-3 for better cache locality
        taskset -cp 0-3 "$pid"
    fi
}
for pid in $(pgrep -f "nginx|mysql|redis"); do
    adjust_affinity "$pid"
done
4.2 Skynet Game Framework
Binding workers to CPUs with taskset -c 0-3 ./skynet and re‑weighting thread groups reduced CPU usage and latency by >30%.
// before
static int weight[] = { -1,-1,-1,-1, 0,0,0,0, 1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3 };
// after
static int weight[] = { 2,2,2,2, // network threads
1,1,1,1, // business logic
0,0 };        // low‑priority
4.3 Nginx Web Server
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;
Each bitmask is read with the lowest bit as CPU 0, so each worker is pinned to a distinct core (worker 1 to CPU 0, worker 2 to CPU 1, and so on), improving cache reuse and reducing context switches under heavy traffic.
4.4 Hadoop MapReduce
Binding map/reduce tasks to the NUMA node that holds the data blocks with numactl --cpunodebind=0 --membind=0 together with taskset -c 0-3 lowers network latency and increases job throughput.
5. Caveats and Best Practices
5.1 Load Balancing
Over‑binding creates hot spots. Monitor per‑core load with top or htop and re‑assign processes when a core exceeds a threshold.
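For a rough, tool-free view of the same data that top and htop show per core, the sketch below (the 1-second interval and the 80% marker are arbitrary choices, not from the article) samples /proc/stat twice and prints how busy each logical CPU was:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_CPUS 256

struct cpu_sample { unsigned long long total, idle; };

/* read one "cpuN ..." sample per logical CPU from /proc/stat */
static int read_samples(struct cpu_sample *s, int max) {
    FILE *f = fopen("/proc/stat", "r");
    if (!f) { perror("/proc/stat"); return -1; }
    char line[512];
    int n = 0;
    while (n < max && fgets(line, sizeof(line), f)) {
        if (strncmp(line, "cpu", 3) != 0 || line[3] < '0' || line[3] > '9')
            continue;                   /* skip the aggregate "cpu" line and non-CPU lines */
        int cpu;
        unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;
        if (sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu", &cpu,
                   &user, &nice, &sys, &idle, &iowait, &irq, &softirq, &steal) == 9) {
            s[n].idle  = idle + iowait;
            s[n].total = user + nice + sys + idle + iowait + irq + softirq + steal;
            n++;
        }
    }
    fclose(f);
    return n;
}

int main(void) {
    struct cpu_sample a[MAX_CPUS], b[MAX_CPUS];
    int n = read_samples(a, MAX_CPUS);
    if (n <= 0)
        return 1;
    sleep(1);                           /* sampling interval */
    if (read_samples(b, MAX_CPUS) != n)
        return 1;
    for (int i = 0; i < n; i++) {
        unsigned long long dt = b[i].total - a[i].total;
        unsigned long long di = b[i].idle  - a[i].idle;
        double busy = dt ? 100.0 * (double)(dt - di) / (double)dt : 0.0;
        /* cores above ~80% busy are candidates for moving processes elsewhere */
        printf("cpu%-3d %5.1f%% busy%s\n", i, busy, busy > 80.0 ? "  <-- hot" : "");
    }
    return 0;
}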
5.2 Hardware Awareness
Prefer cores that share the same L3 cache or belong to the same NUMA node to maximise cache reuse. On NUMA systems, bind both CPU and memory to the local node.
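To see which logical CPUs share a cache (including the L3) without external tools, one option is to read sysfs. The following is only a sketch under assumptions (the cache level is probed via each index's level file, and cpu0 is just an example):
#include <stdio.h>

/* print the CPUs that share each cache level with the given CPU */
static void print_cache_siblings(int cpu) {
    for (int idx = 0; idx < 10; idx++) {
        char path[128], level[16], shared[256];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cache/index%d/level", cpu, idx);
        f = fopen(path, "r");
        if (!f)
            break;                      /* no more cache indexes for this CPU */
        if (!fgets(level, sizeof(level), f)) { fclose(f); break; }
        fclose(f);

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cache/index%d/shared_cpu_list", cpu, idx);
        f = fopen(path, "r");
        if (!f)
            break;
        if (fgets(shared, sizeof(shared), f))
            printf("cpu%d cache L%.1s shared with CPUs: %s", cpu, level, shared);
        fclose(f);
    }
}

int main(void) {
    print_cache_siblings(0);            /* example: inspect CPU 0 */
    return 0;
}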
5.3 Avoid Over‑Binding
Do not concentrate many CPU‑intensive processes on a single core. Spread workloads evenly and adjust with taskset -cp as needed.
5.4 Application Suitability
CPU‑bound workloads (e.g., scientific computing, AI training) benefit from aggressive binding. I/O‑bound services should bind only the threads handling network or disk I/O, leaving compute threads free to run on any idle core.