How Linux Implements CPU Affinity: From sched_setaffinity to Task Migration
This article explains why binding a process to a specific CPU improves cache performance, shows how to set CPU affinity on Linux using the sched_setaffinity system call, and walks through the kernel's internal implementation—including run‑queue structures, migrate_task, and __migrate_task—illustrated with code and diagrams.
CPU Affinity Overview
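Binding a process to a specific CPU (CPU affinity) improves cache locality. On most modern multi‑core processors each core has private L1 and L2 caches and shares the L3 cache with the other cores, so a task that moves to another core has to refill those private caches. Keeping a process on one core avoids the cache misses that such migrations cause.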
Setting CPU Affinity on Linux
Linux provides the sched_setaffinity system call:
int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
Parameters:
pid – ID of the target process (strictly, of a single thread); 0 means the caller.
cpusetsize – size in bytes of the CPU set that mask points to, normally sizeof(cpu_set_t).
mask – pointer to a cpu_set_t bitmap in which each bit represents a CPU.
Example (bind current process to CPU 2):
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
int main(void) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);        // start from an empty mask
    CPU_SET(2, &cpuset);      // bind to CPU 2
    if (sched_setaffinity(0, sizeof(cpuset), &cpuset) == -1) {
        fprintf(stderr, "Set CPU affinity failed: %s\n", strerror(errno));
        return EXIT_FAILURE;
    }
    return 0;
}
Kernel Implementation
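Each CPU has its own run‑queue (struct rq). A CPU's scheduler picks its next task only from its own run‑queue, so the run‑queue a task sits on determines which CPU runs it. The kernel code in this section follows the older 2.6‑series scheduler; later kernels restructure this path.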
Call chain for setting affinity:
sys_sched_setaffinity()
└─> sched_setaffinity()
    └─> set_cpus_allowed()
        └─> migrate_task()
migrate_task handles two cases:
Case 1: The task is not on any run‑queue (not runnable). migrate_task simply updates the task’s cpu field to the destination CPU and returns 0.
Case 2: The task is already on a run‑queue. migrate_task builds a migration request, queues it on the source run‑queue, and returns 1 so that the caller wakes the kernel’s migration_thread to move the task.
static int migrate_task(struct task_struct *p, int dest_cpu,
                        struct migration_req *req)
{
    struct rq *rq = task_rq(p);

    /* Case 1: task not on any run-queue */
    if (!p->se.on_rq && !task_running(rq, p)) {
        set_task_cpu(p, dest_cpu);
        return 0;
    }

    /* Case 2: task on a run-queue, so build a migration request */
    init_completion(&req->done);
    req->task = p;
    req->dest_cpu = dest_cpu;
    list_add(&req->list, &rq->migration_queue);
    return 1;
}
The migration thread eventually calls __migrate_task to relocate the task:
static int __migrate_task(struct task_struct *p,
                          int src_cpu, int dest_cpu)
{
    struct rq *rq_src = cpu_rq(src_cpu);
    struct rq *rq_dest = cpu_rq(dest_cpu);
    int on_rq = p->se.on_rq;

    if (on_rq)
        deactivate_task(rq_src, p, 0);   // remove from source queue
    set_task_cpu(p, dest_cpu);
    if (on_rq)
        activate_task(rq_dest, p, 0);    // insert into destination queue
    return 0;
}
[Figure: cpu_set_t bitmap visualisation]
[Figure: task migration diagram (CPU 0 → CPU 3)]
Summary
Setting CPU affinity ultimately places a task on the run‑queue of a CPU in its affinity mask. If the task is already queued on another CPU, the kernel migrates it via migration_thread. Because each CPU maintains an independent run‑queue, tasks can pile up unevenly across CPUs, and pinning tasks narrows where they can be moved, so the resulting load imbalance may require additional load‑balancing mechanisms.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
