
Where Does a Woken Task Run? Exploring wake_affine and select_idle_sibling in the Linux Scheduler

The article analyzes how the Linux kernel decides the CPU for a task that has just been woken, detailing the roles of wake_affine and select_idle_sibling, and examining factors such as cache affinity, idle status, load, and hardware topology that influence the final placement.

Linux Code Review Hub

Wake‑up affinity decision

When a waking task (waker A) wakes a sleeping task (wakee B), the kernel must select a CPU for B. Because the wake‑up often coincides with data exchange (e.g., A writes to a pipe or shared memory that B reads), placing B on a CPU that is topologically close to A increases the chance of hitting the hot cache lines A just wrote. The decision also considers B’s previous CPU (prev_cpu), A’s current CPU (this_cpu), their cache‑sharing relationships, and the run‑queue load of each CPU.

Topology example

Assume three clusters – WuChang, HanKou, HanYang – each with four CPUs that share an L2 cache. All twelve CPUs share an L3 cache, so cpus_share_cache() is true for any pair, while cpus_share_resources() is true only within a cluster.

In the example, A runs on a WuChang CPU (this_cpu) and wakes B, which was previously sleeping on a HanKou CPU (prev_cpu).

Wake‑affine algorithm

The kernel’s wake_affine() function decides whether B should stay on prev_cpu or move toward this_cpu. The choice is based on:

Idle status of this_cpu and prev_cpu. If this_cpu is idle (or will become idle) and prev_cpu is busy, moving B to this_cpu may let B hit A’s hot cache. If prev_cpu is idle, keeping B there may be preferable.

Relative load. If this_cpu has a lighter load than prev_cpu (or only slightly heavier), the algorithm biases toward this_cpu.

Code path

A calls try_to_wake_up(), which invokes select_task_rq(). For the fair scheduling class, select_task_rq_fair() runs and calls wake_affine() to obtain a candidate new_cpu (either this_cpu or prev_cpu). Afterwards select_idle_sibling(p, prev_cpu, new_cpu) attempts to find an idle sibling CPU near either prev_cpu or new_cpu.

wake_affine()

wake_affine() mainly invokes two sub‑functions: wake_affine_idle() and wake_affine_weight().

Idle path (wake_affine_idle()): if this_cpu is idle and shares cache with prev_cpu, B is pulled to this_cpu unless prev_cpu is also idle, in which case B stays put. On a synchronous wake‑up where the waker is the only task on its run queue (sync && cpu_rq(this_cpu)->nr_running == 1), the waker is about to sleep, so moving B to this_cpu is favored. Otherwise, if prev_cpu is idle, B may stay there.

When neither CPU is idle, wake_affine_weight() compares effective loads with a slight bias toward this_cpu:

prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
this_eff_load *= 100;

After wake_affine() selects new_cpu, select_idle_sibling() again examines the idle status of prev_cpu and new_cpu. It may also consider recent_used_cpu (the CPU B used most recently) if it is idle and shares cache with the target CPU or belongs to the same cluster.

If no suitable idle sibling is found, select_idle_cpu() scans the target CPU’s cluster first and then the remaining CPUs of the LLC domain, preferring CPUs that are topologically closer to the target.

Congestion control

Pulling communicating tasks together improves cache locality, but if many tasks (A, B, C, D, …) are clustered in a small sched_domain, congestion may arise. The kernel monitors wake‑up patterns of the waker and wakee and may disable wake_affine when it would cause overload.

Summary of placement

When A on WuChang wakes B on HanKou, the kernel may migrate B to WuChang or keep it on HanKou. The idle scan always prefers the local cluster first, then the remaining clusters of the LLC; placement decisions beyond this fast path occur only for slow‑path wake‑ups (e.g., WF_EXEC or WF_FORK), which go through the load‑balancing path and can choose among more distant domains.

Tags: Kernel, cpu_affinity, select_idle_sibling, wake_affine
Written by

Linux Code Review Hub

A professional Linux technology community and learning platform covering the kernel, memory management, process management, file system and I/O, performance tuning, device drivers, virtualization, and cloud computing.
