How Linux’s CFS Scheduler Allocates CPU to Containers by Weight

This article explains Linux’s Completely Fair Scheduler and its data structures, shows how weight‑based CPU allocation works for containers under cgroup v1 and v2, and walks through the vruntime calculation with a practical example of distributing CPU across services.

Refining Core Development Skills

Hello, I’m Fei.

In the previous article we covered the period‑and‑quota method of limiting container CPU time. This article introduces the second method: weight‑based allocation, describing its implementation and underlying principles.

1. Linux’s Completely Fair Scheduler

Before discussing container weight allocation, let us review the kernel’s Completely Fair Scheduler (CFS). Each logical CPU has a run queue, struct cfs_rq, organized as a red‑black tree. The tree nodes are struct sched_entity objects, each of which represents either a regular process (struct task_struct) or a container’s task group (struct task_group).

The core of cfs_rq is the rb_root_cached red‑black tree that stores sched_entity nodes. These entities may belong to ordinary processes or to container task groups.

struct cfs_rq {
    ...
    // Minimum vruntime of all tasks in the queue
    u64 min_vruntime;
    // Red‑black tree of runnable tasks
    struct rb_root_cached tasks_timeline;
    ...
};

Key structures:

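struct task_group {
    ...
    // One scheduling entity per CPU for this group
    struct sched_entity **se;
    // One run queue per CPU owned by this group
    struct cfs_rq **cfs_rq;
    // Group weight, set via cpu.shares or cpu.weight
    unsigned long shares;
};
struct task_struct {
    ...
    // Scheduling entity embedded in every process
    struct sched_entity se;
};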

Both process and container entities contain a virtual runtime vruntime and a load field that stores weight information.

Each logical CPU runs a periodic timer that triggers scheduling decisions. The scheduler picks the leftmost node of the red‑black tree, which is the entity with the smallest vruntime, so whichever entity has run the least gets to run next. This keeps all entities’ vruntime values roughly equal and distributes CPU time fairly.

2. Setting Weights

While CFS enforces fairness via vruntime, services often need more or less CPU. Weight fields enable this differentiation. The relevant structures are:

struct sched_entity {
    struct load_weight load;
    u64 vruntime;
    ...
};

struct load_weight {
    unsigned long weight;
    u32 inv_weight;
};

For regular processes, the nice command indirectly adjusts weight. In containers, weight is set via cpu.shares (cgroup v1) or cpu.weight / cpu.weight.nice (cgroup v2).

static struct cftype cpu_legacy_files[] = {
    {
        .name = "shares",
        .read_u64 = cpu_shares_read_u64,
        .write_u64 = cpu_shares_write_u64,
    },
    ...
};
static struct cftype cpu_files[] = {
    {
        .name = "weight",
        .flags = CFTYPE_NOT_ON_ROOT,
        .read_u64 = cpu_weight_read_u64,
        .write_u64 = cpu_weight_write_u64,
    },
    ...
};

Both write paths eventually call __sched_group_set_shares to store the weight in the scheduling entity.

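static int __sched_group_set_shares(struct task_group *tg, unsigned long shares)
{
    int i;
    ...
    // Record the new weight on the task group
    tg->shares = shares;
    for_each_possible_cpu(i) {
        struct sched_entity *se = tg->se[i];
        ...
        // Propagate the change up the hierarchy on this CPU
        for_each_sched_entity(se)
            update_cfs_group(se);
    }
    return 0;
}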

The update_cfs_group function calls reweight_entity, which in turn calls update_load_set to record the weight in se->load.weight.

static inline void update_load_set(struct load_weight *lw, unsigned long w)
{
    lw->weight = w;
    lw->inv_weight = 0;
}

3. Container CPU Weight Allocation in Practice

CFS maintains fairness by scaling vruntime according to weight. The scaling occurs in calc_delta_fair:

static inline u64 calc_delta_fair(u64 delta, struct sched_entity *se)
{
    if (unlikely(se->load.weight != NICE_0_LOAD))
        delta = __calc_delta(delta, NICE_0_LOAD, &se->load);
    return delta;
}

When the weight equals NICE_0_LOAD (1024), vruntime advances at the same rate as real runtime. Otherwise, the kernel scales each increment:

vruntime_delta = (actual_runtime * ((NICE_0_LOAD * 2^32) / weight)) >> 32

With a higher weight, vruntime grows more slowly, so the entity stays toward the left of the tree and receives more CPU time; a lower weight does the opposite.

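Example: consider an 8‑core machine on which the weights of all containers sum to 8192, and containers A, B, and C have weights 512, 1024, and 2048 respectively. When every container is busy, each one's share of the 8 cores is proportional to its weight:

A gets 512/8192 × 8 = 0.5 core

B gets 1024/8192 × 8 = 1 core

C gets 2048/8192 × 8 = 2 cores

Weights only matter under contention: if other containers are idle, a container may use more than this share.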

This demonstrates Linux’s second capability for container CPU allocation: weight‑based distribution.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.