
Process Management & Scheduling (Part 0): Essential Kernel Structures

This article introduces the core Linux kernel data structures involved in process management and scheduling—task_struct, sched_entity, rq, and sched_avg—explaining their key fields, relationships, and how they enable the kernel to track process state, timing, memory, and load‑balancing decisions.

Linux Kernel Journey

The series aims to dissect Linux process management and scheduling by first covering the prerequisite kernel structures. It focuses on four central structs: task_struct, sched_entity, rq, and sched_avg, each of which encapsulates specific aspects of a process’s lifecycle and the scheduler’s operation.

task_struct

task_struct is the Linux PCB (Process Control Block). In Linux 6.5 its definition spans over 800 lines and stores all information about a process, including identifiers, scheduling data, memory descriptors, file descriptors, and runtime statistics. The most important fields are:

struct task_struct {
    /* State, priority, and scheduling fields (excerpt) */
    unsigned int __state;   // task state (TASK_RUNNING, TASK_INTERRUPTIBLE, ...)
    void *stack;            // kernel stack pointer
    refcount_t usage;       // reference count
    unsigned int flags;     // PF_* flags
    int prio;               // effective scheduling priority
    int static_prio;        // priority derived from the nice value
    int normal_prio;        // priority from policy and static_prio, without boosting
    unsigned int rt_priority; // real-time priority
    struct sched_entity se;    // normal (CFS) scheduling entity
    struct sched_rt_entity rt; // real-time entity
    struct sched_dl_entity dl; // deadline entity
    const struct sched_class *sched_class;
    unsigned int policy;      // scheduling policy
    cpumask_t cpus_mask;      // CPU affinity mask
    int exit_state;           // EXIT_ZOMBIE or EXIT_DEAD once the task has exited
    int exit_code;            // exit code from exit()
    int exit_signal;          // signal sent to parent on exit
    int pdeath_signal;        // signal sent to this task when its parent dies
    unsigned long nvcsw;      // voluntary context switches
    unsigned long nivcsw;     // involuntary context switches
    u64 start_time;           // monotonic time at which the task started (ns)
    /* ... many other fields omitted ... */
};

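Most fields are placed between the randomized_struct_fields_start and randomized_struct_fields_end markers. When structure layout randomization is enabled, the compiler shuffles the order of the fields in that region, which makes memory-corruption attacks that rely on known field offsets harder.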
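The priority fields are easier to read with their numeric ranges in mind. The following is a minimal user-space sketch, not kernel code. It assumes the mainline constants MAX_RT_PRIO (100) and DEFAULT_PRIO (120). The helper names nice_to_static_prio and rt_priority_to_prio are hypothetical and only mirror what the kernel does with its NICE_TO_PRIO() macro and its normal-priority calculation for real-time policies.

```c
#include <assert.h>

/* Values as defined in mainline include/linux/sched/prio.h. */
#define MAX_RT_PRIO   100                              /* prio 0..99 is the real-time range */
#define NICE_WIDTH    40                               /* nice spans -20..19 */
#define DEFAULT_PRIO  (MAX_RT_PRIO + NICE_WIDTH / 2)   /* 120, i.e. nice 0 */

/* Hypothetical helper mirroring the kernel's NICE_TO_PRIO() macro:
 * maps a nice value to the static_prio of a normal task (100..139). */
static int nice_to_static_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return DEFAULT_PRIO + nice;
}

/* Hypothetical helper: priority of a SCHED_FIFO/SCHED_RR task.
 * A higher rt_priority maps to a numerically lower (stronger) prio. */
static int rt_priority_to_prio(unsigned int rt_priority)
{
    assert(rt_priority >= 1 && rt_priority <= MAX_RT_PRIO - 1);
    return MAX_RT_PRIO - 1 - (int)rt_priority;
}
```

Under these assumptions a nice 0 task has static_prio 120, and every real-time task ends up with a prio below 100, so it always ranks ahead of normal tasks.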

sched_entity

sched_entity represents the smallest scheduling unit, either a single process or a scheduling group. It holds the load weight, the run-queue tree node, a list node (group_node), and timing information such as virtual runtime and execution statistics.

struct sched_entity {
    struct load_weight load;   // weight influencing scheduling decisions
    struct rb_node run_node;   // node in the red‑black tree of the runqueue
    struct list_head group_node; // list node linking the entity into the runqueue's CFS task list
    unsigned int on_rq;        // whether the entity is on a runqueue
    u64 exec_start;            // timestamp when the entity last started running (runqueue clock, ns)
    u64 sum_exec_runtime;      // total execution time (real time)
    u64 vruntime;              // virtual runtime used for CFS fairness
    u64 prev_sum_exec_runtime;
    u64 nr_migrations;        // number of migrations
    #ifdef CONFIG_FAIR_GROUP_SCHED
    int depth;                 // depth in scheduling hierarchy
    struct sched_entity *parent; // parent entity in a group
    struct cfs_rq *cfs_rq;      // CFS runqueue this entity belongs to
    struct cfs_rq *my_q;       // runqueue owned by this entity/group
    unsigned long runnable_weight;
    #endif
    #ifdef CONFIG_SMP
    struct sched_avg avg;     // load average for the entity
    #endif
};
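The relationship between load, sum_exec_runtime, and vruntime can be illustrated with a small user-space sketch. It is a simplification under stated assumptions: toy_entity and toy_account are hypothetical names, 1024 is the load weight of a nice 0 task, and plain integer division stands in for the fixed-point arithmetic the kernel actually uses.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024u   /* load weight of a nice-0 task */

/* Toy stand-in for the timing fields of sched_entity (not the kernel struct). */
struct toy_entity {
    uint64_t weight;            /* plays the role of se.load.weight */
    uint64_t sum_exec_runtime;  /* real CPU time consumed, in ns */
    uint64_t vruntime;          /* weighted virtual runtime, in ns */
};

/* Charge delta_exec_ns of real CPU time to the entity. */
static void toy_account(struct toy_entity *se, uint64_t delta_exec_ns)
{
    assert(se->weight > 0);
    se->sum_exec_runtime += delta_exec_ns;
    /* Heavier entities accumulate vruntime more slowly, so they are
     * picked more often by a scheduler that favors the lowest vruntime. */
    se->vruntime += delta_exec_ns * NICE_0_WEIGHT / se->weight;
}
```

With a weight of 1024, one millisecond of CPU time adds one millisecond of vruntime. With a weight of 335 (the mainline table value for nice 5), the same millisecond adds roughly three times as much.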

rq (runqueue)

The rq struct describes a CPU’s generic runqueue, containing basic counters, pointers to the currently running task, idle task, and the three scheduler‑specific queues (CFS, real‑time, deadline). It also stores load‑balancing data such as CPU capacity and balance callbacks.

struct rq {
    raw_spinlock_t __lock;      // protects the runqueue
    unsigned int nr_running;    // number of runnable tasks
    #ifdef CONFIG_NUMA_BALANCING
    unsigned int nr_numa_running;
    unsigned int nr_preferred_running;
    unsigned int numa_migrate_on;
    #endif
    u64 nr_switches;            // context‑switch count
    unsigned int nr_uninterruptible;
    struct task_struct __rcu *curr; // currently running task
    struct task_struct *idle;   // idle task for this CPU
    struct task_struct *stop;   // stop task
    u64 clock;                  // runqueue clock
    struct cfs_rq cfs;          // CFS runqueue
    struct rt_rq rt;           // real‑time runqueue
    struct dl_rq dl;           // deadline runqueue
    /* Load‑balancing fields */
    struct root_domain *rd;
    struct sched_domain __rcu *sd;
    unsigned long cpu_capacity;
    unsigned long cpu_capacity_orig;
    unsigned char idle_balance;
    int active_balance;
    int cpu;                    // CPU this runqueue belongs to
    int online;                 // CPU online state
    /* ... other fields omitted ... */
};
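To make the role of the three embedded queues concrete, here is a toy user-space model of a per-CPU runqueue. All names (toy_rq, toy_enqueue, toy_pick_class) are hypothetical, and the counters merely stand in for the real cfs_rq, rt_rq, and dl_rq structures. The sketch only illustrates the class ordering in which the scheduler looks for the next task (deadline, then real-time, then CFS, then idle); the stop class is left out for brevity.

```c
#include <assert.h>

enum toy_class { TOY_DL, TOY_RT, TOY_FAIR, TOY_IDLE };

/* Toy per-CPU runqueue: one counter per scheduling class. */
struct toy_rq {
    unsigned int nr_running;  /* total runnable tasks, like rq->nr_running */
    unsigned int dl_nr;       /* stands in for the deadline runqueue */
    unsigned int rt_nr;       /* stands in for the real-time runqueue */
    unsigned int cfs_nr;      /* stands in for the CFS runqueue */
};

/* Account one runnable task of the given class on this runqueue. */
static void toy_enqueue(struct toy_rq *rq, enum toy_class cls)
{
    assert(cls != TOY_IDLE);  /* the idle task is never enqueued */
    switch (cls) {
    case TOY_DL:   rq->dl_nr++;  break;
    case TOY_RT:   rq->rt_nr++;  break;
    case TOY_FAIR: rq->cfs_nr++; break;
    default:       break;
    }
    rq->nr_running++;
}

/* Return the highest-priority class that has a runnable task. */
static enum toy_class toy_pick_class(const struct toy_rq *rq)
{
    if (rq->dl_nr)  return TOY_DL;
    if (rq->rt_nr)  return TOY_RT;
    if (rq->cfs_nr) return TOY_FAIR;
    return TOY_IDLE;          /* nothing runnable: run the idle task */
}
```

An empty toy_rq yields TOY_IDLE, which corresponds to the rq falling back to its idle task when no class has anything to run.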

sched_avg

sched_avg aggregates load information for both scheduling entities and runqueues, providing metrics such as last update time, load sum, runnable sum, utilization sum, and various averaged values used by the load-balancing algorithm.

struct sched_avg {
    u64 last_update_time;   // last time the metrics were refreshed
    u64 load_sum;           // accumulated load (decayed over time)
    u64 runnable_sum;       // accumulated runnable time (decayed over time)
    u32 util_sum;           // accumulated running time (decayed over time)
    u32 period_contrib;     // leftover time not forming a full period
    unsigned long load_avg; // average load, scaled by the entity's weight
    unsigned long runnable_avg; // average share of time spent runnable (running or waiting)
    unsigned long util_avg; // average share of time actually running, not scaled by weight
    struct util_est util_est; // utilization estimator
};
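The "decayed over time" sums follow a geometric series: time is split into periods of about 1 ms (1024 µs), and each period's contribution is multiplied by a factor y for every period that has passed since, with y chosen so that y^32 ≈ 0.5. The sketch below models this in floating point. It is an illustration only: toy_pelt_step and toy_pelt_run are hypothetical names, partial periods (period_contrib) are ignored, and the kernel itself uses integer fixed-point arithmetic with lookup tables.

```c
#include <assert.h>

#define PELT_PERIOD_US 1024u      /* one period, about 1 ms */
#define PELT_Y         0.97857206 /* per-period decay factor, y^32 is about 0.5 */

/* Advance a decayed sum by one full period.
 * running_us: time within that period the entity contributed (0..1024). */
static double toy_pelt_step(double sum, unsigned int running_us)
{
    assert(running_us <= PELT_PERIOD_US);
    return sum * PELT_Y + (double)running_us;
}

/* Advance the sum by several periods with the same contribution in each. */
static double toy_pelt_run(double sum, unsigned int periods,
                           unsigned int running_us)
{
    for (unsigned int i = 0; i < periods; i++)
        sum = toy_pelt_step(sum, running_us);
    return sum;
}
```

An entity that runs continuously converges toward a fixed maximum, and once it goes idle its sum halves every 32 periods. That is why recent behavior dominates the averages while older history fades out.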

Together, these structures are only the tip of the iceberg of the Linux scheduler: task_struct links a process to its memory descriptor (mm) and file tables, sched_entity tracks its scheduling state, rq organizes runnable entities per CPU, and sched_avg supplies the metrics for load-balancing decisions.
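Because sched_entity is embedded directly in task_struct rather than referenced through a pointer, code that holds only the entity can recover the owning task by pointer arithmetic. The sketch below shows that pattern in user space. The toy_ types and toy_task_of are hypothetical simplifications, and toy_container_of is a reduced version of the kernel's container_of() macro without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced version of the kernel's container_of(): given a pointer to a
 * member, compute the address of the structure that contains it. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_sched_entity {
    unsigned long long vruntime;
};

struct toy_task {
    int pid;
    struct toy_sched_entity se;   /* embedded, like task_struct.se */
};

/* Map a scheduling entity back to the task that embeds it. */
static struct toy_task *toy_task_of(struct toy_sched_entity *se)
{
    assert(se != NULL);
    return toy_container_of(se, struct toy_task, se);
}
```

This only holds for entities that represent a task. With group scheduling enabled, an entity may represent a group instead, in which case there is no enclosing task to recover.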

Understanding these core structs is essential before diving deeper into the kernel's scheduling algorithms and eBPF-based runtime analysis.

