
Understanding Linux task_struct: The Kernel‑mode Representation of Processes and Threads

This article provides a comprehensive overview of Linux's task_struct, explaining how the kernel represents and manages processes and threads, detailing its internal fields, state handling, scheduling, credentials, memory layout, and special mechanisms such as hung‑task detection.

Deepin Linux

Linux uses a unified kernel‑mode structure called task_struct to represent both processes and threads. This structure holds all information required for the kernel to schedule, manage resources, and coordinate inter‑process communication.

1. Linux Task and Kernel Mode

The kernel runs in a privileged mode where it can execute any CPU instruction and access all hardware resources. In this mode, task_struct is the core data structure that organizes all active tasks.

2. Kernel‑mode Layout of task_struct

The header include/linux/sched.h defines struct task_struct, which has hundreds of members; the exact set depends on the kernel version and configuration. The most important groups are summarized below:

task_struct
├── Process Identification
│   ├── pid: process ID
│   ├── tgid: thread‑group ID
│   └── comm: readable name
├── Process State Management
│   ├── state
│   ├── exit_state
│   ├── exit_code
│   ├── jobctl
│   └── atomic_flags
├── Scheduling
│   ├── prio
│   ├── static_prio
│   ├── rt_priority
│   ├── sched_class
│   ├── se
│   ├── rt
│   └── on_rq
├── Memory Management
│   ├── mm
│   └── active_mm
├── File System and I/O Management
│   ├── fs
│   ├── files
│   ├── io_context
│   └── splice_pipe
├── Signal Handling
│   ├── signal
│   ├── sighand
│   ├── pending
│   └── blocked
├── Security and Credentials
│   ├── cred
│   ├── real_cred
│   └── seccomp
├── Inter‑Process Relationships
│   ├── real_parent
│   ├── parent
│   ├── children
│   └── sibling
├── Namespace Management
│   ├── nsproxy
│   └── cgroups
├── Time and Statistics
│   ├── start_time
│   ├── utime
│   ├── stime
│   ├── nvcsw
│   └── nivcsw
├── Thread and CPU State
│   ├── thread
│   ├── recent_used_cpu
│   └── on_cpu
├── Debugging and Tracing
│   ├── ptrace
│   └── last_siginfo
├── Resource Limits and Accounting
│   ├── signal->rlim (limits live in signal_struct, shared by all threads)
│   └── ioac
├── Error Handling and Recovery
│   ├── task_works
│   └── restart_block
└── Miscellaneous Flags
    ├── PF_EXITING
    ├── PF_VCPU
    └── PF_FORKNOEXEC

3. Process States

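Linux defines several task states as bit values stored in the state field (renamed __state in Linux 5.14). TASK_RUNNING is 0, meaning no bit is set; the other states are single bits or combinations of bits. The values below come from an older 4.x kernel; later releases renumbered several of them:

#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define __TASK_STOPPED        4
#define __TASK_TRACED         8
/* Used in tsk->exit_state: */
#define EXIT_DEAD             16
#define EXIT_ZOMBIE           32
/* Used in tsk->state again: */
#define TASK_DEAD             64
#define TASK_WAKEKILL         128
#define TASK_WAKING           256
#define TASK_PARKED           512
#define TASK_NOLOAD           1024
#define TASK_NEW              2048
#define TASK_STATE_MAX        4096

#define TASK_KILLABLE  (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED   (TASK_WAKEKILL | __TASK_STOPPED)
#define TASK_TRACED    (TASK_WAKEKILL | __TASK_TRACED)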

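TASK_RUNNING means the task is either running or waiting on a run queue for a CPU. TASK_INTERRUPTIBLE is a sleep that a signal can interrupt; TASK_UNINTERRUPTIBLE is a sleep that signals cannot interrupt (typical for some disk I/O waits); TASK_KILLABLE is an uninterruptible sleep that fatal signals can still interrupt. The remaining states cover stopping, tracing, and termination.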
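To make the bit encoding concrete, here is a small user-space sketch in C that copies a subset of the constants above and maps a state word to the single letter ps shows. The helper state_letter() is hypothetical and deliberately simplified; the kernel's real mapping lives in task_state_array in fs/proc/array.c and also covers parked, idle, and exit states.

```c
#include <assert.h>

/* Subset of the 4.x state values listed above. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2
#define __TASK_STOPPED        4
#define __TASK_TRACED         8
#define TASK_WAKEKILL         128

#define TASK_KILLABLE  (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED   (TASK_WAKEKILL | __TASK_STOPPED)
#define TASK_TRACED    (TASK_WAKEKILL | __TASK_TRACED)

/* Hypothetical helper: map a state word to the letter ps(1) would print.
 * Tests the most specific bits first; TASK_WAKEKILL alone changes nothing. */
static char state_letter(unsigned int state)
{
    if (state == TASK_RUNNING)
        return 'R';                     /* running or runnable */
    if (state & __TASK_TRACED)
        return 't';                     /* stopped by a tracer */
    if (state & __TASK_STOPPED)
        return 'T';                     /* stopped by a signal */
    if (state & TASK_UNINTERRUPTIBLE)
        return 'D';                     /* uninterruptible (incl. killable) */
    if (state & TASK_INTERRUPTIBLE)
        return 'S';                     /* interruptible sleep */
    return '?';                         /* not handled by this sketch */
}
```

Because TASK_KILLABLE contains the TASK_UNINTERRUPTIBLE bit, a killable sleeper is reported as D, just like an ordinary uninterruptible sleep.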

4. Task IDs

Each task carries two numeric identifiers and a pointer to its thread-group leader:

pid_t pid;          // process ID
pid_t tgid;         // thread‑group ID (process leader's PID)
struct task_struct *group_leader; // pointer to the leader

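If a process has only one thread, pid and tgid are identical. When it has several threads, each thread gets its own pid while all of them share the same tgid, which equals the pid of the thread-group leader. User space sees these names the other way round: getpid() returns the kernel's tgid, and gettid() returns the kernel's pid.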
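The naming inversion is easy to confirm from user space. This sketch assumes Linux with glibc and POSIX threads; it invokes the gettid system call through syscall(SYS_gettid) so that it also works with glibc versions older than 2.30, which lack a gettid() wrapper. The struct ids type and the helper names are illustrative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* What user space sees, annotated with the kernel field behind it. */
struct ids {
    pid_t pid_seen;   /* getpid(): the kernel's task->tgid */
    pid_t tid_seen;   /* gettid:   the kernel's task->pid  */
};

static void read_ids(struct ids *out)
{
    out->pid_seen = getpid();
    out->tid_seen = (pid_t)syscall(SYS_gettid);  /* raw syscall, no wrapper needed */
}

static void *worker(void *arg)
{
    read_ids(arg);   /* runs in the new thread */
    return NULL;
}

/* Record IDs for the calling thread and for one freshly created thread.
 * Returns 0 on success, nonzero if thread creation or join fails. */
static int collect_ids(struct ids *main_ids, struct ids *worker_ids)
{
    pthread_t t;

    read_ids(main_ids);
    if (pthread_create(&t, NULL, worker, worker_ids) != 0)
        return -1;
    return pthread_join(t, NULL);
}
```

Called from a program's main thread, main_ids.pid_seen equals main_ids.tid_seen (the leader's pid equals its tgid), while the worker reports the same getpid() value and a different thread ID.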

5. Parent‑Child Relationships

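Each task records two parent pointers and is linked into its parent's list of children. real_parent and parent normally point to the same task; they differ while the task is being traced, when parent points to the tracer:

struct task_struct __rcu *real_parent; // creator (or new reaper if the creator exited)
struct task_struct __rcu *parent;      // receives SIGCHLD and wait() results; the tracer while ptraced
struct list_head children;             // head of the list of this task's children
struct list_head sibling;              // node in the parent's children list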
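children and sibling use the kernel's intrusive list_head pattern: the parent's children field is a list head, and each child is linked into it through its own sibling field. The user-space sketch below re-implements a minimal list_head and container_of() to show how a parent walks its children; struct toy_task and the toy_* helpers are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal re-implementation of the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical, heavily trimmed stand-in for task_struct. */
struct toy_task {
    int pid;
    struct toy_task *real_parent;
    struct list_head children;   /* head of this task's child list */
    struct list_head sibling;    /* node in the parent's children list */
};

static void toy_task_init(struct toy_task *t, int pid)
{
    t->pid = pid;
    t->real_parent = NULL;
    list_init(&t->children);
    list_init(&t->sibling);
}

/* Link a new child under its parent, roughly as fork does. */
static void toy_adopt(struct toy_task *parent, struct toy_task *child)
{
    child->real_parent = parent;
    list_add_tail(&child->sibling, &parent->children);
}

/* Walk parent->children; every node is some child's 'sibling' member. */
static int toy_sum_child_pids(struct toy_task *parent)
{
    int sum = 0;
    for (struct list_head *p = parent->children.next;
         p != &parent->children; p = p->next)
        sum += container_of(p, struct toy_task, sibling)->pid;
    return sum;
}
```

Inside the kernel the same walk is normally written as list_for_each_entry(child, &parent->children, sibling), with tasklist_lock held for reading.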

6. Flags (PF_*)

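Per-task flags are stored in the unsigned int flags field. Their names start with PF_ and they describe low-level properties of the task (values as in 4.x kernels; they may differ in other versions):

#define PF_EXITING    0x00000004   // task is exiting
#define PF_VCPU       0x00000010   // task is running guest code as a virtual CPU (KVM)
#define PF_FORKNOEXEC 0x00000040   // forked but has not called exec yet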
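User space can observe the flags word: field 9 of /proc/<pid>/stat is the task's flags value. The sketch below, assuming Linux and a mounted /proc, parses that field and tests PF_* bits; stat_flags() and self_flags() are hypothetical helpers. Because the command name in field 2 may itself contain spaces or parentheses, the parser starts after the last ')' on the line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PF_EXITING    0x00000004
#define PF_VCPU       0x00000010
#define PF_FORKNOEXEC 0x00000040

/* Extract field 9 ("flags") from one /proc/<pid>/stat line.
 * Fields after comm: state, ppid, pgrp, session, tty_nr, tpgid, flags.
 * Returns 0 on success, -1 if the line cannot be parsed. */
static int stat_flags(const char *line, unsigned int *flags)
{
    const char *p = strrchr(line, ')');   /* end of comm, even if comm has ')' */
    char state;
    int ppid, pgrp, session, tty_nr, tpgid;

    if (p == NULL)
        return -1;
    if (sscanf(p + 1, " %c %d %d %d %d %d %u",
               &state, &ppid, &pgrp, &session, &tty_nr, &tpgid, flags) != 7)
        return -1;
    return 0;
}

/* Read the calling process's own flags word. */
static int self_flags(unsigned int *flags)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen("/proc/self/stat", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return stat_flags(buf, flags);
}
```

For a process that was started normally (fork followed by exec), PF_FORKNOEXEC is clear because exec clears it; a child that has only forked still has it set.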

7. Credentials

Credentials determine what a task is allowed to do:

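const struct cred __rcu *real_cred; // objective credentials: used when other tasks act on this one
const struct cred __rcu *cred;      // subjective credentials: used when this task acts on objects

Both pointers usually refer to the same cred object; they differ only while the kernel temporarily overrides the subjective credentials (for example via override_creds()). The cred structure holds the real, effective, saved, and filesystem UIDs and GIDs, the supplementary groups, and the capability sets (for example CAP_CHOWN or CAP_KILL).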
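From user space, getresuid() returns the real, effective, and saved UIDs taken from the calling task's subjective credentials (task->cred). The sketch below assumes Linux with glibc, where getresuid() is declared when _GNU_SOURCE is defined; struct uid_triplet and read_uids() are illustrative names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative container for the three UIDs kept in struct cred. */
struct uid_triplet {
    uid_t real;        /* cred->uid  */
    uid_t effective;   /* cred->euid */
    uid_t saved;       /* cred->suid */
};

/* Returns 0 on success, -1 on error (errno set by getresuid). */
static int read_uids(struct uid_triplet *out)
{
    return getresuid(&out->real, &out->effective, &out->saved);
}
```

For an ordinary program all three values are normally equal; they diverge for a setuid binary, where the effective and saved UIDs come from the file's owner.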

8. Scheduling Information

Key scheduling fields include priority, scheduling class, and CPU affinity:

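int on_rq;                              // nonzero while queued on a run queue
int prio;                               // dynamic (effective) priority
int static_prio;                        // derived from the nice value
unsigned int rt_priority;               // real-time priority, 1..99
const struct sched_class *sched_class;  // fair, rt, deadline, idle, ...
struct sched_entity se;                 // CFS bookkeeping
struct sched_rt_entity rt;              // real-time bookkeeping
unsigned int policy;                    // SCHED_NORMAL, SCHED_FIFO, SCHED_RR, ...
int nr_cpus_allowed;                    // number of CPUs in the affinity mask
cpumask_t cpus_allowed;                 // CPU affinity (renamed cpus_mask in Linux 5.3)
struct sched_info sched_info;           // scheduler statistics (run count, run-queue delay)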
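The relationship between the nice value, static_prio, and prio can be reproduced with a few constants. The sketch below copies the default values from include/linux/sched/prio.h (MAX_RT_PRIO = 100, DEFAULT_PRIO = 120) and approximates how the kernel derives the normal priority; toy_normal_prio() is a hypothetical simplification that ignores SCHED_DEADLINE and priority inheritance. Lower numbers mean higher priority.

```c
#include <assert.h>

/* Default constants, as in include/linux/sched/prio.h. */
#define MAX_NICE      19
#define MIN_NICE      -20
#define NICE_WIDTH    (MAX_NICE - MIN_NICE + 1)        /* 40 */
#define MAX_RT_PRIO   100                              /* prio 0..99 are real-time */
#define DEFAULT_PRIO  (MAX_RT_PRIO + NICE_WIDTH / 2)   /* 120, i.e. nice 0 */

#define NICE_TO_PRIO(nice) ((nice) + DEFAULT_PRIO)
#define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO)

/* Hypothetical simplification of the normal-priority calculation:
 * real-time tasks map rt_priority 1..99 onto prio 98..0, while
 * normal tasks simply use static_prio (100..139). */
static int toy_normal_prio(int is_rt, unsigned int rt_priority, int static_prio)
{
    if (is_rt)
        return MAX_RT_PRIO - 1 - (int)rt_priority;
    return static_prio;
}
```

Every real-time priority therefore ends up numerically below every nice-based priority, which is why a runnable SCHED_FIFO task always preempts SCHED_NORMAL tasks.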

9. Signal Handling

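Signal-related members record the handler table, the blocked masks, pending signals, and the alternate signal stack:

struct signal_struct *signal;       // shared by the thread group (shared pending queue, rlimits, ...)
struct sighand_struct *sighand;     // table of signal handlers (sigaction)
sigset_t blocked;                   // signals this thread currently blocks
struct sigpending pending;          // signals pending for this thread only
sigset_t real_blocked;              // saved mask while in rt_sigtimedwait()
sigset_t saved_sigmask;             // restored after sigsuspend()/pselect()
unsigned long sas_ss_sp;            // alternate signal stack base (sigaltstack)
size_t sas_ss_size;                 // alternate signal stack size
unsigned int sas_ss_flags;          // alternate signal stack flags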

10. Memory Management

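Each task points to its memory descriptor (mm_struct). For a user task, mm and active_mm point to the same mm_struct. Kernel threads have no user address space, so their mm is NULL; active_mm then borrows the address space of the task that ran before them, which avoids an unnecessary page-table switch (the lazy-TLB optimization):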

struct mm_struct *mm;
struct mm_struct *active_mm;
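The sketch below models, in user space, how the scheduler's context switch treats mm and active_mm. struct toy_mm, struct toy_task, and toy_switch_mm() are hypothetical and heavily simplified: the real kernel pins the borrowed mm with mmgrab(), drops it later with mmdrop(), and calls switch_mm() to load page tables; here those steps are reduced to a reference counter and pointer assignments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; the real structures are far larger. */
struct toy_mm { int refcount; };

struct toy_task {
    struct toy_mm *mm;          /* NULL for kernel threads */
    struct toy_mm *active_mm;   /* address space in use while running */
};

/* Simplified mm handling when switching from prev to next. */
static void toy_switch_mm(struct toy_task *prev, struct toy_task *next)
{
    if (next->mm == NULL) {
        /* Kernel thread: borrow whatever is loaded (lazy TLB). */
        next->active_mm = prev->active_mm;
        next->active_mm->refcount++;        /* stands in for mmgrab() */
    } else {
        /* User task: switch_mm() would load its page tables here. */
        next->active_mm = next->mm;
    }
    if (prev->mm == NULL) {
        /* A kernel thread is leaving: release its borrowed mm. */
        prev->active_mm->refcount--;        /* stands in for mmdrop() */
        prev->active_mm = NULL;
    }
}
```

Switching from a user task to a kernel thread and back leaves the reference count where it started, and the kernel thread holds a reference only while it runs.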

11. Kernel Stack

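Every task has a dedicated kernel stack whose base is stored in task->stack. On x86 its size is THREAD_SIZE: 8 KB on 32-bit and 16 KB on 64-bit (other architectures use different sizes). The x86 helpers below locate the stack and the saved user registers at its top: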

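/* Lowest address of the task's kernel stack. */
static inline void *task_stack_page(const struct task_struct *task)
{
    return task->stack;
}

/* GCC statement expression: yields a pointer to the pt_regs block that
 * sits just below the top of the kernel stack (minus any padding). */
#define task_pt_regs(task)                                          \
({                                                                  \
    unsigned long __ptr = (unsigned long)task_stack_page(task);    \
    __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;             \
    ((struct pt_regs *)__ptr) - 1;                                  \
})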

The pt_regs structure stores the user-mode CPU registers saved on entry to the kernel (system call, interrupt, or exception); its layout is architecture-specific and differs even between 32-bit and 64-bit x86.
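The pointer arithmetic in task_pt_regs() can be checked outside the kernel. This sketch assumes a 16 KB THREAD_SIZE and a TOP_OF_KERNEL_STACK_PADDING of 0 (typical for x86-64), uses a heap buffer in place of a real kernel stack, and a hypothetical struct toy_pt_regs of 21 unsigned longs (the x86-64 pt_regs also has 21 slots) in place of the real per-architecture struct pt_regs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for illustration (x86-64-like). */
#define THREAD_SIZE (16 * 1024)
#define TOP_OF_KERNEL_STACK_PADDING 0

/* Hypothetical stand-in for the architecture's struct pt_regs. */
struct toy_pt_regs { unsigned long regs[21]; };

/* Same arithmetic as task_pt_regs(): start at the stack base, move to
 * the top of the stack area, then step back one pt_regs. */
static struct toy_pt_regs *toy_task_pt_regs(void *stack)
{
    uintptr_t ptr = (uintptr_t)stack;
    ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;
    return ((struct toy_pt_regs *)ptr) - 1;
}
```

The saved registers therefore occupy the highest addresses of the stack area, and the kernel's own call frames grow downward from just below them.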

12. Hung‑Task Mechanism

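The kernel runs a single watchdog thread, khungtaskd, that wakes up periodically (by default every kernel.hung_task_timeout_secs, 120 seconds). On each pass it scans all tasks, and for every task in TASK_UNINTERRUPTIBLE whose context-switch count (nvcsw + nivcsw) still equals the value recorded on the previous pass (last_switch_count), it prints a warning with the task's stack trace. Setting kernel.hung_task_panic makes the kernel panic instead of only warning.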
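The per-task check described above can be expressed as a small, self-contained function. This is a simplified sketch rather than the kernel's code: struct toy_task is a hypothetical stand-in, time is counted in plain seconds instead of jiffies, and only tasks whose state is exactly TASK_UNINTERRUPTIBLE are considered, so killable sleepers are skipped, as the kernel also does.

```c
#include <assert.h>
#include <stdbool.h>

#define TASK_UNINTERRUPTIBLE 2

/* Hypothetical trimmed task holding only what the check reads. */
struct toy_task {
    unsigned int state;
    unsigned long nvcsw, nivcsw;        /* voluntary / involuntary switches */
    unsigned long last_switch_count;    /* count seen on the previous scan */
    unsigned long last_switch_time;     /* when that count changed (seconds) */
};

/* Returns true if the task should be reported as hung. */
static bool toy_check_hung(struct toy_task *t, unsigned long now,
                           unsigned long timeout)
{
    unsigned long switch_count = t->nvcsw + t->nivcsw;

    if (t->state != TASK_UNINTERRUPTIBLE)
        return false;                   /* only plain D-state sleepers */
    if (switch_count != t->last_switch_count) {
        /* The task ran since the last scan: remember the new count. */
        t->last_switch_count = switch_count;
        t->last_switch_time = now;
        return false;
    }
    /* Not scheduled at all for at least 'timeout' seconds. */
    return now - t->last_switch_time >= timeout;
}
```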

13. Special State Handling

• Stopped and traced (TASK_STOPPED / TASK_TRACED): SIGSTOP puts the task into TASK_STOPPED, and SIGCONT resumes it. A task under ptrace enters TASK_TRACED instead; SIGCONT does not resume it, and it stays stopped until the debugger lets it continue.
• Zombie (EXIT_ZOMBIE / EXIT_DEAD): when a task terminates it releases most of its resources but keeps little more than its task_struct, with exit_state set to EXIT_ZOMBIE, so that its parent can collect the exit status. Once the parent reaps it with wait() or a related call, the task moves to EXIT_DEAD and is freed. If the parent exits first, the zombie is reparented to init (or to a child subreaper), which reaps it.

14. Kernel‑mode vs User‑mode

Kernel mode can execute any instruction and access all hardware, while user mode is limited to unprivileged instructions and to the process's own virtual address space. Each task's kernel stack is pointed to by task_struct->stack and is never shared with another task; user-mode stacks and heaps live in the process's virtual memory, described by mm_struct. In kernel mode, memory is allocated with functions such as kmalloc() (backed by the slab allocator), whereas user-mode programs use malloc(), which obtains memory from the kernel through brk() or mmap().

15. Conclusion

The task_struct is the cornerstone of Linux process management. Its fields give the kernel precise control over scheduling, memory, I/O, signals, credentials, and debugging, and understanding it is essential for kernel developers, system programmers, and anyone who needs to reason about how Linux behaves internally.
