How Linux Implements Processes and Threads: Inside task_struct, thread_info, and Memory Management
This article explains why Linux treats every thread as a process, describes the core kernel data structures task_struct, thread_info and the kernel stack, details the clone‑based creation path, and clarifies virtual address spaces, page tables, and the differences between user processes and kernel threads.
Linux Processes and Threads
Linux implements threads as processes: the kernel has no separate scheduling algorithm or data structure for threads, so every process and every thread is represented by its own task_struct and scheduled in the same way.
For a single‑threaded process, the process itself is the sole thread. In a multithreaded program the original process becomes the main thread, and it forms a thread group together with the threads it creates.
Each process owns its own address space and page tables, while threads share the address space and page tables of their parent process.
The fundamental difference stems from whether a new execution context copies the current address space (process) or shares it (thread) during creation.
Processes and threads are created through the same kernel path: the fork, vfork, and clone system calls all end up in the kernel function do_fork (named kernel_clone in recent kernels). do_fork calls copy_process, which performs the following steps:
Allocate and copy a new task_struct for the child.
Create a thread_info structure and a kernel stack for the child.
Assign a new PID to the child task_struct.
Based on the flags passed to clone, decide which resources (open files, filesystem info, signal handlers, address space, etc.) are copied or shared.
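The copy-or-share decision in the last step can be sketched in user space. The following is a toy model, not kernel code: struct resource, struct toy_task, copy_or_share and toy_copy_process are hypothetical names invented for illustration, and only two resources are modeled. The CLONE_VM and CLONE_FILES constants are the real flags from <sched.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>   /* CLONE_VM, CLONE_FILES */
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted kernel object (mm, files, ...). */
struct resource {
    int users;  /* number of tasks that reference this object */
    int data;   /* placeholder payload */
};

/* Hypothetical, heavily reduced task descriptor. */
struct toy_task {
    struct resource *mm;
    struct resource *files;
};

static struct resource *resource_new(int data)
{
    struct resource *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->users = 1;
    r->data = data;
    return r;
}

/* Share the parent's object when `flag` is set in clone_flags, else duplicate it. */
static struct resource *copy_or_share(struct resource *parent_res,
                                      unsigned long clone_flags,
                                      unsigned long flag)
{
    if (clone_flags & flag) {
        parent_res->users++;
        return parent_res;
    }
    return resource_new(parent_res->data);
}

/* Toy version of the per-resource decisions made while creating a task. */
static struct toy_task toy_copy_process(const struct toy_task *parent,
                                        unsigned long clone_flags)
{
    struct toy_task child;
    child.mm = copy_or_share(parent->mm, clone_flags, CLONE_VM);
    child.files = copy_or_share(parent->files, clone_flags, CLONE_FILES);
    return child;
}
```

Calling toy_copy_process(&parent, CLONE_VM | CLONE_FILES) yields a child that points at the parent's objects, which is the thread case; calling it with flags set to 0 yields private copies, which is the fork case.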
Three Core Data Structures
Every process or thread is described by three kernel structures:
struct thread_info
struct task_struct
Kernel stack
Although a thread shares the address space with the main thread, each thread still has its own kernel stack.
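The thread_info object lives in a two‑page region in kernel space (8 KB with 4 KB pages); the low address holds the thread_info structure itself, and the remaining space is used for the kernel stack, which on x86 grows downward from the top of the region. How this region is allocated depends on the kernel version and architecture: some 2.6 kernels obtain it with kmalloc, which is backed by the slab allocator, while others take the pages directly from the page allocator.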
Inside thread_info there is a pointer struct task_struct *task that points to the corresponding task_struct (the process descriptor). The task_struct is allocated with the slab allocator as well.
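Because thread_info sits at the lowest address of a size-aligned stack region, it can be found from any stack pointer inside that region by masking off the low bits. The sketch below imitates this in user space under the assumption of an 8 KB, 8 KB-aligned region; all demo_ and DEMO_ names are hypothetical and the structures are reduced to a single field each.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_THREAD_SIZE 8192u  /* assumed: two 4 KiB pages */

struct demo_task {
    int pid;
};

struct demo_thread_info {
    struct demo_task *task;  /* back pointer to the process descriptor */
};

/* Allocate an aligned stack region and place thread_info at its low end. */
static struct demo_thread_info *demo_alloc_stack(struct demo_task *task)
{
    struct demo_thread_info *ti =
        aligned_alloc(DEMO_THREAD_SIZE, DEMO_THREAD_SIZE);
    assert(ti != NULL);
    ti->task = task;
    return ti;
}

/* Round any address inside the region down to the region's base. */
static struct demo_thread_info *demo_current_thread_info(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~((uintptr_t)DEMO_THREAD_SIZE - 1);
    return (struct demo_thread_info *)base;
}

/* Follow thread_info->task to reach the descriptor of the running task. */
static struct demo_task *demo_current(const void *sp)
{
    return demo_current_thread_info(sp)->task;
}
```

Any pointer between the base and the top of the region resolves to the same demo_thread_info, which is why the lookup needs no global table, only the alignment guarantee.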
task_struct Structure
Each process or thread has a unique task_struct that contains the most important fields, such as pointers to memory management structures and identifiers.
Main Elements of task_struct
struct thread_info *thread_info; // basic thread information
struct mm_struct *mm; // user address space and page tables
struct mm_struct *active_mm; // used by kernel threads to access the kernel page tables
struct fs_struct *fs; // filesystem information
struct files_struct *files; // open file descriptors
struct signal_struct *signal; // signal handling information
Identifiers Inside task_struct
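pid: identifier that is unique to each task. In a multithreaded program this is the value user space calls the thread ID.
tid: thread identifier as seen from user space (returned by gettid()); the kernel stores it in the pid field rather than in a separate field.
tgid: thread‑group ID, equal to the PID of the thread‑group leader (the main thread). This is the value getpid() returns, so all threads of a process report the same one.
pgid: process‑group ID, the PID of the process‑group leader.
sid: session ID, the PID of the session leader.
group_leader: pointer to the task_struct of the thread‑group leader.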
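These identifiers can be inspected from user space. The following sketch reads them for the calling thread; struct task_ids and read_task_ids are hypothetical names, and the thread ID is fetched with syscall(SYS_gettid) because older glibc versions provide no gettid() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the identifiers discussed above. */
struct task_ids {
    pid_t tgid;  /* thread-group ID: what getpid() returns */
    pid_t tid;   /* per-task ID: the kernel's pid field */
    pid_t pgid;  /* process-group ID */
    pid_t sid;   /* session ID */
};

/* Collect the identifiers of the calling thread. */
static struct task_ids read_task_ids(void)
{
    struct task_ids ids;
    ids.tgid = getpid();
    ids.tid = (pid_t)syscall(SYS_gettid);
    ids.pgid = getpgid(0);
    ids.sid = getsid(0);
    assert(ids.tid > 0);
    return ids;
}

static void print_task_ids(const struct task_ids *ids)
{
    printf("tgid=%d tid=%d pgid=%d sid=%d\n",
           (int)ids->tgid, (int)ids->tid, (int)ids->pgid, (int)ids->sid);
}
```

Called from the main thread, tid and tgid are equal because the main thread is the thread-group leader; called from any other thread of the same process, tgid stays the same while tid differs.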
Virtual Memory Address Space
Memory Management
The kernel manages memory by dividing physical memory into pages, grouping them into zones, and mapping pages into virtual address spaces.
When kernel code requests memory, the allocator hands over physical pages right away instead of deferring the allocation until first access, because the kernel trusts its own requests. The memory is then accessed through the kernel's part of the address space (typically the 3‑4 GB region on a 32‑bit system), and any new mapping, such as one set up by vmalloc, is recorded in the kernel page tables.
Typical kernel allocation functions include vmalloc, kmalloc, alloc_pages, and __get_free_pages.
Why a Virtual Address Space?
Ordinary user processes receive a virtual address space that may be larger than the physical memory available. The kernel can delay physical allocation until a page is actually accessed, which avoids unnecessary memory consumption.
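Deferred allocation can be observed from user space. The sketch below reserves anonymous memory with mmap, writes to only a few pages, and then asks mincore how many pages are resident. The function names are hypothetical. The exact count depends on the system (transparent huge pages, for example, can make more pages resident than were touched), so no specific output is claimed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) that mincore() reports as resident. */
static long count_resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long resident = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;  /* bit 0 set means the page is in memory */
    free(vec);
    return resident;
}

/* Reserve npages of anonymous memory, write to `touch` of them,
 * and return how many pages are resident afterwards (-1 on error). */
static long resident_after_touch(size_t npages, size_t touch)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;

    assert(touch <= npages);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < touch; i++)
        p[i * page] = 1;  /* the first write to a page triggers the fault */

    long resident = count_resident_pages(p, len);
    munmap(p, len);
    return resident;
}
```

A call such as resident_after_touch(1024, 3) reserves 1024 pages of virtual address space but typically leaves most of them without physical backing.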
Each 32‑bit user process gets a 4 GB virtual address space, split into a 0‑3 GB user region and a 3‑4 GB kernel region that is shared among all processes.
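The 3 GB / 1 GB split can be expressed as simple address arithmetic. The sketch below assumes the classic non‑PAE 32‑bit x86 layout, where the boundary is 0xC0000000 and each top‑level page‑table entry covers 4 MB; the DEMO_ constants and function names are hypothetical stand‑ins, not the kernel's own macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_OFFSET 0xC0000000u  /* assumed user/kernel boundary at 3 GiB */
#define DEMO_PGDIR_SHIFT 22           /* assumed: each top-level entry maps 4 MiB */

/* Addresses at or above the boundary belong to the shared kernel region. */
static bool is_kernel_address(uint32_t vaddr)
{
    return vaddr >= DEMO_PAGE_OFFSET;
}

/* Index of the top-level page-table entry that covers vaddr (0..1023). */
static uint32_t pgd_index(uint32_t vaddr)
{
    return vaddr >> DEMO_PGDIR_SHIFT;
}

/* For the directly mapped part of the kernel region, the physical address
 * is the virtual address minus the boundary. */
static uint32_t demo_virt_to_phys(uint32_t vaddr)
{
    assert(is_kernel_address(vaddr));
    return vaddr - DEMO_PAGE_OFFSET;
}
```

Under these assumptions, top‑level entries 0 to 767 describe the per‑process user region and entries 768 to 1023 describe the kernel region, which is why the kernel part can be identical in every process's page tables.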
Threads share the same virtual address space, but each thread has its own user stack and registers; the user heap is shared.
Kernel Threads
Kernel threads run exclusively in kernel address space (3‑4 GB) and share the same kernel page tables. Their task_struct has mm set to NULL.
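User‑space threads, by contrast, need a user stack. When one is created, the clone system call receives a void *child_stack argument that points to the stack area allocated for the new thread; with glibc's pthread_create this area is typically set up by NPTL's allocate_stack routine, which uses mmap.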
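A minimal sketch of this mechanism, assuming the glibc clone() wrapper and a downward‑growing stack (as on x86 and ARM): the parent maps a stack with mmap, passes its top to clone, and shares its address space through CLONE_VM. The names shared_value, child_fn and run_child_on_mmap_stack are hypothetical, and the flag set is far smaller than the one a real threading library passes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Lives in the address space that parent and child share via CLONE_VM. */
static volatile int shared_value;

static int child_fn(void *arg)
{
    shared_value = *(int *)arg;  /* visible to the parent: same address space */
    return 0;
}

/* Run child_fn on an mmap'ed stack and return the value the parent
 * observes afterwards, or -1 on error. `value` must be positive. */
static int run_child_on_mmap_stack(int value)
{
    assert(value > 0);  /* keeps results distinguishable from the -1 error code */

    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    shared_value = 0;
    /* The stack grows downward here, so pass the highest address. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, &value);
    if (pid == -1) {
        munmap(stack, CHILD_STACK_SIZE);
        return -1;
    }

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = shared_value;

    munmap(stack, CHILD_STACK_SIZE);
    return result;
}
```

Dropping CLONE_VM from the flags would give the child a private copy of the address space, and the parent would then still read 0 from shared_value after the child exits.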
Init Kernel Thread
The kernel starts with a special process init_task (PID 0). rest_init creates the init kernel thread (PID 1) using kernel_thread. After initializing the kernel space, init execs /sbin/init, becoming a normal user‑space process while retaining PID 1.
kthreadd Kernel Thread
rest_init also creates the kthreadd kernel thread (PID 2) via kernel_thread; only after that does init_task settle into the idle loop. kthreadd spawns the other kernel threads on request: kernel code asks for a new thread with kthread_create or kthread_run and stops it with kthread_stop.
Kernel daemon threads can be listed with $ ps -efj; those whose command name is enclosed in square brackets (e.g., [kthreadd]) are kernel threads.
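The same distinction can be made programmatically. The sketch below reads the flags field (field 9) of /proc/<pid>/stat and tests the PF_KTHREAD bit, whose value 0x00200000 comes from the kernel's include/linux/sched.h. The function name and the DEMO_ macro are hypothetical, and the code assumes /proc is mounted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PF_KTHREAD 0x00200000u  /* value of PF_KTHREAD in the kernel headers */

/* Returns 1 if pid is a kernel thread, 0 if it is a user task,
 * and -1 if /proc/<pid>/stat cannot be read or parsed. */
static int is_kernel_thread(pid_t pid)
{
    char path[64];
    char buf[1024];

    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name may contain spaces or ')', so parse after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;

    char state;
    int ppid, pgrp, session, tty_nr, tpgid;
    unsigned int flags;
    if (sscanf(p + 1, " %c %d %d %d %d %d %u", &state, &ppid, &pgrp,
               &session, &tty_nr, &tpgid, &flags) != 7)
        return -1;

    assert(state != '\0');
    return (flags & DEMO_PF_KTHREAD) ? 1 : 0;
}
```

On a typical system outside a container, is_kernel_thread(2) reports kthreadd as a kernel thread, while the calling process itself is reported as a user task.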
Page Tables and active_mm
Each task_struct contains an mm pointer (the process's own address space and page tables) and an active_mm pointer. For ordinary processes, mm and active_mm point to the same mm_struct. Kernel threads have mm = NULL, but while they run, active_mm points to the mm_struct borrowed from the task that ran before them on that CPU. Since the kernel region is mapped identically in every address space, this lets them use the kernel page tables without owning an mm_struct.
The mm_struct includes two reference counters:
mm_users: counts the tasks sharing the user address space (i.e., the threads of the thread group). When it drops to zero, the user mappings are torn down and the single reference that all users jointly hold on mm_count is released.
mm_count: counts references to the mm_struct itself: one for all mm_users together, plus one for each kernel thread that currently borrows it through active_mm. The structure is freed only when this counter reaches zero.
These counters prevent premature release of page tables when a process exits while kernel threads still need them.
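The interplay of the two counters can be modeled in a few lines of user‑space C. This is a toy model under simplifying assumptions (one CPU, no locking, flags instead of real teardown); every toy_ name is hypothetical, and toy_mmput and toy_mmdrop only mirror the spirit of the kernel's mmput and mmdrop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mm {
    int mm_users;        /* tasks sharing the user address space */
    int mm_count;        /* references to the struct; all users hold one jointly */
    bool mappings_live;  /* user mappings not yet torn down */
    bool freed;          /* stands in for freeing the structure */
};

struct toy_task {
    struct toy_mm *mm;         /* NULL for a kernel thread */
    struct toy_mm *active_mm;  /* address space currently in use */
};

static void toy_mm_init(struct toy_mm *mm)
{
    mm->mm_users = 1;
    mm->mm_count = 1;
    mm->mappings_live = true;
    mm->freed = false;
}

/* Drop one reference to the structure itself. */
static void toy_mmdrop(struct toy_mm *mm)
{
    assert(mm->mm_count > 0);
    if (--mm->mm_count == 0)
        mm->freed = true;
}

/* Drop one user of the address space. */
static void toy_mmput(struct toy_mm *mm)
{
    assert(mm->mm_users > 0);
    if (--mm->mm_users == 0) {
        mm->mappings_live = false;  /* tear down the user mappings */
        toy_mmdrop(mm);             /* release the users' joint reference */
    }
}

/* Toy context switch: a kernel thread borrows the previous task's active_mm. */
static void toy_switch_to(struct toy_task *prev, struct toy_task *next)
{
    if (next->mm == NULL) {
        next->active_mm = prev->active_mm;
        next->active_mm->mm_count++;  /* pin the borrowed mm_struct */
    } else {
        next->active_mm = next->mm;
    }
    if (prev->mm == NULL) {
        struct toy_mm *borrowed = prev->active_mm;
        prev->active_mm = NULL;
        toy_mmdrop(borrowed);  /* the kernel thread gives the mm back */
    }
}
```

If a process exits while a kernel thread is still borrowing its mm_struct, toy_mmput tears down the mappings but leaves the structure allocated; it is marked freed only once the kernel thread is switched out and drops its reference.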
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
