How Linux Implements Thread Stacks: Inside NPTL and glibc

This article explains why Linux gives each thread its own stack, describes the historical evolution from LinuxThreads to NPTL, and walks through the glibc implementation that allocates, sizes, and releases thread‑stack memory using mmap and related kernel calls.

Refining Core Development Skills

Jul 22, 2025

Hello, I'm Fei! After a break to study GPU topics, I'm back to share the low‑level principles of Linux memory, focusing on thread stacks.

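The kernel sets up the process stack once, when the program starts, and the main thread runs on it. Every additional thread needs its own independent stack so that concurrently running functions do not overwrite each other's local variables and return addresses. Linux therefore implements thread stacks separately from the process stack.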

1. The NPTL Story

Linux originally had no concept of threads; the kernel’s clone system call only creates lightweight processes sharing an address space. Early user‑space projects like LinuxThreads attempted to emulate threads but lacked POSIX compliance and had many shortcomings.

Two major efforts followed: IBM’s NGPT and Red Hat’s NPTL. NGPT was abandoned, leaving NPTL as the standard. Modern pthread_create calls use NPTL, and the source files you’ll see are under the nptl directory.

2. Linux Thread Implementation

Thread creation involves two parts:

User‑space glibc library where pthread_create is implemented.

Kernel‑space clone system call that creates a lightweight process sharing the address space.

The thread stack is set up in the user-space part. The call chain is:

pthread_create → __pthread_create_2_1 → ALLOCATE_STACK → create_thread (clone)

2.1 struct pthread

The core data structure is struct pthread, which stores the thread ID, a pointer to the stack memory (stackblock), and its size (stackblock_size).

struct pthread {
    pid_t tid;
    ...
    void *stackblock;
    size_t stackblock_size;
};

2.2 Determining Stack Size

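ALLOCATE_STACK eventually calls allocate_stack, which takes the stack size from attr->stacksize if the caller set one and otherwise falls back to __default_stacksize. That default is computed in init.c from the RLIMIT_STACK soft limit (the value reported by ulimit -s). If the limit is unlimited or cannot be read, glibc uses ARCH_STACK_DEFAULT_SIZE, which is architecture-specific (2 MiB on x86-64). If the limit is smaller than PTHREAD_STACK_MIN (16 KiB on x86-64), glibc uses PTHREAD_STACK_MIN instead.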

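static int allocate_stack(const struct pthread_attr *attr, struct pthread **pdp, ...){
    /* GNU "?:" extension: use attr->stacksize when it is non-zero,
       otherwise fall back to the process-wide default. */
    size = attr->stacksize ?: __default_stacksize;
    ...
}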

2.3 Allocating the User Stack

If no cached stack is available, glibc reserves an anonymous memory region with mmap and places the struct pthread descriptor, suitably aligned, near the high end of that region. The rest of the region, below the descriptor, serves as the thread's stack, which grows downward.

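/* Reserve the whole block: private, anonymous, not backed by a file. */
mem = mmap(NULL, size, prot, MAP_PRIVATE|MAP_ANONYMOUS|ARCH_MAP_FLAGS, -1, 0);
/* Place the descriptor near the top of the block (layout used when TLS_DTV_AT_TP is defined). */
pd = (struct pthread *)((((uintptr_t)mem + size - coloring - __static_tls_size) & ~__static_tls_align_m1) - TLS_PRE_TCB_SIZE);
/* Remember the block so it can be cached or unmapped later. */
pd->stackblock = mem;
pd->stackblock_size = size;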

2.4 Releasing the Stack

When a thread exits, glibc removes its stack from the stack_used list and places it in a cache (stack_cache) so that a later pthread_create can reuse it without another mmap call. Stacks that do not stay in the cache are released with munmap.

3. Summary

Each thread requires an independent stack, which glibc allocates in user space with mmap, separately from the stack the kernel sets up for the process's main thread. The glibc NPTL library manages the user-space struct pthread (including the stack), while the kernel creates and manages the corresponding task_struct when clone is called. The following diagram illustrates the relationship between process and thread stacks.

With the stack memory in place, local variables in your code have a proper storage area. This article covers only the virtual address space: mmap reserves addresses, and physical pages are allocated on demand, when a page fault on first access makes the kernel take a page from the buddy allocator.