How to Optimize Linux Thread Stack Memory for High‑Concurrency Services

This article explains the fundamentals of Linux thread stack memory, identifies why default stack sizes can cause waste or overflow in high‑concurrency scenarios, and provides practical techniques—including stack‑size tuning, code refactoring, and memory‑mapping—to reduce memory usage and improve service stability.

Deepin Linux

Dec 25, 2025

Memory Basics: Stack vs. Heap

Thread stack memory is a temporary region allocated for each thread to store local variables, function parameters, return addresses, and context information. It works like a worker’s toolbox that is cleared after each task, while the heap is a shared pool for dynamic allocations.

Relationship with Process Memory

A Linux process has a single address space divided into code, data, heap, and stack segments. All threads share code, data, and heap, but each thread has its own independent stack, preventing interference between threads.
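The per-thread stack can be observed directly. The following is a minimal sketch, not taken from the original article: the names probe_thread, gate, and stacks_are_distinct are hypothetical. Two threads are kept alive at the same time, each records the address of one of its local variables, and the addresses are then compared.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical gate: keeps both probe threads alive until main releases it. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

/* Records the address of a local variable, which lives on this thread's stack. */
static void *probe_thread(void *arg) {
    int local = 0;
    *(uintptr_t *)arg = (uintptr_t)&local;
    pthread_mutex_lock(&gate);   /* block here until main unlocks the gate */
    pthread_mutex_unlock(&gate);
    return NULL;
}

/* Returns 1 if the two threads used different stack addresses, -1 on error. */
int stacks_are_distinct(void) {
    pthread_t t1, t2;
    uintptr_t addr1 = 0, addr2 = 0;

    pthread_mutex_lock(&gate);
    if (pthread_create(&t1, NULL, probe_thread, &addr1) != 0) {
        pthread_mutex_unlock(&gate);
        return -1;
    }
    /* t1 cannot exit yet, so t2 must receive a stack of its own. */
    int rc2 = pthread_create(&t2, NULL, probe_thread, &addr2);
    pthread_mutex_unlock(&gate);

    pthread_join(t1, NULL);
    if (rc2 != 0) return -1;
    pthread_join(t2, NULL);
    return addr1 != addr2;
}
```

The gate matters: if the first thread were joined before the second was created, the threading library could hand the same stack region to the second thread, and the comparison would say nothing about isolation.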

Linux Thread Stack Internals

Implementation Mechanism

Threads are created via the POSIX pthread_create API, which ultimately invokes the clone system call in the kernel. The clone flags determine which resources are shared between the parent and the new thread; for example, CLONE_VM shares the address space and CLONE_FILES shares the file descriptor table.

Stack Creation and Allocation

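When a thread is created with pthread_create, glibc (not the kernel) reserves a contiguous memory region for its stack in user space. If the programmer does not specify a size, the default is used: with glibc it is taken from the RLIMIT_STACK soft limit at program start (commonly 8 MiB), with an architecture-specific fallback such as 2 MiB on x86-64 when the limit is unlimited. Inside glibc, the ALLOCATE_STACK macro and the allocate_stack function compute the required size and obtain the memory with mmap rather than from the heap, reusing a cached stack from an exited thread when one is available.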

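#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define STACK_SIZE (1024 * 1024) // 1 MiB

void *thread_func(void *arg) { return NULL; }

int main(void) {
    pthread_t thread;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, STACK_SIZE);
    // pthread functions return an error number; they do not set errno.
    int ret = pthread_create(&thread, &attr, thread_func, NULL);
    pthread_attr_destroy(&attr); // the attribute object is no longer needed
    if (ret != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(ret));
        return 1; // do not join a thread that was never created
    }
    pthread_join(thread, NULL);
    return 0;
}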
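Before picking a custom size, it helps to know what the default is on the target system. A minimal sketch, assuming a POSIX threads implementation; the function name default_thread_stack_size is hypothetical:

```c
#include <pthread.h>
#include <stddef.h>

/* Returns the stack size reported by a default-initialized attribute
 * object, or 0 if the query fails. */
size_t default_thread_stack_size(void) {
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0) return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing this value at service startup makes it easy to see whether a deployment inherited an unexpected limit from its shell or service manager.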

Causes of Thread‑Stack Memory Consumption

Default Stack Size Impact

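Using the default 8 MiB stack for hundreds of threads reserves a large amount of address space (e.g., 100 threads × 8 MiB = 800 MiB). Physical pages are committed only when a thread actually touches them, so resident memory is usually lower than the reservation, but the reservation still counts against virtual memory limits and strict overcommit settings. In memory‑constrained servers this can cause thread creation failures and severe performance degradation.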

System Calls and Stack Needs

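The kernel side of a system call runs on a separate kernel stack, but the user‑space code around it still uses the thread stack: library wrappers and their callers place arguments and result structures there (for example, the struct timeval passed to gettimeofday). If the thread stack is too small for these frames, the thread crashes.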

Thread‑Local Storage (TLS)

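With glibc, each thread's static TLS block is placed in the same memory mapping as its stack, so TLS data reduces the space left for the stack itself. Large TLS structures multiplied by many threads become a hidden memory cost.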

Function Call Depth and Local Variables

Deep recursion or functions with large local arrays increase stack usage. Example of recursive overflow:

void infinite_recursion() { infinite_recursion(); }
int main() { infinite_recursion(); return 0; }

Example of a large local array:

void large_stack_usage() {
    char large_array[1024 * 1024]; // 1 MiB
    int a, b, c;
}

Risks of Improper Stack Configuration

Stack Overflow

When a thread exceeds its allocated stack, the program crashes (segmentation fault) and can bring down the entire process, especially in multithreaded servers.

Memory Waste

Over‑provisioned stacks leave large unused regions. For 1 000 threads with a 2 MiB default stack but only 100 KiB actual usage, roughly 1.9 GiB of reserved stack space is never used.
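The arithmetic behind these figures can be sketched as a pair of helpers. The names reserved_stack_bytes and unused_stack_bytes are hypothetical, and the result describes reserved address space, not necessarily resident memory.

```c
#define KIB 1024ULL
#define MIB (1024ULL * KIB)

/* Total stack address space reserved for `threads` threads. */
unsigned long long reserved_stack_bytes(unsigned long long threads,
                                        unsigned long long stack_size) {
    return threads * stack_size;
}

/* Reserved stack space that is never touched, given each thread's peak usage. */
unsigned long long unused_stack_bytes(unsigned long long threads,
                                      unsigned long long stack_size,
                                      unsigned long long peak_usage) {
    if (peak_usage >= stack_size) return 0; /* nothing left over */
    return threads * (stack_size - peak_usage);
}
```

For the example above, unused_stack_bytes(1000, 2 * MIB, 100 * KIB) yields 1 994 752 000 bytes, which is about 1.86 GiB.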

Optimization Techniques

Adjust Thread Stack Size

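Use ulimit -s to change the limit for the current shell and the processes it starts, or call setrlimit(RLIMIT_STACK, …) programmatically. glibc reads RLIMIT_STACK once at program startup to choose the default thread stack size, so a setrlimit call made inside a running process affects programs it later executes, not the threads it creates afterwards: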

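#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = 4 * 1024 * 1024; // 4 MiB soft limit; must not exceed rl.rlim_max
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}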

When using pthreads, set the stack size via pthread_attr_setstacksize as shown earlier.

Code Optimizations

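Reduce large local variables: Move big buffers to the heap, or declare them static if their lifetime exceeds the function call. A static buffer is shared by all threads, so it is safe only when access is synchronized or confined to one thread.

Avoid deep recursion: Replace recursive algorithms with iterative loops, or use tail recursion where the compiler can turn the call into a jump.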

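int fibonacci_iterative(int n) {
    if (n <= 0) return 0; // treat negative input as 0 so result is never uninitialized
    if (n == 1) return 1;
    int a = 0, b = 1, result = 0;
    for (int i = 2; i <= n; ++i) {
        result = a + b; // constant stack usage regardless of n
        a = b;
        b = result;
    }
    return result;
}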
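The first point, moving a large buffer off the stack, can be sketched in the same way. The function process_with_heap_buffer and its checksum task are hypothetical and stand in for real work that needs a 1 MiB scratch area.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE (1024 * 1024) /* 1 MiB, too large for a small thread stack */

/* Fills a heap-allocated scratch buffer and sums its bytes.
 * Returns 0 on success, -1 if the allocation fails. */
int process_with_heap_buffer(unsigned long *checksum_out) {
    unsigned char *buffer = malloc(BUFFER_SIZE);
    if (buffer == NULL) return -1;

    memset(buffer, 1, BUFFER_SIZE);
    unsigned long sum = 0;
    for (size_t i = 0; i < BUFFER_SIZE; ++i) sum += buffer[i];

    free(buffer); /* the stack frame only ever held a pointer and counters */
    *checksum_out = sum;
    return 0;
}
```

Unlike a stack array, the allocation can fail gracefully: the caller gets -1 instead of a segmentation fault.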

Memory‑Mapping (mmap)

Map shared files into the process address space so multiple threads can read the same data without copying it onto each stack.

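#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("shared_file.txt", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }
    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) { // mmap rejects a zero length
        close(fd);
        return 1;
    }
    char *map = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    close(fd); // the mapping stays valid after the descriptor is closed
    // The mapping is not NUL-terminated, so write it with an explicit length.
    fwrite(map, 1, sb.st_size, stdout);
    munmap(map, sb.st_size);
    return 0;
}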

Memory Pool (Optional)

Pre‑allocate fixed‑size blocks to avoid frequent malloc/free and reduce fragmentation. This technique is useful for high‑throughput logging or buffer reuse.
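A minimal sketch of such a pool, assuming fixed-size blocks carved from one allocation and tracked with an intrusive free list. All names (pool, pool_init, pool_alloc, pool_free, pool_destroy) are hypothetical, and the code needs C11 for _Alignof and max_align_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct pool_block { struct pool_block *next; } pool_block;

typedef struct {
    void *storage;         /* single backing allocation */
    pool_block *free_list; /* blocks currently available */
    size_t block_size;     /* rounded-up size of each block */
} pool;

/* Carves block_count blocks out of one malloc call. Returns 0 or -1. */
int pool_init(pool *p, size_t block_size, size_t block_count) {
    const size_t align = _Alignof(max_align_t);
    if (block_size == 0 || block_count == 0) return -1;
    if (block_size > SIZE_MAX - (align - 1)) return -1;
    block_size = (block_size + align - 1) / align * align; /* keep blocks aligned */
    if (block_count > SIZE_MAX / block_size) return -1;    /* overflow check */

    p->storage = malloc(block_size * block_count);
    if (p->storage == NULL) return -1;
    p->block_size = block_size;
    p->free_list = NULL;
    for (size_t i = 0; i < block_count; ++i) {
        pool_block *b = (pool_block *)((unsigned char *)p->storage + i * block_size);
        b->next = p->free_list; /* push each block onto the free list */
        p->free_list = b;
    }
    return 0;
}

/* Pops a block, or returns NULL when the pool is exhausted. */
void *pool_alloc(pool *p) {
    pool_block *b = p->free_list;
    if (b != NULL) p->free_list = b->next;
    return b;
}

/* Returns a block previously obtained from pool_alloc on the same pool. */
void pool_free(pool *p, void *ptr) {
    pool_block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}

void pool_destroy(pool *p) {
    free(p->storage);
    p->storage = NULL;
    p->free_list = NULL;
}
```

This sketch is not thread-safe. In a multithreaded service, either give each thread its own pool or protect pool_alloc and pool_free with a mutex.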

Real‑World Case Study: Optimizing a High‑Traffic Network Service

Problem Background

A multithreaded Linux server handling thousands of concurrent connections suffered from high memory usage, frequent swapping, and slow response times.

Analysis

Tools such as top, ps, gdb (stack backtraces), and valgrind revealed deep recursion and oversized local variables as primary contributors.

Optimization Steps

Reduced default thread stack from 2 MiB to 512 KiB using pthread_attr_setstacksize.

Rewrote recursive functions (e.g., Fibonacci) into iterative versions.

Scoped large temporaries to inner blocks or moved them to the heap.

Introduced mmap for shared file access across threads.

Results

Memory usage dropped from >90 % to ~45 % of total RAM, swap usage became negligible, average response time fell from >5 s to <1 s, and throughput increased five‑fold (≈500 req/s).
