
Why Is the Linux Kernel TCP/IP Stack Hard to Scale Compared to User‑Space Stacks?

The article examines the scalability limitations of the Linux kernel TCP/IP stack, comparing its packet‑processing and connection‑setup performance with user‑space stacks such as mTCP and F‑Stack, explains how hash‑table locking and spin‑lock contention cause poor CPS scaling, and argues why user‑space implementations often achieve higher throughput despite their own trade‑offs.

Liangxu Linux

Nov 2, 2023

Background

Network engineers have compared the performance of the Linux kernel TCP/IP stack with that of user‑space stacks (mTCP running on netmap, and Tencent’s F‑Stack) to understand why the kernel stack scales poorly in both packets per second (PPS) and connections per second (CPS).

Test Methodology

The kernel stack and the user‑space stacks were exercised while increasing the number of CPU cores. PPS was measured for packet reception and CPS for new TCP connections.

The kernel PPS curve flattens quickly, while user‑space stacks improve with core count but still do not scale linearly.

Root Cause of Poor Kernel Scaling

Linux maintains two global hash tables: one for listening sockets and one for established connections. Each CPU may need to access these tables, requiring locks. Modern kernels lock at the hash‑slot level, but when many TCP SYN packets hash to the same slot, contention remains high.

When a server listens on a single port, the listener‑slot lock effectively becomes a global lock, providing no benefit under high connection‑creation rates.
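The single-port case can be illustrated with a small sketch. It assumes a simplified listener hash that is keyed only by the local port and a table of 32 slots; both `listen_slot()` and `LISTEN_SLOTS` are hypothetical stand-ins for illustration, not the kernel's actual hash function or table size.

```c
#include <assert.h>
#include <stdint.h>

#define LISTEN_SLOTS 32u   /* assumed table size, a power of two (hypothetical) */

/* Simplified stand-in for a listener hash keyed by local port only. */
unsigned listen_slot(uint16_t local_port)
{
    return local_port & (LISTEN_SLOTS - 1u);
}

/* Count how many distinct listener slots (and therefore slot locks)
 * are touched by SYNs aimed at the given destination ports. */
unsigned distinct_listen_slots(const uint16_t *dst_ports, unsigned n)
{
    uint32_t seen = 0;      /* one bit per slot; works because LISTEN_SLOTS <= 32 */
    unsigned count = 0;

    assert(dst_ports != 0 || n == 0);
    for (unsigned i = 0; i < n; i++) {
        uint32_t bit = (uint32_t)1 << listen_slot(dst_ports[i]);
        if (!(seen & bit)) {
            seen |= bit;
            count++;
        }
    }
    return count;
}
```

Under these assumptions, any number of SYNs sent to one port touch exactly one slot, so every core handling them competes for the same lock, while SYNs spread over several ports are spread over several slots.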

During a TCP connection test the kernel repeatedly inserts a new entry into the established‑connection hash table after the three‑way handshake and removes it when the connection is torn down. Removal goes through the inet_unhash() function, which acquires a spinlock before deleting the socket:

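void inet_unhash(struct sock *sk) {
    struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
    spinlock_t *lock;
    int done;

    /* Nothing to do if the socket is not in a hash table. */
    if (sk_unhashed(sk))
        return;

    /* Pick the lock that guards this socket's slot:
     * the listener table for listening sockets, the established table otherwise. */
    if (sk->sk_state == TCP_LISTEN)
        lock = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)].lock;
    else
        lock = inet_ehash_lockp(hashinfo, sk->sk_hash);

    spin_lock_bh(lock);          /* root cause: cores hitting this slot serialize here */
    done = __sk_nulls_del_node_init_rcu(sk);
    if (done)
        sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);  /* one fewer socket in use */
    spin_unlock_bh(lock);
}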

This spinlock serializes access to the hash slot. As the number of CPU cores grows, it becomes more likely that several cores contend for the same slot at the same time, so each core spends more of its time spinning on the lock instead of processing connections.
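How quickly that likelihood grows can be estimated with a birthday-style calculation. The sketch below assumes that each core picks one of the lock slots uniformly at random and independently of the others, which is an idealization; the function name and its parameters are illustrative and do not come from the kernel.

```c
#include <assert.h>

/* Probability that at least two of `cores` CPUs pick the same lock slot
 * when each one picks among `slots` slots uniformly at random. */
double slot_collision_probability(unsigned cores, unsigned slots)
{
    double all_distinct = 1.0;

    assert(slots > 0);
    if (cores > slots)
        return 1.0;     /* pigeonhole: a collision is certain */

    /* The i-th additional core avoids the i slots already taken. */
    for (unsigned i = 1; i < cores; i++)
        all_distinct *= 1.0 - (double)i / (double)slots;

    return 1.0 - all_distinct;
}
```

With a fixed number of slots the result rises monotonically with the core count, and with a single slot (the one-listening-port case) it is 1.0 as soon as two cores are involved.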

Impact on CPS Scaling

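Each new connection must acquire a slot lock when its socket is inserted and again when it is removed, so the CPS curve becomes concave: every added core contributes less than the one before it, and past a certain point more cores no longer increase the number of successful connections per second. The extra CPU cycles are spent waiting on locks, which cancels out the potential throughput gain.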
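The shape of that curve can be reproduced with a toy model in the spirit of Amdahl's law. It assumes that each connection costs a fixed amount of time under a shared lock (`t_lock`, fully serialized) plus a fixed amount of work that parallelizes perfectly (`t_par`); both parameters are hypothetical, and the model ignores the extra cycles burned while spinning, so real behavior would be worse.

```c
#include <assert.h>

/* Modeled connections per unit time on `cores` CPUs.
 * t_lock: time per connection spent holding the shared lock (serialized).
 * t_par:  time per connection that can be spread across cores. */
double modeled_cps(unsigned cores, double t_lock, double t_par)
{
    assert(cores > 0);
    assert(t_lock > 0.0 && t_par >= 0.0);
    return 1.0 / (t_lock + t_par / (double)cores);
}
```

In this model the result can never exceed `1.0 / t_lock`, no matter how many cores are added: with `t_lock = 0.25` and `t_par = 0.75`, one core yields 1.0, three cores yield 2.0, and the ceiling is 4.0.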

Why User‑Space Stacks Scale Better

User‑space stacks avoid the kernel’s global hash tables and their locks. They typically use poll‑based or mmap‑based packet I/O (e.g., netmap) that can be partitioned per core, so each core works on its own connections and the contention point disappears. As a result they achieve higher PPS and better CPS scaling, at the cost of duplicating protocol logic outside the kernel and handling security separately from the kernel’s own mechanisms.
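The per-core partitioning idea can be sketched as follows. This is a minimal illustration and not the design of mTCP or F‑Stack: the `struct flow` layout, the hash mix, the table sizes, and all function names are assumptions made for the example. The point is that a flow is always steered to the same core, and each core owns a private table that needs no lock.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES      64u      /* hypothetical limits for the sketch */
#define CONNS_PER_CORE 1024u

struct flow {                   /* TCP 4-tuple */
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct core_table {             /* owned by exactly one core, so no lock field */
    struct flow conns[CONNS_PER_CORE];
    unsigned count;
};

static struct core_table tables[MAX_CORES];

/* Simple multiplicative mix over the 4-tuple (illustrative only). */
uint32_t flow_hash(const struct flow *f)
{
    uint32_t words[3] = { f->saddr, f->daddr,
                          ((uint32_t)f->sport << 16) | f->dport };
    uint32_t h = 2166136261u;

    for (unsigned i = 0; i < 3; i++) {
        h ^= words[i];
        h *= 16777619u;
    }
    return h;
}

/* Every packet of a flow maps to the same core. */
unsigned core_for_flow(const struct flow *f, unsigned ncores)
{
    assert(ncores > 0 && ncores <= MAX_CORES);
    return flow_hash(f) % ncores;
}

/* Must be called only on the owning core, so plain stores suffice.
 * Returns 0 on success, -1 if that core's table is full. */
int core_table_insert(unsigned core, const struct flow *f)
{
    struct core_table *t;

    assert(core < MAX_CORES);
    t = &tables[core];
    if (t->count == CONNS_PER_CORE)
        return -1;
    t->conns[t->count++] = *f;
    return 0;
}

/* Number of connections currently tracked by one core. */
unsigned core_table_count(unsigned core)
{
    assert(core < MAX_CORES);
    return tables[core].count;
}
```

The trade-off in this sketch is that correctness depends on the steering rule: if packets of one flow could reach two different cores, the lock-free tables would no longer be safe.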

Conclusion

The kernel TCP/IP stack’s poor scalability is mainly caused by hash‑table lock contention that grows with core count. User‑space stacks sidestep this bottleneck and can provide higher throughput for workloads that are limited by connection‑creation rates.
