Unlocking Linux TCP: Inside the Kernel’s Socket Lookup and Scaling to Millions of Connections

This article explores Linux’s TCP socket implementation, showing how to adjust system limits, dissecting key kernel structures like sock_common, and walking through the tcp_v4_rcv processing path and inet lookup functions, ultimately revealing methods to scale client connections beyond the traditional 65 535 port limit.

ITPUB

Jan 6, 2021

Adjusting System Limits

To widen the range of local (ephemeral) ports available for outgoing connections, change the kernel parameter net.ipv4.ip_local_port_range, for example:

echo "5000 65000" > /proc/sys/net/ipv4/ip_local_port_range
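
To see how many ports a given range actually provides, the setting can be parsed and counted. The following is a minimal sketch, not part of the original article; the function names count_local_ports and current_local_port_count are hypothetical, and the only assumption about the procfs file is that it holds two whitespace-separated numbers.

```c
#include <stdio.h>

/* Parse "LOW HIGH" as stored in ip_local_port_range.
 * Returns the number of ports in the inclusive range, or 0 on bad input. */
unsigned int count_local_ports(const char *text)
{
    unsigned int low, high;

    if (sscanf(text, "%u %u", &low, &high) != 2)
        return 0;
    if (low == 0 || high > 65535 || low > high)
        return 0;
    return high - low + 1;
}

/* Read the live setting from procfs; returns 0 if it is unavailable. */
unsigned int current_local_port_count(void)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

    if (!f)
        return 0;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return 0;
    }
    fclose(f);
    return count_local_ports(buf);
}
```

With the range set above, count_local_ports("5000 65000") gives 60001 usable ports per local IP and remote endpoint.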

File descriptor limits can also be raised to support many concurrent sockets:

# echo 200000 > /proc/sys/fs/file-max
# vi /etc/sysctl.conf
fs.nr_open=210000
# sysctl -p
# vi /etc/security/limits.conf
*  soft  nofile  200000
*  hard  nofile  200000

Note: the hard limit in limits.conf cannot exceed nr_open, so raise nr_open first, preferably in sysctl.conf so that the value persists across reboots. If the hard limit is larger than nr_open, the limit cannot be applied and logins can fail.
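
The ordering constraint in the note can be checked from a program before relying on the new limits. This is a rough sketch under the assumption that fs.nr_open is readable at /proc/sys/fs/nr_open; the helper names are hypothetical and do not come from the article.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Read fs.nr_open from procfs; returns 0 if it cannot be read. */
unsigned long read_nr_open(void)
{
    unsigned long value = 0;
    FILE *f = fopen("/proc/sys/fs/nr_open", "r");

    if (!f)
        return 0;
    if (fscanf(f, "%lu", &value) != 1)
        value = 0;
    fclose(f);
    return value;
}

/* A nofile hard limit is acceptable only if it does not exceed fs.nr_open. */
int nofile_limit_fits(unsigned long hard_nofile, unsigned long nr_open)
{
    return hard_nofile <= nr_open;
}

/* Returns 1 if this process's hard limit fits under nr_open,
 * 0 if it does not, and -1 if either value cannot be determined. */
int check_current_nofile_limit(void)
{
    struct rlimit rl;
    unsigned long nr_open = read_nr_open();

    if (nr_open == 0 || getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_max == RLIM_INFINITY)
        return 0;   /* "unlimited" is always above nr_open */
    return nofile_limit_fits((unsigned long)rl.rlim_max, nr_open);
}
```

With the values used above, nofile_limit_fits(200000, 210000) returns 1, which is why nr_open is set slightly higher than the nofile limits.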

Key Kernel Structures

The addressing fields of a socket live in struct sock_common, defined in include/net/sock.h and embedded at the start of struct sock. It contains two unions that store the IP address pair and the port pair of a TCP connection:

struct sock_common {
    union {
        __addrpair skc_addrpair; // IP pair
        struct {
            __be32 skc_daddr;
            __be32 skc_rcv_saddr;
        };
    };
    union {
        __portpair skc_portpair; // port pair
        struct {
            __be16 skc_dport;
            __u16  skc_num;
        };
    };
    ...
};

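Thus skc_addrpair records the IP pair and skc_portpair records the port pair of a TCP connection. Because each pair is also addressable as a single wider integer, both addresses (or both ports) can be compared in one operation.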
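
The effect of the two unions can be reproduced in user space. The sketch below is an illustration only: struct conn_key, make_key, and same_connection are hypothetical names, fixed-width integer types stand in for the kernel's __be32 and __be16, and byte order is ignored. It relies on C11 anonymous structs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model of the two unions in sock_common. */
struct conn_key {
    union {
        uint64_t addrpair;           /* both addresses as one integer */
        struct {
            uint32_t daddr;          /* remote IPv4 address */
            uint32_t rcv_saddr;      /* local IPv4 address */
        };
    };
    union {
        uint32_t portpair;           /* both ports as one integer */
        struct {
            uint16_t dport;          /* remote port */
            uint16_t num;            /* local port */
        };
    };
};

struct conn_key make_key(uint32_t daddr, uint32_t rcv_saddr,
                         uint16_t dport, uint16_t num)
{
    struct conn_key k;

    memset(&k, 0, sizeof(k));
    k.daddr = daddr;
    k.rcv_saddr = rcv_saddr;
    k.dport = dport;
    k.num = num;
    return k;
}

/* Two comparisons cover all four fields of the connection. */
int same_connection(const struct conn_key *a, const struct conn_key *b)
{
    return a->addrpair == b->addrpair && a->portpair == b->portpair;
}
```

Writing the four fields individually and then comparing addrpair and portpair checks the whole four-tuple with two integer comparisons instead of four.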

TCP Packet Reception Path

When a network packet reaches the NIC, it passes through DMA, the hard interrupt, and soft‑interrupt processing, travels up through the IP layer, and finally lands in the socket’s receive queue. The entry point for TCP processing is tcp_v4_rcv, defined in net/ipv4/tcp_ipv4.c.

int tcp_v4_rcv(struct sk_buff *skb) {
    ...
    th  = tcp_hdr(skb);   // get TCP header
    iph = ip_hdr(skb);    // get IP header
    sk  = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
    ...
}
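
The two values handed to the lookup, th->source and th->dest, are the first two 16-bit fields of the TCP header. As a user-space illustration (not kernel code), the sketch below pulls them out of a raw header buffer; struct tcp_ports and parse_tcp_ports are hypothetical names. Unlike the kernel, which keeps these fields in network byte order at this point, the sketch converts them to host order for readability.

```c
#include <stddef.h>
#include <stdint.h>

struct tcp_ports {
    uint16_t source;   /* sender's port, host byte order */
    uint16_t dest;     /* receiver's port, host byte order */
};

/* Read the source and destination ports from the start of a TCP header.
 * Returns 0 on success, -1 if the buffer is too short. */
int parse_tcp_ports(const uint8_t *hdr, size_t len, struct tcp_ports *out)
{
    if (len < 4)
        return -1;
    /* Ports are transmitted big-endian (network byte order). */
    out->source = (uint16_t)((hdr[0] << 8) | hdr[1]);
    out->dest   = (uint16_t)((hdr[2] << 8) | hdr[3]);
    return 0;
}
```

For a header starting with the bytes 13 88 01 bb, this yields source port 5000 and destination port 443.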

Inet Lookup Functions

__inet_lookup_skb takes the source and destination addresses from the packet’s IP header and passes them, together with the ports, to __inet_lookup. That routine first tries to find an established socket via __inet_lookup_established. If none is found, it falls back to __inet_lookup_listener:

static inline struct sock *__inet_lookup(struct net *net,
                                        struct inet_hashinfo *hashinfo,
                                        const __be32 saddr, const __be16 sport,
                                        const __be32 daddr, const __be16 dport,
                                        int dif) {
    u16 hnum = ntohs(dport);   // local port in host byte order
    struct sock *sk = __inet_lookup_established(net, hashinfo,
                                                saddr, sport, daddr, hnum, dif);
    // no established connection matched: look for a listening socket
    return sk ? sk : __inet_lookup_listener(net, hashinfo, saddr, sport,
                                           daddr, hnum, dif);
}

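struct sock *__inet_lookup_established(struct net *net,
                                      struct inet_hashinfo *hashinfo,
                                      const __be32 saddr, const __be16 sport,
                                      const __be32 daddr, const u16 hnum,
                                      const int dif) {
    INET_ADDR_COOKIE(acookie, saddr, daddr)   // packed address pair used by INET_MATCH
    const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
    struct sock *sk;
    const struct hlist_nulls_node *node;
    // hash the four-tuple and pick the bucket
    unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
    unsigned int slot = hash & hashinfo->ehash_mask;
    struct inet_ehash_bucket *head = &hashinfo->ehash[slot];

begin:
    sk_nulls_for_each_rcu(sk, node, &head->chain) {
        if (sk->sk_hash != hash)
            continue;   // cheap reject before the full comparison
        if (likely(INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) {
            // take a reference; this fails if the socket is being freed
            if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
                goto begintw;
            // re-check: the socket may have changed before the reference was taken
            if (unlikely(!INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) {
                sock_put(sk);
                goto begin;
            }
            goto out;
        }
    }
    ...
begintw:
    ...   // scan of the TIME_WAIT chain, omitted in this excerpt
    sk = NULL;
out:
    return sk;
}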

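The macro INET_MATCH compares the packet’s addresses and ports with the values stored in the socket, and also checks the device binding and the network namespace. The packet’s source address is matched against the socket’s remote address (inet_daddr) and the packet’s destination address against the socket’s local address (inet_rcv_saddr):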

#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif) \
    ((inet_sk(__sk)->inet_portpair == (__ports)) && \
     (inet_sk(__sk)->inet_daddr == (__saddr)) && \
     (inet_sk(__sk)->inet_rcv_saddr == (__daddr)) && \
     (!(__sk)->sk_bound_dev_if || ((__sk)->sk_bound_dev_if == (__dif))) && \
     net_eq(sock_net(__sk), (__net)))
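
The overall shape of the established-socket lookup (hash the four-tuple, select a bucket, walk its chain, compare every field) can be modelled in user space. The sketch below is a simplified illustration under several assumptions: all names are hypothetical, toy_hash is a stand-in and not the kernel's inet_ehashfn, and RCU, reference counting, device binding, and namespaces are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16u   /* power of two, so "hash & (size - 1)" picks a slot */

struct conn {
    uint32_t remote_addr, local_addr;
    uint16_t remote_port, local_port;
    uint32_t hash;       /* cached, like sk_hash */
    struct conn *next;   /* chain within one bucket */
};

struct conn_table {
    struct conn *bucket[TABLE_SIZE];
};

/* Toy mixing function; an assumption for this sketch only. */
static uint32_t toy_hash(uint32_t laddr, uint16_t lport,
                         uint32_t raddr, uint16_t rport)
{
    uint32_t h = laddr * 2654435761u;

    h ^= raddr * 2246822519u;
    h ^= ((uint32_t)lport << 16) | rport;
    return h ^ (h >> 15);
}

void table_insert(struct conn_table *t, struct conn *c)
{
    uint32_t slot;

    c->hash = toy_hash(c->local_addr, c->local_port,
                       c->remote_addr, c->remote_port);
    slot = c->hash & (TABLE_SIZE - 1);
    c->next = t->bucket[slot];
    t->bucket[slot] = c;
}

/* saddr/sport describe the packet's sender (the connection's remote side);
 * daddr/dport describe the receiver (the connection's local side). */
struct conn *table_lookup(const struct conn_table *t,
                          uint32_t saddr, uint16_t sport,
                          uint32_t daddr, uint16_t dport)
{
    uint32_t hash = toy_hash(daddr, dport, saddr, sport);
    struct conn *c;

    for (c = t->bucket[hash & (TABLE_SIZE - 1)]; c; c = c->next) {
        if (c->hash != hash)
            continue;    /* cheap reject, as with sk_hash */
        if (c->remote_addr == saddr && c->local_addr == daddr &&
            c->remote_port == sport && c->local_port == dport)
            return c;
    }
    return NULL;
}
```

Two connections that differ only in the remote port hash to independent entries, which is the property that lets a single local port serve many connections at once.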

Practical Observations

On a Red Hat Enterprise Linux 6.2 system with about 4 GB of RAM, the author observed over one million established TCP connections, with most of the memory held in kernel slab caches:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# ss -ant | grep ESTAB | wc -l
1000013
# cat /proc/meminfo
MemTotal:        3925408 kB
MemFree:           97748 kB
Buffers:           35412 kB
Cached:           119600 kB
Slab:            3241528 kB
...
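
Dividing the Slab figure by the connection count gives a rough per-connection cost. The helper below is a sketch with a hypothetical name; the result is only an upper-bound estimate, because the slab caches also hold kernel objects unrelated to these sockets.

```c
/* Rough slab memory per connection in kB; returns 0 for zero connections. */
double slab_kb_per_connection(unsigned long slab_kb, unsigned long connections)
{
    if (connections == 0)
        return 0.0;
    return (double)slab_kb / (double)connections;
}
```

For the numbers above, slab_kb_per_connection(3241528, 1000013) comes to a little over 3.2 kB per established connection.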

Scaling Strategies

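A TCP connection is identified by its four‑tuple (local IP, local port, remote IP, remote port), so the roughly 65 000 local ports limit connections only per combination of local IP and remote endpoint. A client can therefore exceed that apparent limit in two ways: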

Assign multiple IP addresses to the client machine, giving each IP its own port space.

Connect to many different server endpoints, each using its own four‑tuple.

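Do not mix the two approaches: when a socket is bound to a local IP with bind() before connect(), the kernel picks the port at bind time, before the destination is known, and will no longer reuse ports on that IP, which changes the port‑selection strategy.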
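
The capacity of either approach follows from multiplying the dimensions of the four‑tuple. The sketch below is a back-of-the-envelope calculation with hypothetical function names; it assumes every local IP can use the full port range toward every remote endpoint and ignores memory and file descriptor limits.

```c
#include <stdint.h>

/* Upper bound on distinct four-tuples a client can form. */
uint64_t max_client_connections(uint64_t local_ips, uint64_t ports_per_ip,
                                uint64_t server_endpoints)
{
    return local_ips * ports_per_ip * server_endpoints;
}

/* Smallest multiplier (local IPs or server endpoints) that reaches
 * `target` connections, given `per_unit` connections for each one. */
uint64_t units_needed(uint64_t target, uint64_t per_unit)
{
    if (per_unit == 0)
        return 0;
    return (target + per_unit - 1) / per_unit;   /* ceiling division */
}
```

With the 5000 to 65000 range (60001 ports), units_needed(1000000, 60001) returns 17: one client IP needs 17 server endpoints, or one server endpoint needs 17 client IPs, to pass one million connections.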

Experiments confirmed that a client can handle over a million concurrent TCP connections, demonstrating that the 65 535 port ceiling is not an absolute barrier.
