Unlocking Linux TCP: Inside the Kernel’s Socket Lookup and Scaling to Millions of Connections
This article explores Linux’s TCP socket implementation: how to adjust system limits, how key kernel structures such as sock_common store a connection’s addresses and ports, and how tcp_v4_rcv and the inet lookup functions match an incoming packet to a socket. It then shows how a client can scale beyond the apparent 65,535-port limit.
Adjusting System Limits
To allow a larger range of local ports, the kernel parameter net.ipv4.ip_local_port_range can be changed, e.g.:
# echo "5000 65000" > /proc/sys/net/ipv4/ip_local_port_range
File descriptor limits can also be raised to support many concurrent sockets:
# echo 200000 > /proc/sys/fs/file-max
# vi /etc/sysctl.conf
fs.nr_open=210000
# sysctl -p
# vi /etc/security/limits.conf
* soft nofile 200000
* hard nofile 200000
Note: the hard limit in limits.conf cannot exceed nr_open, so nr_open must be increased first, preferably in sysctl.conf, to avoid startup failures.
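The per-process side of these limits can also be inspected from inside a program. The following is a minimal sketch, not taken from the original article, that uses the POSIX getrlimit and setrlimit calls to raise the soft RLIMIT_NOFILE limit toward a requested value, capped at the hard limit. The helper name raise_nofile_soft is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft open-file limit to `wanted`,
 * capped at the current hard limit. It never lowers the soft limit.
 * Returns the resulting soft limit, or 0 on error. */
rlim_t raise_nofile_soft(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;

    /* An unprivileged process cannot go above the hard limit. */
    if (wanted > rl.rlim_max)
        wanted = rl.rlim_max;

    if (wanted > rl.rlim_cur) {
        rl.rlim_cur = wanted;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return 0;
    }
    return rl.rlim_cur;
}
```

A client that intends to open 200000 sockets could call raise_nofile_soft(200000) at startup and compare the return value with what it asked for. A smaller result means the hard limit (and possibly nr_open) still needs to be raised by the administrator.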
Key Kernel Structures
The central data structure for a socket is struct sock_common defined in include/net/sock.h. It contains two unions that store the IP address pair and the port pair of a TCP connection:
struct sock_common {
    union {
        __addrpair skc_addrpair;  // IP pair
        struct {
            __be32 skc_daddr;
            __be32 skc_rcv_saddr;
        };
    };
    union {
        __portpair skc_portpair;  // port pair
        struct {
            __be16 skc_dport;
            __u16 skc_num;
        };
    };
    ...
};
Thus skc_addrpair records the IP pair and skc_portpair records the port pair of a TCP connection.
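To see what the unions provide, the layout can be imitated in user space. This is a sketch under stated assumptions: the struct and function names are illustrative stand-ins, not kernel definitions, and the fixed-width types simply mirror the field sizes shown above. Because each pair overlays a single wider integer, two connections can be compared with two integer comparisons instead of four.

```c
#include <stdint.h>

/* Illustrative user-space stand-in for the two unions in sock_common. */
struct conn_key {
    union {
        uint64_t addrpair;       /* both IPv4 addresses as one value */
        struct {
            uint32_t daddr;      /* remote address */
            uint32_t rcv_saddr;  /* local address */
        };
    };
    union {
        uint32_t portpair;       /* both ports as one value */
        struct {
            uint16_t dport;      /* remote port */
            uint16_t num;        /* local port */
        };
    };
};

/* Fill a key field by field; every byte of both unions gets written. */
struct conn_key make_key(uint32_t daddr, uint32_t rcv_saddr,
                         uint16_t dport, uint16_t num)
{
    struct conn_key k;

    k.daddr = daddr;
    k.rcv_saddr = rcv_saddr;
    k.dport = dport;
    k.num = num;
    return k;
}

/* Compare through the combined views: two comparisons cover four fields. */
int key_equal(const struct conn_key *a, const struct conn_key *b)
{
    return a->addrpair == b->addrpair && a->portpair == b->portpair;
}
```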
TCP Packet Reception Path
When a network packet reaches the NIC, it passes through DMA, hard‑interrupt, soft‑interrupt processing and finally lands in the socket’s receive queue. The entry point for TCP processing is tcp_v4_rcv (file net/ipv4/tcp_ipv4.c).
int tcp_v4_rcv(struct sk_buff *skb) {
    ...
    th = tcp_hdr(skb);  // get TCP header
    iph = ip_hdr(skb);  // get IP header
    sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
    ...
}
Inet Lookup Functions
The lookup routine first tries to find an established socket via __inet_lookup_established. If none is found, it falls back to __inet_lookup_listener:
/* Called from __inet_lookup_skb with the addresses taken from the IP header. */
static inline struct sock *__inet_lookup(struct net *net,
        struct inet_hashinfo *hashinfo,
        const __be32 saddr, const __be16 sport,
        const __be32 daddr, const __be16 dport,
        int dif) {
    u16 hnum = ntohs(dport);  /* local port in host byte order */
    struct sock *sk = __inet_lookup_established(net, hashinfo,
            saddr, sport, daddr, hnum, dif);
    /* No established socket matched: fall back to the listening sockets. */
    return sk ? sk : __inet_lookup_listener(net, hashinfo, saddr, sport,
            daddr, hnum, dif);
}
struct sock *__inet_lookup_established(struct net *net,
        struct inet_hashinfo *hashinfo,
        const __be32 saddr, const __be16 sport,
        const __be32 daddr, const u16 hnum,
        const int dif) {
    /* Abridged: the declarations of sk, node and acookie, the RCU locking,
     * and the begin, begintw and out labels are not shown. */
    const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
    /* Hash the four-tuple and select the bucket. */
    unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
    unsigned int slot = hash & hashinfo->ehash_mask;
    struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
    /* Walk the bucket's chain looking for an exact match. */
    sk_nulls_for_each_rcu(sk, node, &head->chain) {
        if (sk->sk_hash != hash)
            continue;
        if (likely(INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) {
            /* Take a reference; a zero refcount means the socket is going away. */
            if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
                goto begintw;
            /* Re-check after taking the reference. */
            if (unlikely(!INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) {
                sock_put(sk);
                goto begin;
            }
            goto out;
        }
    }
    return NULL;
}
The macro INET_MATCH compares the packet’s source/destination IPs and ports with the socket’s stored values, as well as device binding and network namespace:
#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif) \
    ((inet_sk(__sk)->inet_portpair == (__ports)) && \
     (inet_sk(__sk)->inet_daddr == (__saddr)) && \
     (inet_sk(__sk)->inet_rcv_saddr == (__daddr)) && \
     (!(__sk)->sk_bound_dev_if || ((__sk)->sk_bound_dev_if == (__dif))) && \
     net_eq(sock_net(__sk), (__net)))
Practical Observations
On a Red Hat Enterprise Linux 6.2 system, the author observed over one million established TCP connections:
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# ss -ant | grep ESTAB | wc -l
1000013
# cat /proc/meminfo
MemTotal: 3925408 kB
MemFree: 97748 kB
Buffers: 35412 kB
Cached: 119600 kB
Slab: 3241528 kB
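...
The Slab figure (3241528 kB with about one million connections established) suggests a kernel memory cost of roughly 3.2 kB per connection.
Scaling Strategies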
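Because each client connection consumes a local port, a client might appear to be limited to about 65,000 connections. The kernel, however, identifies a connection by its full four-tuple (source IP, source port, destination IP, destination port), so the limit can be bypassed in two ways:
Assign multiple IP addresses to the client machine, giving each IP its own port space.
Connect to many different server endpoints (different IPs or ports), so that each connection still has a unique four-tuple even when local ports repeat.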
Do not mix the two approaches: when a socket is bound to a local IP with bind() before connect(), the kernel no longer reuses local ports on that IP across different destinations, which changes the port-selection strategy.
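Using a particular source address for an outgoing connection, as in the first approach, means calling bind() before connect(). The following is a minimal sketch assuming IPv4. The helper name connect_from is hypothetical, and passing port 0 to bind() leaves the choice of local port to the kernel.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP connection to dst_ip:dst_port using
 * src_ip as the local address. Returns a connected fd, or -1 on error. */
int connect_from(const char *src_ip, const char *dst_ip, uint16_t dst_port)
{
    struct sockaddr_in src, dst;
    int fd;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = htons(0);  /* 0: let the kernel pick the local port */
    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(dst_port);
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Bind to the chosen source address first, then connect. */
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) != 0 ||
        connect(fd, (struct sockaddr *)&dst, sizeof(dst)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A client with several addresses configured could call connect_from once per connection, cycling through its source addresses so that each address contributes its own range of local ports.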
Experiments confirmed that a client can hold over a million concurrent TCP connections, demonstrating that the 65,535-port ceiling is not an absolute barrier.
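The ceiling is not absolute because, as the lookup code earlier shows, established sockets are keyed by the whole four-tuple and not by the local port alone. The toy table below is a user-space sketch, not kernel code: the hash function, the table size, and all names are assumptions chosen for illustration. It imitates the slot = hash & mask bucket selection and the chain walk that ends in a full four-tuple comparison.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u              /* power of two, so mask = size - 1 */
#define TABLE_MASK (TABLE_SIZE - 1u)

struct tuple {
    uint32_t remote_addr, local_addr;
    uint16_t remote_port, local_port;
};

struct entry {
    struct tuple key;
    uint32_t hash;                    /* cached, like sk_hash */
    struct entry *next;               /* next entry in the same bucket */
};

static struct entry *table[TABLE_SIZE];

/* Illustrative mixing function; the kernel uses its own hash. */
static uint32_t tuple_hash(const struct tuple *t)
{
    uint32_t h = t->remote_addr * 2654435761u;

    h ^= t->local_addr * 2246822519u;
    h ^= (((uint32_t)t->remote_port << 16) | t->local_port) * 3266489917u;
    return h ^ (h >> 15);
}

static int tuple_equal(const struct tuple *a, const struct tuple *b)
{
    return a->remote_addr == b->remote_addr &&
           a->local_addr == b->local_addr &&
           a->remote_port == b->remote_port &&
           a->local_port == b->local_port;
}

/* Link an entry at the head of its bucket's chain. */
void table_insert(struct entry *e)
{
    e->hash = tuple_hash(&e->key);
    e->next = table[e->hash & TABLE_MASK];
    table[e->hash & TABLE_MASK] = e;
}

/* Select the bucket, then walk its chain: cheap hash check first,
 * full four-tuple comparison second. */
struct entry *table_lookup(const struct tuple *t)
{
    uint32_t hash = tuple_hash(t);
    struct entry *e;

    for (e = table[hash & TABLE_MASK]; e != NULL; e = e->next) {
        if (e->hash != hash)
            continue;
        if (tuple_equal(&e->key, t))
            return e;
    }
    return NULL;
}
```

Two entries that share the same local address and local port but differ in the remote endpoint are stored and found as separate connections, which is why a local port can be reused across destinations.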
ITPUB
