Understanding Linux TCP Socket Implementation and System Tuning
This article explains how Linux manages TCP sockets at the kernel level. It demonstrates how to adjust port ranges and file-descriptor limits, walks through the key data structures and lookup functions that enable high-concurrency TCP connections, and offers practical guidance for scaling client connections.
The article begins by showing how to change the kernel's local port range with `echo "5000 65000" > /proc/sys/net/ipv4/ip_local_port_range` and how to raise the maximum number of open file descriptors via `echo 200000 > /proc/sys/fs/file-max`, making both changes persistent by editing `/etc/sysctl.conf` and `/etc/security/limits.conf`.
It then introduces the core socket data structure, `struct sock_common`, defined in `include/net/sock.h`, highlighting the two unions that store the IP-address pair and the port pair of a TCP connection.
```c
struct sock_common {
	union {
		__addrpair	skc_addrpair;	/* IP address pair */
		struct {
			__be32	skc_daddr;
			__be32	skc_rcv_saddr;
		};
	};
	union {
		__portpair	skc_portpair;	/* port pair */
		struct {
			__be16	skc_dport;
			__u16	skc_num;
		};
	};
	...
};
```

When a network packet arrives, the kernel processes it through DMA, a hard interrupt, and a soft interrupt, and finally places the data into the socket's receive queue.
The entry point for TCP packet handling is the function `tcp_v4_rcv` in `net/ipv4/tcp_ipv4.c`, which extracts the TCP and IP headers and looks up the corresponding socket using `__inet_lookup_skb`.
```c
int tcp_v4_rcv(struct sk_buff *skb)
{
	...
	th = tcp_hdr(skb);	/* get the TCP header */
	iph = ip_hdr(skb);	/* get the IP header */
	/* find the socket that owns this packet */
	sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
	...
}
```

The lookup is performed by `__inet_lookup_established` (with a fallback to `__inet_lookup_listener`), which builds a 32-bit port key, computes a hash, and walks the hash-bucket chain to find a matching socket.
```c
struct sock *__inet_lookup_established(struct net *net,
				       struct inet_hashinfo *hashinfo,
				       const __be32 saddr, const __be16 sport,
				       const __be32 daddr, const u16 hnum,
				       const int dif)
{
	/* pack the two ports into a single 32-bit key */
	const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
	/* hash the four-tuple, then mask down to a bucket index */
	unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
	unsigned int slot = hash & hashinfo->ehash_mask;
	struct inet_ehash_bucket *head = &hashinfo->ehash[slot];

begin:
	sk_nulls_for_each_rcu(sk, node, &head->chain) {
		if (sk->sk_hash != hash)
			continue;
		if (likely(INET_MATCH(sk, net, acookie,
				      saddr, daddr, ports, dif))) {
			if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
				goto begintw;
			/* recheck after taking the reference: the socket
			   may have been reused concurrently */
			if (unlikely(!INET_MATCH(sk, net, acookie,
						 saddr, daddr, ports, dif))) {
				sock_put(sk);
				goto begin;
			}
			goto out;
		}
	}
	...
}
```

The macro `INET_MATCH` compares the packet's source/destination addresses and ports with the socket's stored values, also checking the device binding and network namespace.
```c
#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif)	\
	((inet_sk(__sk)->inet_portpair == (__ports))		&&	\
	 (inet_sk(__sk)->inet_daddr == (__saddr))		&&	\
	 (inet_sk(__sk)->inet_rcv_saddr == (__daddr))		&&	\
	 (!(__sk)->sk_bound_dev_if ||					\
	  ((__sk)->sk_bound_dev_if == (__dif)))			&&	\
	 net_eq(sock_net(__sk), (__net)))
```

System-information commands (e.g., `cat /etc/redhat-release`, `ss -ant | grep ESTAB | wc -l`, `cat /proc/meminfo`) illustrate the environment where these limits matter.
In the conclusion, the article emphasizes that each outbound TCP connection consumes a client port, and explains why connection counts in the 30,000-50,000 range often look like a hard ceiling. It then offers two ways to increase client-side concurrency: assigning multiple IP addresses to the client, or connecting to multiple distinct server endpoints. It warns against mixing the two approaches, because binding to a specific IP changes the kernel's port-selection strategy.
Finally, the author notes that practical tests have shown client machines can handle over a million concurrent TCP connections when properly tuned.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.