Why Does TCP Send RST? Deep Dive into Causes and Debugging Techniques
This article explains the fundamentals of TCP RST packets, distinguishes active and passive resets, outlines common kernel scenarios that generate them, and provides practical debugging methods using tcpdump, bpftrace, and source‑code analysis to resolve real‑world network incidents.
Background
TCP reset (RST) packets often cause connection drops and are a frequent source of production issues. Understanding the types of RST and how to diagnose them is essential for reliable network operations.
Principle
RST packets are classified as active (sent by the host on its own initiative, based on its socket state) or passive (sent in response to an incoming packet the host cannot accept). The rules for when a reset must be generated come from RFC 793, since obsoleted by RFC 9293.
Active RST
An active reset carries the ACK flag together with RST. The kernel function responsible is tcp_send_active_reset():
<code>tcp_send_active_reset()
  -> skb = alloc_skb(MAX_TCP_HEADER, priority);      /* allocate an empty segment */
  -> tcp_init_nondata_skb(skb, tcp_acceptable_seq(sk),
                          TCPHDR_ACK | TCPHDR_RST);  /* RST|ACK, seq from socket state */
  -> tcp_transmit_skb()                              /* build the header and send */</code>
Typical causes:
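The application calls close() while received data is still unread in the socket's receive queue.
SO_LINGER is enabled with a zero timeout (l_onoff = 1, l_linger = 0), so close() aborts the connection with an immediate reset instead of the normal FIN exchange.
The host runs short of TCP memory or has too many orphaned sockets (see net.ipv4.tcp_max_orphans), so the kernel aborts the connection.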
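The first cause is easy to reproduce on loopback. The following is a minimal sketch, assuming a Linux host with loopback TCP available: the server side closes a connection whose receive queue still holds unread bytes, and the client then observes the reset as ECONNRESET. The helper names (demo_close_with_unread_data, sleep_ms) and the 100 ms delay are illustrative choices, not anything taken from the kernel or the original article.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleeps for the given number of milliseconds. */
static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical demo: the server closes while client data is still unread.
   Returns the errno the client then sees from recv(), or 0 if recv() did not fail. */
static int demo_close_with_unread_data(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    assert(lfd >= 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks one */
    int rc = bind(lfd, (struct sockaddr *)&addr, sizeof addr);
    assert(rc == 0);
    rc = listen(lfd, 1);
    assert(rc == 0);
    socklen_t len = sizeof addr;
    rc = getsockname(lfd, (struct sockaddr *)&addr, &len);
    assert(rc == 0);

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    assert(cfd >= 0);
    rc = connect(cfd, (struct sockaddr *)&addr, sizeof addr);
    assert(rc == 0);
    int sfd = accept(lfd, NULL, NULL);
    assert(sfd >= 0);

    ssize_t sent = send(cfd, "unread", 6, 0);  /* the server never reads this */
    assert(sent == 6);
    sleep_ms(100);                              /* let loopback deliver it */
    close(sfd);                                 /* unread data: kernel sends RST|ACK */

    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };  /* never block forever */
    setsockopt(cfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
    char buf[16];
    ssize_t n = recv(cfd, buf, sizeof buf, 0);
    int err = n < 0 ? errno : 0;
    close(cfd);
    close(lfd);
    return err;
}
```

On Linux, calling demo_close_with_unread_data() should return ECONNRESET. Running it while the active-reset bpftrace one-liner from the Tools section is attached should record a stack that passes through tcp_close.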
Passive RST
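A passive reset is built from the packet it rejects rather than from socket state. If that packet carries an ACK, the RST takes its sequence number from the packet's acknowledgment number and does not set the ACK flag; otherwise the RST sets the ACK flag and acknowledges the packet's sequence number plus its length (payload bytes, plus one each for SYN and FIN). The IPv4 kernel function is tcp_v4_send_reset():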
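<code>/* tcp_v4_send_reset(): choose seq/ack for the reply RST */
if (th->ack) {
    /* incoming segment has ACK: RST seq = its ack_seq, no ACK flag */
    rep.th.seq = th->ack_seq;
} else {
    /* no ACK: set ACK and acknowledge everything the segment occupied */
    rep.th.ack = 1;
    rep.th.ack_seq = htonl(ntohl(th->seq) + th->syn + th->fin +
                           skb->len - (th->doff << 2));
}</code>
Passive resets occur in many more scenarios than active ones, generally when an incoming packet cannot be matched to a socket that can accept it, for example a SYN sent to a port with no listener.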
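The sequence/acknowledgment rule in tcp_v4_send_reset() can be restated as a plain function for experimentation. This is a hypothetical, host-byte-order model: the struct and function names are invented here, whereas the kernel works on network-byte-order struct tcphdr fields and derives the payload length from skb->len minus the header length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-byte-order view of the incoming segment's fields. */
struct seg_info {
    uint32_t seq;          /* incoming sequence number */
    uint32_t ack_seq;      /* incoming acknowledgment number */
    bool ack, syn, fin;    /* incoming flags */
    uint32_t payload_len;  /* skb->len minus the TCP header length */
};

/* Hypothetical description of the RST that would be sent back. */
struct rst_reply {
    uint32_t seq;
    uint32_t ack_seq;
    bool ack;              /* true if the RST also carries ACK */
};

/* Mirrors the if/else in tcp_v4_send_reset(). */
static struct rst_reply build_passive_rst(const struct seg_info *in) {
    struct rst_reply rep = { 0, 0, false };
    if (in->ack) {
        rep.seq = in->ack_seq;  /* take seq from the rejected segment's ACK */
    } else {
        rep.ack = true;
        /* uint32_t arithmetic wraps modulo 2^32, like TCP sequence space */
        rep.ack_seq = in->seq + in->syn + in->fin + in->payload_len;
    }
    return rep;
}
```

For example, a bare SYN with sequence number 1000 yields an RST|ACK acknowledging 1001. This is why a reset answering a connection attempt carries the ACK flag even though it is passive.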
Tools
For kernel-level analysis, tcpdump narrows the scope (which host sent the RST and at what stage of the connection), while eBPF tools such as bpftrace can capture the kernel call stacks of the reset functions.
Active RST tracing:
<code>sudo bpftrace -e 'k:tcp_send_active_reset { @[kstack()] = count(); }'</code>
Passive RST tracing (IPv4; tcp_v6_send_reset is the IPv6 counterpart):
<code>sudo bpftrace -e 'k:tcp_v4_send_reset { @[kstack()] = count(); }'</code>
The resulting stack traces show the exact kernel context in which each reset was generated.
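A reproducible trigger helps confirm that the passive-reset probe actually fires. The following sketch assumes Linux and loopback networking, and its function name is hypothetical. It reserves a free port, closes it without listening, and then connects to it, so the kernel answers the SYN with a reset and connect() fails with ECONNREFUSED.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: connects to a loopback port that has no listener and
   returns the errno from connect() (0 if it unexpectedly succeeded). */
static int trigger_passive_rst(void) {
    /* Reserve a free port by binding to port 0, then close without listen(). */
    int tmp = socket(AF_INET, SOCK_STREAM, 0);
    assert(tmp >= 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    int rc = bind(tmp, (struct sockaddr *)&addr, sizeof addr);
    assert(rc == 0);
    socklen_t len = sizeof addr;
    rc = getsockname(tmp, (struct sockaddr *)&addr, &len);
    assert(rc == 0);
    close(tmp);  /* no socket owns the port any more */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    assert(fd >= 0);
    /* The SYN matches no socket, so the kernel replies with a reset
       (RST|ACK, since the SYN carries no ACK) and connect() fails. */
    int result = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0 ? 0 : errno;
    close(fd);
    return result;
}
```

With the passive-reset one-liner attached, each call to trigger_passive_rst() should add a stack that includes tcp_v4_rcv, the IPv4 receive path that falls through to tcp_v4_send_reset() when no socket matches.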
Case Studies
1. Close‑stage RST
Using tcpdump, the reset was identified as active. The stack trace revealed that the application had set SO_LINGER with a zero timeout, causing an immediate reset on close().
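The abortive close in this case can be reproduced in a few lines. The following is a sketch assuming Linux and loopback TCP, and the function name is hypothetical. The client sets SO_LINGER with l_onoff = 1 and l_linger = 0 before close(), so the peer's recv() fails with ECONNRESET instead of returning 0 for an orderly FIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno the server sees after the client's
   abortive close, or 0 if recv() did not fail. */
static int demo_linger_zero_reset(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    assert(lfd >= 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks one */
    int rc = bind(lfd, (struct sockaddr *)&addr, sizeof addr);
    assert(rc == 0);
    rc = listen(lfd, 1);
    assert(rc == 0);
    socklen_t len = sizeof addr;
    rc = getsockname(lfd, (struct sockaddr *)&addr, &len);
    assert(rc == 0);

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    assert(cfd >= 0);
    rc = connect(cfd, (struct sockaddr *)&addr, sizeof addr);
    assert(rc == 0);
    int sfd = accept(lfd, NULL, NULL);
    assert(sfd >= 0);

    /* l_onoff = 1, l_linger = 0: close() aborts the connection with RST. */
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    rc = setsockopt(cfd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    assert(rc == 0);
    close(cfd);

    /* Bound the wait so a missing RST cannot hang the demo. */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    setsockopt(sfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
    char buf[16];
    ssize_t n = recv(sfd, buf, sizeof buf, 0);
    int err = n < 0 ? errno : 0;
    close(sfd);
    close(lfd);
    return err;
}
```

In tcpdump output, this reset should appear with flags [R.] (RST plus ACK), consistent with an active reset from tcp_send_active_reset().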
2. Handshake‑stage RST bug
A race condition on the server between deleting the old socket from the hash table and inserting the new one left a brief window in which a lookup could find neither. A packet arriving in that window (for example, a segment right after the final ACK of the handshake) was therefore treated as unexpected and answered with an RST. The fix was to insert the new socket at the tail of the hash bucket before removing the old one.
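The ordering in the fix can be illustrated with a plain singly linked bucket. This is a single-threaded sketch with hypothetical types and names. It does not model RCU or real concurrency; it only checks whether a lookup for the connection's key would succeed at each intermediate step. Deleting first opens a window in which neither entry is present, while inserting first never does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash-bucket entry: a connection key and a label. */
struct entry {
    int key;
    const char *kind;  /* "request" or "established", for readability only */
    struct entry *next;
};

/* Walks the chain and returns the first entry with a matching key. */
static struct entry *lookup(struct entry *head, int key) {
    for (struct entry *e = head; e != NULL; e = e->next)
        if (e->key == key)
            return e;
    return NULL;
}

/* Appends n at the tail of the chain headed by *head. */
static void add_tail(struct entry **head, struct entry *n) {
    n->next = NULL;
    while (*head != NULL)
        head = &(*head)->next;
    *head = n;
}

/* Unlinks old from the chain; old->next is left intact for in-flight readers. */
static void unlink_entry(struct entry **head, struct entry *old) {
    while (*head != NULL && *head != old)
        head = &(*head)->next;
    if (*head == old)
        *head = old->next;
}

/* Fixed ordering: publish the new entry first, then remove the old one.
   Returns 1 if the key was visible after every step. */
static int replace_insert_first(struct entry **head, struct entry *old, struct entry *new_e) {
    int visible = 1;
    add_tail(head, new_e);
    visible &= lookup(*head, old->key) != NULL;  /* both entries present */
    unlink_entry(head, old);
    visible &= lookup(*head, new_e->key) != NULL;
    return visible;
}

/* Buggy ordering: remove first, then insert. */
static int replace_delete_first(struct entry **head, struct entry *old, struct entry *new_e) {
    int visible = 1;
    unlink_entry(head, old);
    visible &= lookup(*head, old->key) != NULL;  /* window: neither present */
    add_tail(head, new_e);
    visible &= lookup(*head, new_e->key) != NULL;
    return visible;
}

/* Builds a one-entry bucket for key 42, replaces it, reports visibility. */
static int demo_ordering(int insert_first) {
    struct entry req = { 42, "request", NULL };
    struct entry est = { 42, "established", NULL };
    struct entry *head = &req;
    return insert_first ? replace_insert_first(&head, &req, &est)
                        : replace_delete_first(&head, &req, &est);
}
```

Appending at the tail also matters for a reader already part-way along the chain: unlink_entry() leaves the removed entry's next pointer intact, so such a reader keeps walking and still reaches the new entry, whereas an entry added at the head would sit behind it.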
3. Netfilter/DNAT‑induced RST
DNAT rules redirected traffic to a different port, but early demux selected an existing established socket, causing a passive RST. The solution involved adjusting the early‑demux logic to avoid selecting an established socket when a DNAT rewrite occurs.
Conclusion
RST problems can be resolved systematically: identify the reset type, trace the relevant kernel function, consult the RFC, and analyze the source code. These four steps (type, trace, reference, source) are enough to eliminate most reset-related issues.
Tencent Architect