
Why Some TCP RST Packets Don't Close Your Database Connections (And How to Fix It)

This article explores three real‑world cases of TCP anomalies that affect database connectivity—ineffective RST packets, unexpected port exhaustion despite tcp_tw_reuse, and ghost connections caused by a full accept queue—explaining the underlying Linux kernel mechanics and offering practical mitigation strategies.

Alibaba Cloud Developer

While working on database systems, I encountered several TCP‑related challenges that revealed gaps in my understanding of low‑level networking, prompting a deep dive into three illustrative cases.

Case 1: Not All RST Packets Are Effective

Background : In TCP, a packet with the RST flag is used to abort a connection abruptly, discarding any pending data without requiring an ACK.

Problem : A client repeatedly experienced connection resets when accessing a database. Packet captures showed the client sending an RST (triggered by a local iptables rule), yet the Linux kernel on the receiving side ignored it and data exchange continued. About 10 seconds later, a cloud load balancer (SLB) cleared the session and the connection was finally closed.

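Key Insight : A receiver accepts an RST only if its sequence number falls within the current receive window. If the sequence number is outside that window, the kernel silently discards the RST, so the connection persists until another component (e.g., the SLB) terminates it. Kernels that implement RFC 5961 are stricter still: an in-window RST resets the connection immediately only when its sequence number exactly matches the next expected one; otherwise the receiver replies with a challenge ACK.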
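The window check can be illustrated with a small sketch. It is not the kernel's code: the function names are hypothetical, and the three-way outcome (reset, challenge ACK, discard) assumes the RFC 5961 rules rather than anything stated in the packet capture above.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


def seq_in_window(seq, rcv_nxt, rcv_wnd):
    """Return True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


def classify_rst(seq, rcv_nxt, rcv_wnd):
    """Illustrative decision for an incoming RST (hypothetical helper).

    "reset"         : sequence number is exactly the next expected one
    "challenge_ack" : in the window but not exact (RFC 5961 behavior)
    "discard"       : outside the receive window, so the RST is ignored
    """
    if seq == rcv_nxt:
        return "reset"
    if seq_in_window(seq, rcv_nxt, rcv_wnd):
        return "challenge_ack"
    return "discard"
```

With rcv_nxt = 1000 and a 500-byte window, an RST carrying sequence number 900 is classified as "discard", which matches the ignored RST in this case.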

Answer : No, an RST does not always terminate a connection; whether it does depends on the RST’s sequence number. In the example, the connection was closed after about 10 seconds by the SLB, not by the RST itself.

Case 2: How Many TCP Ports Are Actually Usable on Linux?

Background : Linux has 65 535 possible port numbers, but the effective pool depends on the net.ipv4.ip_local_port_range setting and how ports are allocated (explicit bind() vs. automatic assignment).
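To see how large the automatic pool actually is, the range can be read from procfs and counted. The sketch below assumes the standard Linux path /proc/sys/net/ipv4/ip_local_port_range; the helper names are hypothetical, and the sample string uses the common default range of 32768 to 60999.

```python
from pathlib import Path

# Standard procfs location of the setting on Linux.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text):
    """Parse the two whitespace-separated integers the kernel exposes."""
    low, high = (int(field) for field in text.split())
    return low, high


def ephemeral_port_count(text):
    """Number of ports available for automatic assignment (inclusive range)."""
    low, high = parse_port_range(text)
    return high - low + 1


def read_local_port_count():
    """Read the live setting; only works on a Linux host."""
    return ephemeral_port_count(PORT_RANGE_FILE.read_text())


print(ephemeral_port_count("32768\t60999\n"))  # prints 28232
```

With the default range, only 28,232 of the 65,535 ports are candidates for automatic assignment, which is why the pool fills faster than the raw port count suggests.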

Problem : In a distributed database with many nodes, we observed “bind: Address already in use” errors even though tcp_tw_reuse was enabled, suggesting a port shortage.

Analysis : The kernel treats bind() with port 0 (automatic port selection, written bind(0) here) differently from connect() on an unbound socket. bind(0) checks candidates with inet_csk_bind_conflict, which rejects ports that still have sockets in TIME_WAIT, while connect() uses __inet_check_established, which allows a TIME_WAIT port to be reused when the resulting four-tuple is unique. Consequently, once a large number of TIME_WAIT sockets occupy the ip_local_port_range pool, automatic binds fail even with tcp_tw_reuse enabled.
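For readers unfamiliar with the bind(0) path, here is a minimal sketch of what automatic selection looks like from user space. The helper name is hypothetical; it only shows that the kernel picks the port at bind() time, before any destination is known, which is why this path cannot rely on four-tuple uniqueness the way connect() can.

```python
import socket


def auto_assigned_port(host="127.0.0.1"):
    """Bind to port 0 and return the port the kernel selected."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 asks the kernel to choose a free port from ip_local_port_range.
        sock.bind((host, 0))
        return sock.getsockname()[1]


print(auto_assigned_port())
```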

Solution : Bind to explicit port numbers (e.g., port 8080 rather than port 0) and manage a custom port pool in the application, instead of relying on automatic selection that conflicts with TIME_WAIT sockets.
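A custom port pool could look roughly like the sketch below. The function name and the pool contents are hypothetical, and setting SO_REUSEADDR is an assumption of this sketch, not something the text specifies; it is included because on Linux it lets bind() succeed on a port that only has TIME_WAIT sockets.

```python
import errno
import socket


def bind_from_pool(host, ports):
    """Try each port in the application-managed pool; return the bound socket."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Assumption: allow binding to ports whose only users are in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # unexpected failure, do not hide it
    raise OSError(errno.EADDRINUSE, "no free port in the pool")
```

A caller would pass its own range, for example bind_from_pool("0.0.0.0", range(20000, 20100)), and then call connect() on the returned socket.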

Answer : Linux provides 65 535 ports; those within ip_local_port_range can be allocated automatically, and any port can be reused as long as the four‑tuple (source IP/port, destination IP/port) is unique. Port exhaustion still occurs with bind(0) because it refuses ports stuck in TIME_WAIT.

Case 3: Ghost Connections When the Accept Queue Is Full

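Background : TCP’s three-way handshake is handled by the kernel, not by the application. The server maintains a half-connection queue (often called the SYN queue, holding connections for which a SYN has been received) and a full-connection queue (often called the accept queue, holding connections whose handshake has completed and that are waiting for accept()).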
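The size of the full-connection queue is set by the backlog argument to listen(). As a sketch: per the Linux listen(2) manual page (not this article), the kernel silently caps the backlog at net.core.somaxconn, so the effective limit is the smaller of the two. The helper names below are hypothetical.

```python
import socket


def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually honors: capped at net.core.somaxconn."""
    return min(requested, somaxconn)


def listen_with_backlog(host, port, backlog):
    """Create a listening socket whose accept queue is sized by backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(backlog)  # completed handshakes wait here until accept()
    return sock


print(effective_backlog(1024, 128))  # prints 128
```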

Problem : In a large cluster (e.g., 320 nodes), some nodes hung during initialization. Tracing showed that the client’s connect() call returned success, but the server application never saw the connection because the server’s full-connection queue was full. The server kernel had already sent a SYN-ACK, which is enough for the client to consider the connection established, and then dropped the connection when the queue overflowed. The result is a “ghost” connection that exists only on the client side.

Observation : On Linux 3.10, the client believes the connection succeeded, while the server discards it. On Linux 4.9, connect() blocks when the queue is full, preventing the ghost connection.

Answer : When the full‑connection queue is full, Linux 3.10 allows connect() to succeed and then silently drops the connection; newer kernels (e.g., 4.9) make connect() block until space is available, avoiding the ghost‑connection issue.
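One way an application might guard against ghost connections on older kernels is to require an application-level greeting before treating a connection as usable. This is a sketch of that idea, not a technique described in the article: the function name, the greeting bytes, and the timeout are all hypothetical, and it assumes the server sends the greeting right after accept().

```python
import socket


def connect_confirmed(host, port, timeout=2.0, greeting=b"HELLO"):
    """Connect, then wait for a server greeting to prove accept() happened.

    Returns the connected socket, or None if no valid greeting arrives
    in time (for example because the server never accepted the connection).
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # A short greeting normally arrives in one segment; a production
        # version would loop until len(greeting) bytes have been read.
        data = sock.recv(len(greeting))
    except OSError:  # includes socket.timeout and connection resets
        sock.close()
        return None
    if data != greeting:
        sock.close()
        return None
    return sock
```

A caller that gets None back can retry instead of hanging on a connection the server never saw.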

These cases demonstrate that a thorough understanding of Linux networking internals—such as RST validation, port allocation rules, and accept‑queue behavior—is essential for diagnosing and resolving seemingly obscure database connectivity problems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
