Why Your API Service Hits 200k TIME_WAIT Connections and How to Fix It

This article explains why high‑traffic Linux services can run out of usable TCP connections under massive TIME_WAIT and CLOSE_WAIT buildup, shows how to diagnose the problem with netstat/ss, and provides concrete kernel‑parameter tweaks, connection‑pool strategies, and monitoring scripts to restore stability.

Overview

A sudden alert about massive API timeouts turned out to be caused by over 200,000 TIME_WAIT sockets, preventing new connections. The incident motivated a deep dive into the Linux kernel TCP stack, its connection states, relevant sysctl parameters, and production‑grade monitoring practices.

TCP Connection State Quick Review

Understanding the TCP state machine (CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT, TIME_WAIT, CLOSE_WAIT, etc.) is essential before tuning, because each state requires a different mitigation strategy.
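
As a refresher, the four‑way close fixes which side lands where: the side that closes first passes through TIME_WAIT, while the side that closes second sits in CLOSE_WAIT until its application calls close().

Active closer :  ESTABLISHED -> FIN_WAIT1 -> FIN_WAIT2 -> TIME_WAIT -> CLOSED
Passive closer:  ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED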

Environment Information

Operating Systems: CentOS 7.x / RHEL 8.x / Ubuntu 20.04+

Kernel Versions: 3.10+, 4.x, 5.x

Applicable Scenarios: high‑concurrency web services, API gateways, micro‑service architectures

Connection‑State Diagnosis

Quick Diagnostic Commands

# Count connections per state (skip netstat's two header lines)
netstat -ant | awk 'NR>2 {++state[$NF]} END {for(k in state) print k, state[k]}' | sort -k2 -rn
# Faster alternative with ss (skip its one header line)
ss -ant | awk 'NR>1 {++state[$1]} END {for(k in state) print k, state[k]}' | sort -k2 -rn

Typical output shows the distribution of ESTABLISHED, TIME_WAIT, CLOSE_WAIT, SYN_RECV, etc.

Normal Ranges and Warning Signals

ESTABLISHED : depends on business load; a sudden surge without a matching traffic increase suggests connection leaks.

TIME_WAIT : should stay < 50 k; > 100 k signals short‑connection overload or insufficient port range.

CLOSE_WAIT : normally < 100; continuous growth points to application bugs that never close sockets.

SYN_RECV : < 1 k; a spike usually means a SYN‑Flood attack.

FIN_WAIT2 : < 500; persistent accumulation suggests the peer never sends FIN.

TIME_WAIT Deep Dive

The TIME_WAIT state guarantees that delayed packets from an old connection are not mistakenly accepted by a new one. Linux holds each TIME_WAIT socket for 60 s (the compile‑time constant TCP_TIMEWAIT_LEN, corresponding to 2×MSL); this duration is not tunable via sysctl. At 10 k short connections per second, that 60 s hold produces 600 k TIME_WAIT sockets, quickly exhausting the default local‑port range (32768‑60999, about 28 k ports).
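
A back‑of‑the‑envelope calculation shows how tight the budget is for outbound short connections to a single destination:

# Max sustainable outbound rate to ONE destination ip:port before
# 60-second TIME_WAIT sockets consume the default port range
echo $(( (60999 - 32768 + 1) / 60 ))   # => 470 connections/second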

# View local port range
cat /proc/sys/net/ipv4/ip_local_port_range
# Check TIME_WAIT count (-H drops the header so wc -l is exact;
# on older iproute2, pipe through 'tail -n +2' instead)
ss -Hant state time-wait | wc -l
# Show some TIME_WAIT entries
ss -ant state time-wait | head -20

CLOSE_WAIT Diagnosis

CLOSE_WAIT is more dangerous because it indicates that the remote side has closed the connection but the application has never called close(). A buggy Java client that never releases connections can accumulate tens of thousands of CLOSE_WAIT sockets and eventually trigger an OOM.

# Find processes holding CLOSE_WAIT sockets
ss -antp state close-wait
# Example output (Java process; the State column is omitted when a state filter is used)
# 1  0  10.0.0.100:8080 10.0.0.50:45678 users:(("java",pid=12345,fd=89))
# Further analysis
lsof -p 12345 | grep CLOSE_WAIT | wc -l

Kernel Parameter Tuning in Practice

TIME_WAIT Tuning

# /etc/sysctl.conf
# Enable reuse of TIME_WAIT sockets for outbound connections
# (client side only; requires tcp_timestamps = 1)
net.ipv4.tcp_tw_reuse = 1
# Upper bound on TIME_WAIT sockets kept before the kernel reaps them
net.ipv4.tcp_max_tw_buckets = 200000
# Timeout for orphaned FIN_WAIT2 sockets
# (does NOT shorten TIME_WAIT, which is fixed at 60 s in the kernel)
net.ipv4.tcp_fin_timeout = 15
# Expand local port range (reserve listening ports below 32768
# via net.ipv4.ip_local_reserved_ports if needed)
net.ipv4.ip_local_port_range = 1024 65535

Note: tcp_tw_recycle is deprecated and was removed entirely in kernel 4.12; it breaks clients behind NAT, so avoid enabling it.
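
To confirm whether the knob even exists on your kernel, a quick check:

# tcp_tw_recycle was removed in kernel 4.12; expect an error there
sysctl net.ipv4.tcp_tw_recycle 2>/dev/null || echo "tcp_tw_recycle not present (kernel >= 4.12)"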

Connection‑Establishment Tuning

# Increase SYN backlog (half‑open queue)
net.ipv4.tcp_max_syn_backlog = 65535
# Increase accept queue size
net.core.somaxconn = 65535
# Reduce SYN retry counts
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
# Enable SYN cookies to mitigate SYN‑Flood
net.ipv4.tcp_syncookies = 1

Remember that the effective accept queue size is min(backlog, somaxconn), where backlog is the value passed to listen() by the application.
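
For LISTEN sockets, ss exposes the accept queue directly: Recv-Q is the current occupancy and Send-Q is the effective limit, i.e. min(backlog, somaxconn). The port below is a placeholder.

# Inspect a listener's accept queue (Recv-Q = occupancy, Send-Q = limit)
ss -ltn sport = :8080
# Count historical overflows
netstat -s | grep -i listen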

Keepalive Parameters

# TCP keepalive settings (effective only if SO_KEEPALIVE is set)
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 3
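
With these values a dead peer is detected after at most tcp_keepalive_time + tcp_keepalive_probes × tcp_keepalive_intvl seconds:

# 600 s idle + 3 probes at 15 s intervals
echo $(( 600 + 3 * 15 ))   # => 645 seconds until a dead peer is torn down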

Memory and Buffer Settings

# System-wide TCP memory: low / pressure / high (in pages, usually 4 KB each)
net.ipv4.tcp_mem = 262144 524288 1048576
# Per-socket receive buffer: min / default / max (bytes, autotuned up to max)
net.ipv4.tcp_rmem = 4096 87380 16777216
# Per-socket send buffer: min / default / max (bytes)
net.ipv4.tcp_wmem = 4096 65536 16777216
# System‑wide limits
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 262144
net.core.wmem_default = 262144

Complete Production Configuration Example

# /etc/sysctl.d/99-tcp-tuning.conf
fs.file-max = 2000000
fs.nr_open = 2000000
# Conntrack limits (applied only when the nf_conntrack module is loaded)
net.netfilter.nf_conntrack_max = 2000000
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_tw_buckets = 200000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.ip_local_port_range = 1024 65535
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.route.gc_timeout = 100
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1

Apply with sysctl --system and verify with sysctl -a | grep -E "tcp_tw_reuse|somaxconn|tcp_max_syn".

Special‑Scenario Handling

Scenario 1: Short‑Connection API Gateways

Prefer long‑lived connections; configure Nginx upstream keepalive (e.g., keepalive 1000;), as sketched after this list.

Let the server close first: whichever side initiates the close holds TIME_WAIT, so closing on the server keeps it off the port‑constrained client side (adjust keepalive_timeout).

Use HTTP client connection pools in every language.
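
A minimal Nginx sketch of that setup; the upstream name and addresses are placeholders. Note that upstream keepalive only takes effect with HTTP/1.1 and a cleared Connection header.

upstream api_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    keepalive 1000;                      # idle upstream connections cached per worker
}
server {
    location /api/ {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the default "Connection: close"
    }
}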

Scenario 2: Micro‑service Mesh

Monitor inter‑service connection counts with ss -ant | awk '{print $4, $5}' | sort | uniq -c | sort -rn.

Check for missing connection pools, unreasonable pool sizes, or leaks.
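
A useful variant groups only by the remote endpoint, so ephemeral local ports do not fragment the counts (with a state filter, ss omits the State column and the peer endpoint is field 4; -H assumes a reasonably recent iproute2):

# Established connections grouped by remote endpoint
ss -Hant state established | awk '{print $4}' | sort | uniq -c | sort -rn | head -10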

Scenario 3: Proxy Servers (Nginx/HAProxy)

A proxy holds both a client‑side and an upstream‑side connection per request, doubling the port pressure.

Expand the local port range or use multiple source IPs.

Configure upstream blocks with multiple backend IPs to spread the load.
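
Port exhaustion is scoped per (source IP, destination IP, destination port) tuple, so measure usage toward a single upstream; the address below is a placeholder.

# Local ports consumed toward one backend
ss -Hant dst 10.0.0.20:8080 | wc -l
# Current usable range
sysctl net.ipv4.ip_local_port_range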

Monitoring and Alerting

Shell Script for Real‑Time TCP Metrics

#!/bin/bash
# -H suppresses the ss header so wc -l is exact
# (on older iproute2, pipe through 'tail -n +2' instead)
while true; do
  ts=$(date '+%Y-%m-%d %H:%M:%S')
  est=$(ss -Hant state established | wc -l)
  tw=$(ss -Hant state time-wait | wc -l)
  cw=$(ss -Hant state close-wait | wc -l)
  sr=$(ss -Hant state syn-recv | wc -l)
  echo "$ts ESTABLISHED=$est TIME_WAIT=$tw CLOSE_WAIT=$cw SYN_RECV=$sr"
  if [ "$tw" -gt 100000 ]; then echo "ALERT: TIME_WAIT exceeds 100000!"; fi
  if [ "$cw" -gt 1000 ]; then echo "ALERT: CLOSE_WAIT exceeds 1000, check application!"; fi
  sleep 10
done
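
To feed these counts into Prometheus (used by the CLOSE_WAIT rule below), the loop body can also write a metrics file for node_exporter's textfile collector; the directory path is an assumption and depends on your --collector.textfile.directory setting.

# Inside the loop: atomically publish the counts as custom metrics
cat > /var/lib/node_exporter/tcp_states.prom.$$ <<EOF
tcp_connections{state="time_wait"} $tw
tcp_connections{state="close_wait"} $cw
EOF
mv /var/lib/node_exporter/tcp_states.prom.$$ /var/lib/node_exporter/tcp_states.prom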

Prometheus Alert Rules

groups:
- name: tcp_alerts
  rules:
  # node_sockstat_TCP_tw comes from node_exporter's sockstat collector
  - alert: HighTimeWaitConnections
    expr: node_sockstat_TCP_tw > 100000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "TIME_WAIT connections too high"
      description: "Current TIME_WAIT: {{ $value }}"
  # node_exporter has no built-in CLOSE_WAIT gauge; this uses the custom
  # tcp_connections metric published by the textfile-collector snippet above
  - alert: CloseWaitAccumulation
    expr: tcp_connections{state="close_wait"} > 1000
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "CLOSE_WAIT accumulation"
      description: "Possible connection leak, current CLOSE_WAIT: {{ $value }}"
  # A sustained burst of SYN cookies means the SYN queue keeps overflowing
  - alert: SynFloodSuspected
    expr: rate(node_netstat_TcpExt_SyncookiesSent[1m]) > 100
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Possible SYN Flood attack"

Quick‑Check Checklist

# 1. View connection‑state distribution
ss -ant | awk 'NR>1 {++state[$1]} END {for(k in state) print k, state[k]}'
# 2. Look for queue overflows
netstat -s | grep -E "overflow|prune|SYN"
# 3. Inspect local port usage
ss -ant | awk '{print $4}' | grep -oP ':\d+$' | sort | uniq -c | sort -rn | head -10
# 4. Check file‑descriptor usage
cat /proc/sys/fs/file-nr
# 5. If using conntrack, examine its counters
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# 6. Verify current kernel parameters
sysctl -a | grep -E "somaxconn|tcp_max_syn_backlog|tcp_tw|tcp_fin"

Failure Case Reviews

Case 1: E‑commerce Spike – TIME_WAIT Exhaustion

During a Double‑11 sale, API latency jumped from 50 ms to 5 s, with many 504 errors. Diagnostics revealed more than 210 k TIME_WAIT sockets and a fully exhausted local‑port range.

# Emergency fixes
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.ipv4.tcp_fin_timeout=15
# Long‑term: enable keepalive connections in Nginx upstream

Case 2: Java Application – CLOSE_WAIT Leak

After a week of running, a Java service OOM‑crashed. Investigation showed > 85 k CLOSE_WAIT sockets, all pointing to a Redis client pool that failed to return connections on error.

// Correct usage with try-with-resources (Jedis implements AutoCloseable)
try (Jedis jedis = jedisPool.getResource()) {
    // use jedis
} // close() runs even on exceptions, returning the connection to the pool

Case 3: SYN Flood Attack

New connections stopped establishing while existing ones stayed alive. SYN_RECV count hit the backlog limit (65 k). The source IPs were few, indicating a targeted SYN‑Flood.

# Emergency mitigation
sysctl -w net.ipv4.tcp_syncookies=1
sysctl -w net.ipv4.tcp_max_syn_backlog=262144
# Rate‑limit SYN packets at the firewall
iptables -A INPUT -p tcp --syn -m limit --limit 100/s --limit-burst 200 -j ACCEPT
iptables -A INPUT -p tcp --syn -j DROP
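
The blanket limit above also throttles legitimate clients during an attack; a per‑source variant using the hashlimit match is gentler (the thresholds are illustrative):

# Drop sources exceeding 50 SYN/s each instead of limiting all SYNs globally
iptables -A INPUT -p tcp --syn -m hashlimit --hashlimit-name synflood \
  --hashlimit-mode srcip --hashlimit-above 50/s -j DROP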

Conclusion

Key Parameter Quick Reference

tcp_tw_reuse : enable TIME_WAIT reuse for outbound connections – set to 1 (short‑connection services).

tcp_max_tw_buckets : upper limit for TIME_WAIT – 200 000 (high‑concurrency).

tcp_fin_timeout : timeout for orphaned FIN_WAIT2 sockets – 15‑30 s (all scenarios).

somaxconn : accept queue size – 65 535 (high‑concurrency).

tcp_max_syn_backlog : SYN backlog – 65 535 (high‑concurrency).

ip_local_port_range : local port range – 1024‑65535 (proxy servers).

Tuning Principles

Monitor first; base changes on data.

Change one parameter at a time and observe the effect.

Backup original configurations before any modification.

Test on the target kernel version; behavior may differ.

Application‑level optimizations (keep‑alive, connection pools) outweigh kernel tweaks.

Advanced Directions

Use eBPF for deep TCP analysis.

Tune congestion‑control algorithms (BBR, CUBIC).

Explore user‑space stacks like DPDK.

Specialize tuning for container networking.

