
Linux Network Troubleshooting: In‑Depth Guide to tcpdump, netstat and ss

This article walks system administrators and DevOps engineers through a systematic approach to diagnosing Linux network issues, covering the fundamentals of netstat, ss, and tcpdump, interpreting TCP state tables, analyzing packet captures, and resolving common problems such as TIME_WAIT buildup, SYN floods, and HTTPS handshake failures.

Golang Shines

Why the tool trio matters

Network incidents often surface as vague service errors. A disciplined workflow that starts with high‑level socket inspection (ss) and drills down to packet‑level evidence (tcpdump) lets you pinpoint the root cause without drowning in data.

Understanding the TCP state machine

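The kernel tracks each connection through states such as LISTEN, SYN_SENT, SYN_RECV, ESTABLISHED, CLOSE_WAIT, and TIME_WAIT. Knowing which side triggers a transition tells you where to look: TIME_WAIT accumulates on the side that closes first, while CLOSE_WAIT accumulates on the side whose peer closed first. The state alone therefore often narrows the problem to the client, the server, or an intermediate device.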
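As a rough reading aid for the state table, the sketch below maps a state name to a rule-of-thumb hint about where to look first. The function name `tcp_state_hint` is hypothetical, the hints are simplifications rather than a full state machine, and the state names follow the spelling ss prints (ESTAB, TIME-WAIT, and so on) rather than the kernel spelling used in the text.

```shell
# tcp_state_hint is a hypothetical helper: it maps a TCP state name (as printed
# by ss) to a rule-of-thumb hint about which side to investigate first.
tcp_state_hint() {
  case "$1" in
    LISTEN)     echo "local: server socket waiting for connections" ;;
    SYN-SENT)   echo "local initiated: no SYN-ACK yet, check path and remote listener" ;;
    SYN-RECV)   echo "remote initiated: handshake not completed, check for SYN flood or lost ACKs" ;;
    ESTAB)      echo "connection open: data can flow in both directions" ;;
    CLOSE-WAIT) echo "remote closed first: local application has not called close()" ;;
    FIN-WAIT-2) echo "local closed first: waiting for the remote FIN" ;;
    TIME-WAIT)  echo "local closed first: normal, expires on its own" ;;
    *)          echo "no hint for state: $1" ;;
  esac
}
```

For example, `tcp_state_hint CLOSE-WAIT` prints the hint that the local application has not yet closed its socket.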

Practical command cheat sheet

# Overview of all sockets
ss -s

# Count sockets in a specific state (-H drops the header line from the count)
ss -Htan state time-wait | wc -l

# Show sockets with detailed TCP metrics (RTT, cwnd, retrans)
ss -tin

# Filter by remote address or port
ss -tan dst 10.0.1.100
ss -tan 'sport = :80'

# Force‑close problematic sockets (use with care; needs root and a kernel built with CONFIG_INET_DIAG_DESTROY)
ss -K 'dport = :80'
ss -K 'dst 10.0.1.100'

# Capture HTTP traffic to a pcap file (the /tmp/cap directory must already exist)
tcpdump -i any -nn -s 0 -w /tmp/cap/http.pcap port 80

# Capture only SYN packets (useful for SYN flood detection)
tcpdump -i any 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0'

Typical scenarios and root‑cause analysis

After gathering the socket snapshot, compare the number of ESTABLISHED connections with the TIME_WAIT and CLOSE_WAIT counts. A high TIME_WAIT/ESTABLISHED ratio (>0.5) often indicates short‑lived HTTP calls or aggressive keep‑alive timeouts. A steadily growing CLOSE_WAIT count means the local application is not closing sockets after the peer has closed. A surge in SYN_RECV suggests a possible SYN flood or a saturated half‑open queue.
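One way to get the per-state counts for that comparison is to tally the first column of the ss output. This is a minimal sketch: `count_tcp_states` is a hypothetical name, and it assumes the input is `ss -Htan` output, where the first field of every line is the state.

```shell
# count_tcp_states is a hypothetical helper: it reads `ss -Htan` output on
# stdin and prints "STATE COUNT" pairs, sorted by state name.
count_tcp_states() {
  awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live host:
#   ss -Htan | count_tcp_states
```

The function reads stdin rather than calling ss itself, so the same tally can be run against a snapshot saved during an incident.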

Case study 1 – TIME_WAIT explosion

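A Java/Tomcat service reported 5xx errors. ss -tan state time-wait showed 40 000 sockets in TIME_WAIT while ESTABLISHED approached 60 000. The analysis pointed to idle‑connection handling between the load balancer and the backend, which caused connections to be closed and reopened rather than reused. Note that TIME_WAIT is held by whichever side closes first, so sockets in TIME_WAIT on the backend are connections the backend itself closed; closes initiated by the load balancer leave the backend in CLOSE_WAIT and then LAST_ACK.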

Fixes applied:

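Enable net.ipv4.tcp_tw_reuse=1 so that new outgoing connections can reuse sockets in TIME_WAIT. This only helps the side that initiates connections and requires TCP timestamps (net.ipv4.tcp_timestamps=1).

Set net.ipv4.tcp_fin_timeout=15. This shortens how long orphaned sockets wait in FIN_WAIT_2; it does not change the TIME_WAIT duration, which is fixed at 60 s in the Linux kernel.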

Adjust application keep‑alive settings to keep connections alive longer.

Case study 2 – SYN_RECV flood (suspected SYN flood)

ss revealed 8 000 sockets stuck in SYN_RECV. A short tcpdump capture filtered for SYN packets showed traffic from hundreds of distinct source IPs, confirming a SYN flood.
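The "hundreds of distinct source IPs" observation can be reproduced by counting sources in the text output of tcpdump. The sketch below is one way to do it: `syn_sources` is a hypothetical name, it handles IPv4 lines only, and it assumes tcpdump was run with -nn so that addresses and ports are numeric.

```shell
# syn_sources is a hypothetical helper: it reads tcpdump -nn text output
# (IPv4 only) on stdin and prints "COUNT SOURCE_IP", busiest sources first.
syn_sources() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "IP") {              # the source "addr.port" follows the IP token
        src = $(i + 1)
        sub(/\.[0-9]+$/, "", src)    # strip the trailing ".port"
        seen[src]++
        break
      }
    }
  } END { for (s in seen) print seen[s], s }' | sort -k1,1nr -k2,2
}

# Usage on a live host (captures 5000 SYN packets, then summarizes):
#   tcpdump -i any -nn -c 5000 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0' | syn_sources | head
```

Piping the result through `wc -l` gives the number of distinct sources. Source addresses in a SYN flood are frequently spoofed, so treat the list as evidence of the pattern rather than as a block list.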

Immediate mitigation:

Enable SYN cookies: sysctl -w net.ipv4.tcp_syncookies=1.

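Increase the half‑open backlog with sysctl -w net.ipv4.tcp_max_syn_backlog=4096, and raise the accept‑queue cap with sysctl -w net.core.somaxconn=4096. The application's listen() backlog must be at least as large for the higher somaxconn to take effect.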

Rate‑limit SYN packets with iptables (the limit below is global across all sources, so size it to your normal connection rate):

iptables -I INPUT -p tcp --syn -m limit --limit 10/s --limit-burst 20 -j ACCEPT
iptables -A INPUT -p tcp --syn -j DROP

Case study 3 – CLOSE_WAIT leak

A Java service accumulated 50 000 sockets in CLOSE_WAIT. ss -tanp state close-wait identified the offending process. Thread dumps showed that HTTP client code omitted close() in error paths.
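To turn the ss -tanp output into a per-process count, the owning process can be extracted from the `users:(("name",pid=N,fd=M))` field. This is a sketch under assumptions: `close_wait_by_process` is a hypothetical name, it only looks at the first process listed for each socket, and the -p output needs root to show processes owned by other users.

```shell
# close_wait_by_process is a hypothetical helper: it reads
# `ss -Htanp state close-wait` output on stdin and prints
# "COUNT NAME pid=PID", largest count first.
close_wait_by_process() {
  awk '
    match($0, /users:\(\("[^"]+",pid=[0-9]+/) {
      # skip the 9-character prefix users:((" and keep name",pid=N
      owner = substr($0, RSTART + 9, RLENGTH - 9)
      sub(/",pid=/, " pid=", owner)
      count[owner]++
    }
    END { for (o in count) print count[o], o }
  ' | sort -k1,1nr -k2
}

# Usage on a live host:
#   ss -Htanp state close-wait | close_wait_by_process
```

A single process holding nearly all CLOSE_WAIT sockets is the usual signature of a descriptor leak in that process.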

Resolution:

Restart the service to clear the leaked descriptors.

Refactor code to use try‑with‑resources or finally blocks ensuring sockets are always closed.

Advanced packet‑capture examples

To see the exact TLS handshake, capture only port 443 traffic and decode with Wireshark. For HTTP request‑line extraction:

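# -l line-buffers tcpdump output for the pipe; the grep pattern is unanchored because -A prints header bytes before the payload on the same line
tcpdump -i any -nn -s 0 -A -l 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | grep --line-buffered -E '(GET|POST) /|HTTP/1\.[01]'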

Kernel parameter quick reference

net.core.somaxconn: accept queue cap (default 128 before kernel 5.4 and 4096 since; typical 1024‑4096).

net.ipv4.tcp_max_syn_backlog: half‑open queue (default scales with memory, minimum 128; raise to 4096 for high traffic).

net.ipv4.tcp_syncookies: enable SYN cookie protection (1 = on).

net.ipv4.tcp_tw_reuse: allow TIME_WAIT reuse for outgoing connections (0 = off, 1 = on; requires TCP timestamps).

net.ipv4.tcp_fin_timeout: FIN_WAIT_2 timeout (default 60 s, often set to 15 s).

net.ipv4.tcp_keepalive_time: idle time before keep‑alive probes (default 7200 s, often reduced to 600 s).
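Before changing any of these parameters, it helps to record the current values so they can be restored. The sketch below prints them in sysctl.conf format. `sysctl_snapshot` and the `PROC_SYS` override are hypothetical names introduced here for illustration; the override exists only so the function can be pointed at a test directory instead of /proc/sys.

```shell
# sysctl_snapshot is a hypothetical helper: it prints the current value of each
# given key in sysctl.conf format, so the output can be restored later with
# `sysctl -p FILE`. PROC_SYS is an illustrative override for testing.
sysctl_snapshot() {
  local root="${PROC_SYS:-/proc/sys}" key path
  for key in "$@"; do
    # net.core.somaxconn -> net/core/somaxconn
    path="$root/$(printf '%s' "$key" | tr . /)"
    if [ -r "$path" ]; then
      printf '%s = %s\n' "$key" "$(cat "$path")"
    else
      printf '# %s: not readable on this host\n' "$key"
    fi
  done
}

# Usage on a live host:
#   sysctl_snapshot net.core.somaxconn net.ipv4.tcp_max_syn_backlog \
#     net.ipv4.tcp_fin_timeout > sysctl-before.conf
#   sysctl -p sysctl-before.conf   # roll back
```

Keys that cannot be read are written as comment lines, which `sysctl -p` ignores, so the snapshot file stays loadable.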

Monitoring and alerting recommendations

Expose the kernel counters via node_exporter and create Prometheus alerts for the following metrics:

node_netstat_Tcp_RetransSegs: rate of TCP retransmissions.

node_netstat_Tcp_CurrEstab: current ESTABLISHED sockets.

node_nf_conntrack_entries vs node_nf_conntrack_entries_limit: conntrack saturation.

node_netstat_Tcp_OutRsts: sudden RST spikes.

Sample alert rule (Prometheus):

groups:
- name: network
  rules:
  - alert: HighTcpRetrans
    expr: rate(node_netstat_Tcp_RetransSegs[5m]) > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "TCP retransmission rate high"
      description: "{{ $labels.instance }} is retransmitting at {{ $value }}/s"

Best‑practice checklist

Start with ss for a quick socket overview before capturing packets.

Always apply a BPF filter to tcpdump to limit data volume.

Correlate socket state counts with kernel metrics (nstat, /proc/net/*).

Adjust sysctl parameters gradually and keep a rollback plan.

Document each investigation step and store pcap files securely.

Automate routine health checks with a small script that runs ss and alerts on abnormal ratios.
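The last checklist item could look something like the sketch below, which applies the 0.5 TIME_WAIT/ESTABLISHED threshold mentioned earlier. `check_tw_ratio` is a hypothetical name, and the threshold is a starting point to tune rather than a fixed rule.

```shell
# check_tw_ratio is a hypothetical helper: given TIME_WAIT and ESTABLISHED
# counts plus an optional threshold (default 0.5), it prints a one-line verdict
# and returns 1 when the ratio exceeds the threshold, 0 otherwise.
check_tw_ratio() {
  local tw="$1" estab="$2" limit="${3:-0.5}"
  awk -v tw="$tw" -v es="$estab" -v lim="$limit" 'BEGIN {
    if (es == 0) { print "UNKNOWN no ESTABLISHED sockets"; exit 0 }
    ratio = tw / es
    printf "%s TIME_WAIT/ESTABLISHED=%.2f (limit %.2f)\n", (ratio > lim ? "ALERT" : "OK"), ratio, lim
    exit (ratio > lim ? 1 : 0)
  }'
}

# Usage on a live host (for example from cron):
#   tw=$(ss -Htan state time-wait | wc -l)
#   es=$(ss -Htan state established | wc -l)
#   check_tw_ratio "$tw" "$es" 0.5 || logger -t netcheck "TIME_WAIT ratio high"
```

Returning a non-zero status on alert keeps the check composable: the caller decides whether to log, page, or only record the line.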

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
