
Why Are Your TCP Packets Dropping? A 3‑Day Capture Walkthrough to Kernel Parameter Fixes

This guide walks you through diagnosing intermittent TCP packet loss and latency spikes on Linux servers, from initial symptom checks and anti‑pattern warnings to detailed sender‑side, network‑link, and receiver‑side investigations using tcpdump, netstat, ss, and kernel tuning, complete with scripts, best practices, and an FAQ.


Scenario & Preconditions

Targeted at advanced operations engineers who encounter intermittent TCP packet loss, sudden latency spikes, or occasional timeouts where ping remains normal but HTTP/SQL/RPC requests fail 1‑5% of the time. The guide assumes root access on RHEL/CentOS 7.9+ or Ubuntu 16.04 LTS+ with kernel 3.10+ (4.15+ recommended) and availability of tools such as tcpdump, wireshark, netstat, ss, and iperf3.

Anti‑Pattern Warnings

Application‑layer bugs – only specific apps time out, ping and iperf3 are fine; usually caused by too‑short timeout settings (<500 ms). Verify logs before blaming the network.

Cross‑cloud or cross‑region links – long paths with many devices make single‑point diagnosis impossible; use mtr/traceroute.

NAT/Proxy drops – connection queues or session timeouts on load balancers cause immediate disconnects.

No root on production – fallback to logs or a test replica.

Blind kernel tweaks – may solve one symptom but introduce latency or memory issues; change one parameter at a time and observe for 30 minutes.

Alternative Solutions Comparison

Existing network monitoring system – query NetFlow/sFlow (≈100× faster, full‑topology view).

Cloud environments (AWS/GCP) – use VPC Flow Logs (no packet capture, seconds‑level query).

Only application logs available – infer loss by counting timeouts (zero network tool cost).

Multi‑ISP paths – employ mtr for pinpointing loss points (more effective than tcpdump).
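
For the multi‑ISP / long‑path cases above, a minimal mtr sketch (the target host and probe count are assumptions):

# Report mode, numeric output, 100 probes per hop; the per-hop Loss% column pinpoints the dropping segment
mtr -rn -c 100 203.0.113.10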

Environment & Version Matrix

OS : RHEL 7.9+/CentOS 8.5+ or Ubuntu 16.04/20.04/22.04 LTS (tested).

Kernel : 3.10.0‑1160+ / 4.18.0‑305+ (RHEL) or 4.15.0+/5.10.0+/5.15.0+ (Ubuntu).

Tools : tcpdump 4.9.3+, wireshark 2.6.x+/3.0.x+, netstat/ss (net‑tools 2.0+ or iproute2 4.9+), iperf3 3.6+.

Resources : minimum 2 CPU cores / 4 GB RAM / 50 GB disk, recommended 4 cores / 8 GB RAM / 100 GB disk, 1 Gbps NIC with RX/TX offload.

Quick Checklist (5‑minute pre‑flight)

Confirm symptoms are reproducible (stable timeout rate); a quick probe is sketched after this checklist.

Obtain root (tcpdump needs CAP_NET_RAW, plus CAP_NET_ADMIN for promiscuous mode).

Prepare two hosts (sender + receiver).
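
For the first checklist item, a hedged reproducibility probe (URL, request count, and timeout are assumptions):

# Fire 100 requests with a 2 s timeout and report the failure rate
FAIL=0
for i in $(seq 1 100); do
  curl -s -o /dev/null -m 2 http://192.168.1.101:8080/health || FAIL=$((FAIL+1))
done
echo "Failed $FAIL/100 requests"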

Implementation Steps – Three Diagnostic Laws

Law 1: Sender‑Side Queue Overflow

Goal: Verify whether the sending TCP queue is saturated.

# View global TCP stats
netstat -s | grep -A 20 "Tcp:"

Key metrics:

segments retransmitted – retransmission count; < 0.1% is normal, > 1% indicates loss.

Send‑Q – per‑connection send queue backlog in bytes (use ss -tni to list connections with large queues).

net.core.netdev_max_backlog – kernel backlog queue depth between the NIC driver and the protocol stack.

Validate before and after changes with netstat -s and ss -tni. Common errors include netstat being missing (install the net-tools package) or queue overflow causing high latency. A sketch for spotting saturated send queues follows.
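
A minimal sketch (column positions follow plain ss -tn output: State, Recv-Q, Send-Q, local, peer):

# Established connections with a non-empty Send-Q, largest backlogs first
ss -tn | awk 'NR > 1 && $1 == "ESTAB" && $3 > 0 {print $3, $4, $5}' | sort -rn | head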

Law 2: Network‑Link Diagnosis – Precise Packet Capture

Goal: Capture traffic to see RST, DUP, or retransmission flags.

# Install if missing
which tcpdump || yum install -y tcpdump   # RHEL/CentOS
which tcpdump || apt install -y tcpdump   # Ubuntu/Debian
# Capture 1000 packets with detailed output
tcpdump -i eth0 -n -vvv -c 1000 'tcp port 3306 and (tcp[tcpflags] & tcp-syn != 0 or tcp[tcpflags] & tcp-rst != 0)'
# Or write to file for offline analysis
tcpdump -i eth0 -w tcp_dump.pcap -B 32000 'tcp and (host 192.168.1.100 or host 192.168.1.101)'

Parameters: -i eth0 – capture interface. -n – no DNS lookups. -B 32000 – capture buffer size in KiB (≈32 MB) to avoid capture‑induced loss.

Filter expressions (SYN, ACK, RST, FIN) isolate relevant traffic.

Post‑capture analysis with Wireshark (TCP Stream Graphs → RTT, Expert Info → Retransmission/Duplicate ACK) or on the command line with tshark display filters.
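
A hedged offline pass with tshark over the capture written above (the file name matches the earlier tcpdump example; tshark ships with Wireshark):

# Count segments Wireshark's TCP analysis flags as retransmissions or duplicate ACKs
tshark -r tcp_dump.pcap -Y 'tcp.analysis.retransmission' | wc -l
tshark -r tcp_dump.pcap -Y 'tcp.analysis.duplicate_ack' | wc -l
# List RST segments with their endpoints
tshark -r tcp_dump.pcap -Y 'tcp.flags.reset == 1' -T fields -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport | head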

Law 3: Receiver‑Side Buffer & Application Processing

Goal: Detect Recv‑Q buildup or kernel‑level drops.

# Show receive queue sizes
ss -tn | grep -E "ESTAB|State"
# System‑level drop counters
netstat -s | grep -i "receive\|drop"
# View NIC drop stats
cat /proc/net/dev | grep eth0

Key indicators:

Recv‑Q > 0 – data waiting in kernel buffer; > 20% of buffer size suggests slow application.

/proc/net/dev RX_drop – NIC queue overflow.

rmem_default / rmem_max – increase via sysctl -w net.core.rmem_max=134217728 (128 MB) and persist in /etc/sysctl.conf.

Adjust NIC ring size with ethtool -G eth0 rx 2048 tx 1024 and verify with ethtool -g eth0. If driver lacks support, upgrade firmware.
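
A sketch tying the two remediations together (interface name and sizes are assumptions; check the "Pre-set maximums" reported by ethtool -g before resizing):

# Raise and persist the socket receive buffer ceiling
sysctl -w net.core.rmem_max=134217728
echo 'net.core.rmem_max = 134217728' >> /etc/sysctl.conf
sysctl -p
# Grow the NIC RX/TX rings and verify the new "Current hardware settings"
ethtool -G eth0 rx 2048 tx 1024
ethtool -g eth0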

Minimal TCP Stack Path

Physical link → NIC driver (RX ring) → kernel backlog queue (netdev_max_backlog) → kernel TCP/IP stack (socket receive buffer) → Application (accept/recv)

Loss can occur at any stage; the three laws map directly to these checkpoints.
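
A quick sketch mapping each checkpoint to a counter (the interface name is an assumption; the second column of softnet_stat is the per-CPU backlog drop count, in hex):

# Socket receive buffer: established connections with data stuck in Recv-Q
ss -tn | awk 'NR > 1 && $2 > 0'
# Kernel backlog: per-CPU drop counters (one line per CPU, hex)
awk '{print $2}' /proc/net/softnet_stat
# NIC driver / ring: hardware and driver drop counters
ethtool -S eth0 | grep -iE 'drop|discard|miss'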

Common Loss Signals & Diagnostic Commands

High retransmission rate – netstat -s | grep retransmit

Recv‑Q buildup – ss -tn | grep ESTAB

NIC RX drops – cat /proc/net/dev

RST packets – tcpdump … | grep 'Flags \[R'

Application‑only bugs – check logs.

Observability, Monitoring & Alerts

Linux Native Monitoring

watch -n 1 'netstat -s | grep -E "segments|dropped|retrans|reset"'

Prometheus Rules (YAML excerpt)

groups:
- name: tcp_packet_loss
  interval: 30s
  rules:
  - alert: TCPHighRetransmitRate
    expr: rate(node_netstat_Tcp_RetransSegs[5m]) / rate(node_netstat_Tcp_OutSegs[5m]) > 0.01
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "TCP retransmission rate > 1%"
      description: "{{ $value | humanizePercentage }}"
  - alert: TCPSocketPressure
    expr: node_sockstat_TCP_inuse / node_sockstat_TCP_alloc > 0.8
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "TCP sockets in use exceed 80% of allocated"

Performance Benchmarks

Use iperf3 for baseline and stress tests, optionally combined with stress to simulate CPU load, and monitor packet loss with the scripts below.
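
A hedged baseline sketch (the receiver address matches the earlier capture example; durations, stream count, and UDP rate are assumptions):

# Receiver
iperf3 -s
# Sender: 30 s TCP baseline with 4 parallel streams, then a UDP run whose report shows lost/total datagrams
iperf3 -c 192.168.1.101 -t 30 -P 4
iperf3 -c 192.168.1.101 -u -b 900M -t 30
# Optional CPU pressure while the test runs
stress --cpu 4 --timeout 30 &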

Top 10 TCP Loss Root Causes

Application buffer overflow – check Recv‑Q > 1 MB, increase rmem_max or optimize code.

NIC driver queue full – monitor /proc/net/dev, raise ring size via ethtool -G.

Firewall drops – inspect iptables -L -v DROP counters (see the quick sweep sketch after this list).

TCP backlog full – raise net.core.somaxconn.

SYN‑RECV overflow – enable SYN cookies ( sysctl -w net.ipv4.tcp_syncookies=1).

Link jitter – ping RTT variance, use mtr.

Retransmission timeout – raise net.ipv4.tcp_retries2.

MTU mismatch – verify ip link show on both ends.

Kernel bug – upgrade kernel version.

Application bug – fix code, check logs.
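
A quick sweep over the firewall, backlog, and SYN‑queue causes above (a sketch; only standard counters are read):

# Firewall rules with non-zero DROP/REJECT packet counters
iptables -L -n -v | awk '$1 ~ /^[0-9]+$/ && $1 > 0 && ($3 == "DROP" || $3 == "REJECT")'
# Current listen backlog and SYN-cookie settings
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_syncookies
# Listening sockets whose accept queue (Recv-Q) has hit its limit (Send-Q)
ss -ltn | awk 'NR > 1 && $3 > 0 && $2 >= $3'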

Best Practices

Establish weekly TCP loss baselines with netstat -s (see the cron sketch after this list).

Monthly iperf3 capacity tests to locate bandwidth‑induced loss.

Prometheus alerts on retransmission > 0.5% for early warning.

Align NIC and switch MTU, enable flow control and offloads.

Document every kernel‑tuned parameter with rationale.
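
A minimal cron sketch for the weekly baseline (path and schedule are assumptions; % must be escaped in crontab entries):

# /etc/cron.d/tcp-baseline – snapshot TCP counters every Monday at 03:00
0 3 * * 1 root netstat -s > /var/log/tcp_baseline_$(date +\%Y\%m\%d).txt 2>&1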

FAQ

Q1: TCP loss vs. application timeout?

TCP loss is a network‑layer issue visible via retransmissions, RSTs, or drops; application timeout is a client‑side setting (<500 ms) that may fire even when the network is fine. Diagnose with netstat -s and tcpdump for loss, logs for timeout.

Q2: Why does ping work but TCP fails?

ICMP is handled separately and often bypasses socket buffers; TCP suffers from send/receive queue limits and can drop packets even when ICMP is fine.

Q3: How large should rmem_max be?

Use the bandwidth‑delay product: BDP (bytes) = bandwidth (bits/s) × RTT (s) ÷ 8. For 1 Gbps and 100 ms RTT, BDP ≈ 12.5 MB, so set rmem_max ≈ 2 × BDP ≈ 25 MB. A safe default is 64 MB for most workloads.
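
A small sketch of the arithmetic (bandwidth and RTT values are assumptions):

BW_BITS=1000000000   # 1 Gbps
RTT_SEC=0.1          # 100 ms
echo "BDP bytes: $(echo "$BW_BITS * $RTT_SEC / 8" | bc)"       # ≈ 12500000 (12.5 MB)
echo "rmem_max : $(echo "2 * $BW_BITS * $RTT_SEC / 8" | bc)"   # ≈ 25000000 (25 MB)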

Q4: Does tcpdump affect performance?

CPU overhead 5‑15% depending on filter complexity; memory buffer -B 32000 consumes ~32 MB. Run during off‑peak or on a replica.

Q5: Will a kernel upgrade fix loss?

Newer kernels bring better congestion control (BBR) and driver support, but first tune parameters; upgrade only if issues persist.

Q6: When do ethtool queue changes take effect?

Immediately, but they are lost when the interface is reset; persist them by adding ETHTOOL_OPTS="-G eth0 rx 2048 tx 1024" to the NIC’s ifcfg file on RHEL/CentOS, or with a systemd oneshot unit (or systemd‑networkd .link file) on Ubuntu, since Netplan does not manage ring sizes.
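
A hedged persistence sketch for Ubuntu using a systemd oneshot unit (interface name, ring sizes, and the ethtool path are assumptions):

cat > /etc/systemd/system/nic-ring.service <<'EOF'
[Unit]
Description=Set NIC RX/TX ring sizes
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -G eth0 rx 2048 tx 1024

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable --now nic-ring.service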

Key Scripts

Full Diagnostic Suite

#!/bin/bash
# Note: no "set -e" – a non-matching grep must not abort the report
REPORT=/tmp/tcp_diagnostic_$(date +%Y%m%d_%H%M%S).txt
{
  echo "========== TCP Diagnostic Report =========="
  echo "Time: $(date)"
  echo "Host: $(hostname)"
  echo "--- System metrics ---"
  netstat -s | grep -A 15 "Tcp:"
  grep -E "eth|ens" /proc/net/dev
  cat /proc/sys/net/core/rmem_default
  cat /proc/sys/net/core/rmem_max
  echo "--- Active connections ---"
  ss -tni | head -30
  echo "--- Recommendations ---"
  # Some net-tools versions print "retransmited" / "send out", so match loosely
  RETRANS=$(netstat -s | grep "segments retransmit" | awk '{print $1}')
  TOTAL=$(netstat -s | grep -E "segments sen[dt] out" | awk '{print $1}')
  if [ "${TOTAL:-0}" -gt 0 ]; then
    RATIO=$(echo "scale=4; ${RETRANS:-0}/$TOTAL" | bc)
    echo "Retransmission rate: $RATIO"
    (( $(echo "$RATIO > 0.01" | bc -l) )) && echo "⚠️ Retrans >1% – check network" || echo "✓ Retransmission normal"
  fi
  # Established connections with data stuck in the receive queue (Recv-Q is column 2 of ss -tn)
  RECV_Q=$(ss -tn | awk 'NR > 1 && $2 > 0 {count++} END {print count+0}')
  [ "$RECV_Q" -gt 0 ] && echo "⚠️ $RECV_Q connections with Recv-Q buildup – increase rmem_max" || echo "✓ Recv-Q normal"
  if command -v ethtool &>/dev/null; then
    ethtool -i eth0 2>/dev/null || ethtool -i ens0 2>/dev/null
    echo "NIC ring depth:"
    ethtool -g eth0 2>/dev/null || ethtool -g ens0 2>/dev/null
  else
    echo "ethtool not installed"
  fi
} | tee "$REPORT"

echo "Report saved to $REPORT"

Auto‑Tuning Script

#!/bin/bash
echo "=== Auto TCP Tuning ==="
CPU=$(nproc)
MEM=$(free -b | awk '/^Mem:/ {print $2}')
RMEM=$((MEM/4))                   # start from 25% of RAM
MAX_BUF=134217728                 # cap at 128 MB – rmem_max/wmem_max are 32-bit kernel ints
[ "$RMEM" -gt "$MAX_BUF" ] && RMEM=$MAX_BUF
WMEM=$RMEM
TCP_RMEM="4096 87380 $RMEM"
TCP_WMEM="4096 65536 $WMEM"
SOMAX=$((CPU*1000))
cp /etc/sysctl.conf /etc/sysctl.conf.bak.$(date +%s)
{
  echo "# Auto‑tuned parameters $(date)"
  echo "net.core.rmem_default = $RMEM"
  echo "net.core.rmem_max = $RMEM"
  echo "net.core.wmem_default = $WMEM"
  echo "net.core.wmem_max = $WMEM"
  echo "net.ipv4.tcp_rmem = $TCP_RMEM"
  echo "net.ipv4.tcp_wmem = $TCP_WMEM"
  echo "net.core.somaxconn = $SOMAX"
  echo "net.ipv4.tcp_max_syn_backlog = $SOMAX"
} >> /etc/sysctl.conf
sysctl -p

echo "✓ Tuning applied. Backup at /etc/sysctl.conf.bak.*"

References & Further Reading

Linux TCP man page – https://man7.org/linux/man-pages/man7/tcp.7.html

Kernel TCP docs – https://www.kernel.org/doc/html/latest/networking/

tcpdump manual – https://www.tcpdump.org/

Brendan Gregg’s performance analysis – https://www.brendangregg.com/linuxperf.html

Alibaba TCP optimization whitepaper – https://www.alibabacloud.com/

Wireshark – https://www.wireshark.org/

iperf3 – https://github.com/esnet/iperf

Prometheus – https://prometheus.io/

Tags: TCP, Linux, Sysctl, tcpdump, Kernel Parameters, Packet Loss, Network Diagnostics
Written by

Ops Community

A leading IT operations community where professionals share and grow together.
