10 Real‑World TCPDump Cases That Reveal Hidden Network Issues
This guide walks through ten authentic production network problems, showing how to capture traffic with TCPDump, interpret the packet data, pinpoint root causes such as firewall rules, window scaling, RST packets, DNS glitches, and SSL handshake failures, and then apply concrete remediation steps.
Introduction
TCPDump records every packet on the wire, allowing you to trace problems from the physical layer up to the application layer.
Case 1 – Connection timeout (three‑way handshake failure)
Problem
# curl: (7) Failed to connect to api.example.com port 443: Connection timed out
Capture
tcpdump -i eth0 -nn -s0 -w timeout.pcap host api.example.com
Analysis
Only SYN packets are observed; no SYN‑ACK replies, indicating the handshake never completes.
Solution
Inspect the firewall rules: outbound port 443 was blocked, creating a "send-only" situation that only a packet capture could reveal.
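A quick way to confirm this pattern from the capture (a sketch; `timeout.pcap` is the file captured above, and the iptables check assumes a Linux host with root access):

```shell
# Bare SYNs decode as "Flags [S]"; SYN-ACK replies as "Flags [S.]".
# Many of the former and none of the latter means the handshake
# is dying somewhere in transit.
tcpdump -r timeout.pcap -nn | grep -c 'Flags \[S\]'     # outgoing SYNs
tcpdump -r timeout.pcap -nn | grep -c 'Flags \[S\.\]'   # SYN-ACKs seen

# Then check the local firewall for a rule affecting outbound 443
iptables -L OUTPUT -n --line-numbers | grep 443
```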
Case 2 – Slow MySQL queries (TCP window scaling issue)
Problem
Intermittent query latency spikes to 30 seconds while CPU and memory appear normal.
Capture
# Capture MySQL traffic
tcpdump -i any -nn -s0 port 3306 and host 192.168.1.50 -w mysql_slow.pcap
# Show window size changes
tcpdump -r mysql_slow.pcap -nn | grep "win"
Analysis
# Normal
10:45:01 IP client.45678 > mysql.3306: win 65535
# Abnormal
10:45:02 IP client.45678 > mysql.3306: win 0
10:45:02 IP mysql.3306 > client.45678: win 32768 [window probe]
The receiver window drops to zero, triggering zero‑window probes and halting transmission.
Root cause
The application processes the MySQL result set too slowly, filling the receive buffer.
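The stall can also be spotted directly by matching on the raw window field (a sketch; `tcp[14:2]` is the 16-bit window field in the TCP header, and a zero window is zero regardless of the negotiated scale factor):

```shell
# Live: capture only zero-window advertisements on the MySQL port
tcpdump -i eth0 -nn 'tcp[14:2] = 0 and port 3306' -c 20

# Offline: count zero-window lines in the trace captured above
tcpdump -r mysql_slow.pcap -nn | grep -cE 'win 0(,|$)'
```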
Case 3 – Load‑balancer resets (RST packet tracking)
Problem
Backend connections are frequently reset; logs show many “Connection reset by peer”.
Capture
# Capture only RST packets
tcpdump -i eth0 -nn 'tcp[tcpflags] & tcp-rst != 0' -c 100
Analysis
11:20:15 IP 10.0.1.100.80 > 192.168.1.200.12345: Flags [R.], seq 12345, ack 67890
11:20:16 IP 10.0.1.100.80 > 192.168.1.201.12346: Flags [R.], seq 54321, ack 98765
Investigation steps
Identify which side sends the RST (client or server).
Determine when the RST occurs (during data transfer or after connection establishment).
Check sequence numbers to verify legitimacy.
Finding
The load‑balancer health‑check timeout was set too short, causing legitimate connections to be killed.
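Step 1 above (which side sends the RST) can be automated from a decoded trace (a sketch; `rst_trace.pcap` is a hypothetical file name, and field `$3` is the source address in tcpdump's default one-line output):

```shell
# Tally RST packets by sender; the top entry is the resetting side
tcpdump -r rst_trace.pcap -nn 2>/dev/null \
  | awk '/Flags \[R/ {print $3}' | sort | uniq -c | sort -nr | head
```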
Case 4 – HTTP POST failures (application‑layer analysis)
Problem
POST requests succeed only ~70 % of the time; GET requests work fine.
Capture
# Capture HTTP traffic on port 8080
tcpdump -i eth0 -A -s0 port 8080 -w http_post.pcap
# Filter POST requests
tcpdump -r http_post.pcap -A | grep -i "post"
Analysis
# Successful POST
POST /api/user HTTP/1.1
Content-Length: 256
Content-Type: application/json
{ "user_id": 123, "name": "test" }
# Failed POST
POST /api/user HTTP/1.1
Content-Length: 512
Content-Type: application/json
{ "user_id": 123, "name": "test" }
The declared Content-Length does not match the actual payload size.
Root cause
Nginx reverse‑proxy limited the request body (client_max_body_size) and the warning was hidden by the log level.
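If nginx is indeed the culprit, the fix is a one-line limit change plus a log level that keeps the rejection visible (a sketch; the limit value and log path are assumptions, not taken from the original incident):

```nginx
http {
    # Default is 1m; requests larger than this are rejected with
    # 413 Request Entity Too Large
    client_max_body_size 8m;

    # Log at warn or lower so body-size rejections actually show up
    error_log /var/log/nginx/error.log warn;
}
```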
Case 5 – DNS resolution intermittency
Problem
Service works for a few minutes after start, then DNS resolution fails until a restart.
Capture
# Capture DNS traffic
tcpdump -i eth0 -nn port 53 -w dns_issue.pcap
# Show responses
tcpdump -r dns_issue.pcap -nn | grep "A?"
Analysis
# Normal query
12:30:01 IP 192.168.1.100.12345 > 8.8.8.8.53: 12345+ A? api.example.com
12:30:01 IP 8.8.8.8.53 > 192.168.1.100.12345: 12345 1/0/0 A 203.0.113.10
# Abnormal query (no reply)
12:35:01 IP 192.168.1.100.12346 > 8.8.8.8.53: 12346+ A? api.example.com
The DNS server had a dual‑stack (IPv4/IPv6) configuration, but IPv6 routing was broken, causing intermittent failures.
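Unanswered queries can be extracted mechanically by pairing query IDs with response IDs (a sketch over tcpdump's default DNS output; field `$6` is the transaction ID, with a trailing `+` on queries that request recursion):

```shell
# Print queries whose transaction ID never appears in a response
tcpdump -r dns_issue.pcap -nn | awk '
  $6 ~ /\+$/ { id = $6; sub(/\+/, "", id); q[id] = $0; next }  # query
             { id = $6; delete q[id] }                         # response
  END        { for (id in q) print q[id] }                     # unanswered
'
```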
Case 6 – SSL handshake failures
Problem
HTTPS connections intermittently return “ERR_SSL_PROTOCOL_ERROR”.
Capture
# Capture SSL handshake
tcpdump -i eth0 -nn -s0 port 443 and host web.example.com -w ssl_handshake.pcap
# View raw packets
tcpdump -r ssl_handshake.pcap -nn -x
Analysis
# Normal flow
Client Hello -> Server Hello -> Certificate -> Server Hello Done -> Client Key Exchange -> Change Cipher Spec -> Finished
# Abnormal flow (break at Certificate)
Client Hello -> Server Hello -> [connection reset]
The certificate chain was incomplete; an intermediate CA certificate was missing, causing some clients to abort the handshake.
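tcpdump shows where the handshake dies, but the chain itself is easier to verify with openssl (a sketch; the hostname is the one from the capture above):

```shell
# Ask the server for its chain and check the verification result;
# a missing intermediate typically yields
# "Verify return code: 21 (unable to verify the first certificate)"
openssl s_client -connect web.example.com:443 \
    -servername web.example.com -showcerts </dev/null 2>/dev/null \
  | grep 'Verify return code'
```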
Case 7 – Microservice communication issues
Problem
Service A calls Service B with 95 % success; the remaining 5 % fail without a clear pattern.
Capture
# Capture traffic between the two services
tcpdump -i any -nn -s0 '(src host serviceA and dst host serviceB) or (src host serviceB and dst host serviceA)' -w microservice.pcap
# Summarize connections
tcpdump -r microservice.pcap -nn | awk '{print $3 " -> " $5}' | sort | uniq -c
Findings
# Normal connection
192.168.1.10.8080 -> 192.168.1.20.9090: established
# Abnormal connection – port reuse issue
192.168.1.10.8080 -> 192.168.1.20.9090: [port reused, sequence numbers scrambled]
Service B's restart left sockets in TIME_WAIT; subsequent connections reused the same port, causing sequence‑number collisions.
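The TIME_WAIT pile-up can be confirmed and mitigated on the client side (a sketch; the address is the hypothetical service B endpoint, and the sysctl applies to Linux):

```shell
# How many sockets are sitting in TIME_WAIT toward service B?
ss -tan state time-wait dst 192.168.1.20 | wc -l

# One common mitigation: let the kernel reuse TIME_WAIT sockets for
# new outgoing connections (safe only with TCP timestamps enabled)
sysctl -w net.ipv4.tcp_tw_reuse=1
```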
Case 8 – Bandwidth saturation
Problem
Server bandwidth spikes to 95 % without a corresponding traffic increase.
Capture
# List top‑talking connections
tcpdump -i eth0 -nn -q | head -1000 | awk '{print $3}' | sort | uniq -c | sort -nr
# Capture heavy‑traffic flow details
tcpdump -i eth0 -nn -s0 src 192.168.1.100 -w bandwidth_hog.pcap
Discovery
Most traffic originates from an internal IP repeatedly sending health‑check probes; the load‑balancer health‑check interval was set to 1 ms.
Optimization
Adjust the health‑check interval; bandwidth usage returns to normal instantly.
Case 9 – Packet corruption
Problem
Clients send correct data, but the server receives truncated or garbled packets.
Capture
# Capture on the gateway between client and server
tcpdump -i eth0 -xx -s0 host client.ip and host server.ip -w packet_corruption.pcap
# Compare payloads
tcpdump -r packet_corruption.pcap -xx | grep "payload"
Result
# Client (correct)
45 00 05 dc ... [full packet]
# Server (corrupted)
45 00 05 dc ... [packet altered by intermediate device]
A switch firmware bug recalculated checksums incorrectly for a specific packet pattern.
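Comparing the two captures mechanically narrows down exactly which bytes change in transit (a sketch; the two pcap files are assumed to come from simultaneous captures on each side):

```shell
# Hex-dump the same flow as seen on each side, then diff the dumps;
# the first differing hex line points at the altered bytes
tcpdump -r client_side.pcap -nn -xx > /tmp/client.hex
tcpdump -r server_side.pcap -nn -xx > /tmp/server.hex
diff /tmp/client.hex /tmp/server.hex | head
```

Timestamps differ between the two captures, so in practice you may want to strip the first column before diffing. Also note that checksum offload can make the sender's own capture look corrupt, so consider `ethtool -K eth0 tx off rx off` before trusting locally computed checksums.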
Case 10 – API latency analysis
Problem
API P99 latency reaches 5 seconds while the underlying DB query takes only 50 ms.
Capture
# Precise timestamp capture
tcpdump -i eth0 -ttt -nn port 8080 -w latency_analysis.pcap
# Extract HTTP timestamps
tcpdump -r latency_analysis.pcap -ttt -nn | grep "HTTP"
Decomposition
# TCP connection time: 150 ms
# SSL handshake time: 300 ms
# HTTP processing: 50 ms
# Network transmission: 4500 ms ← problem area
Root cause
The outbound gateway’s QoS policy mistakenly marked API traffic as low priority, causing excessive queuing.
TCPDump practical tips
Universal capture command
tcpdump -i any -nn -s0 -w capture.pcap
Parameters:
-i any – listen on all interfaces
-nn – do not resolve hostnames or ports
-s0 – capture the full packet
-w file – write to file
Filter examples
# Host filter
tcpdump host 192.168.1.100
# Port filter
tcpdump port 80 or port 443
# Protocol filter
tcpdump tcp and not ssh
# SYN flag filter
tcpdump 'tcp[tcpflags] & tcp-syn != 0'
# Combined filter example
tcpdump -i eth0 -nn 'host 192.168.1.100 and (port 80 or port 443) and tcp[tcpflags] & tcp-syn != 0'
Analysis framework
Four‑step troubleshooting:
Phenomenon description – record symptoms and collect logs.
Hypothesis verification – design capture tests based on assumptions.
Data analysis – examine packet captures for abnormal patterns.
Root‑cause confirmation – identify the underlying issue and implement a fix.
Five analysis dimensions:
Connection layer – handshake and teardown.
Transport layer – sequence numbers, ACKs, window changes.
Application layer – HTTP status, SSL handshake.
Time layer – latency distribution, timeout settings.
Statistics layer – retransmission rate, packet loss, connection count.
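The statistics dimension can be approximated straight from a decoded trace. For example, a rough retransmission count (a sketch; it flags repeated source/sequence-range pairs, so keep-alive probes inflate it slightly):

```shell
# Rough retransmission count: the same (source, seq-range) pair
# appearing more than once usually means a retransmitted segment
tcpdump -r capture.pcap -nn 2>/dev/null | awk '
  / seq / {
      for (i = 1; i <= NF; i++)
          if ($i == "seq") seen[$3 " " $(i+1)]++
  }
  END { for (k in seen) if (seen[k] > 1) d += seen[k] - 1; print d + 0 }
'
```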
Automation script
#!/bin/bash
# Quick network diagnosis script
if [ -z "$1" ]; then
    echo "Usage: $0 <host>" >&2
    exit 1
fi
echo "Starting network diagnosis..."
# Basic connectivity test
ping -c 4 "$1" > /tmp/ping.log 2>&1
# Capture a short trace (at most 100 packets or 30 seconds)
timeout 30 tcpdump -i any -nn -c 100 host "$1" -w /tmp/capture.pcap 2>/dev/null
# Show results
echo "=== Connectivity Test ==="
cat /tmp/ping.log
echo "=== Packet Statistics ==="
tcpdump -r /tmp/capture.pcap -nn | head -10
Performance‑monitoring integration
# Example Zabbix trigger script
if [ "$NETWORK_ERROR_RATE" -gt 5 ]; then
    # Rotate: new file every 300 s (-G), keep at most 2 files (-W)
    tcpdump -i eth0 -G 300 -W 2 -w /var/log/auto_capture_%Y%m%d_%H%M%S.pcap &
    echo "Automatic capture started"
fi
Large‑scale environment tips
Use circular buffers to avoid disk exhaustion.
Set precise filters to reduce CPU load.
Capture on a network‑mirroring port to avoid impacting production.
Establish archiving and cleanup policies for capture files.
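The first and last tips above translate into concrete commands (a sketch; the paths and retention window are assumptions):

```shell
# Circular buffer: start a new file every ~100 MB (-C), keep at most
# 10 files (-W), so a long-running capture can never fill the disk
tcpdump -i eth0 -nn -s0 -C 100 -W 10 -w /var/log/ring.pcap

# Cleanup policy: drop capture files older than 7 days
find /var/log -name 'ring.pcap*' -mtime +7 -delete
```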
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.