
10 Real‑World TCPDump Cases That Reveal Hidden Network Issues

This guide walks through ten real production network problems. For each, it shows how to capture traffic with tcpdump, interpret the packet data, pinpoint the root cause (firewall rules, window scaling, RST packets, DNS glitches, SSL handshake failures), and apply concrete remediation steps.

Raymond Ops

Introduction

tcpdump records every packet on the wire, letting you trace problems from the link layer all the way up to the application layer.

Case 1 – Connection timeout (three‑way handshake failure)

Problem

# curl: (7) Failed to connect to api.example.com port 443: Connection timed out

Capture

tcpdump -i eth0 -nn -s0 -w timeout.pcap host api.example.com

Analysis

Only SYN packets are observed; no SYN‑ACK replies, indicating the handshake never completes.

Solution

Inspect firewall rules – outbound traffic to port 443 had been blocked, so SYN packets left the host but no replies ever returned: a “send‑only” situation that only a packet capture could reveal.
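A quick way to confirm a one‑sided handshake after the fact is to compare SYN and SYN‑ACK counts in the text output of the capture. A minimal sketch, using fabricated sample lines in place of real `tcpdump -r timeout.pcap -nn` output:

```shell
# Fabricated sample of 'tcpdump -r timeout.pcap -nn' output: three SYN
# retransmissions, and no SYN-ACK ever comes back.
dump='10:01:01 IP 192.168.1.10.40000 > 203.0.113.5.443: Flags [S], seq 100
10:01:02 IP 192.168.1.10.40000 > 203.0.113.5.443: Flags [S], seq 100
10:01:04 IP 192.168.1.10.40000 > 203.0.113.5.443: Flags [S], seq 100'

# [S] is a bare SYN; [S.] is a SYN-ACK
syn_count=$(printf '%s\n' "$dump" | grep -c 'Flags \[S\]')
synack_count=$(printf '%s\n' "$dump" | grep -c 'Flags \[S\.\]' || true)
echo "SYN: $syn_count, SYN-ACK: $synack_count"
```

A SYN count that keeps climbing while the SYN‑ACK count stays at zero is exactly the Case 1 pattern: something on the path drops either the SYN or the reply.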

Case 2 – Slow MySQL queries (TCP window scaling issue)

Problem

Intermittent query latency spikes to 30 seconds while CPU and memory appear normal.

Capture

# Capture MySQL traffic
tcpdump -i any -nn -s0 -w mysql_slow.pcap port 3306 and host 192.168.1.50
# Show window size changes
tcpdump -r mysql_slow.pcap -nn | grep "win"

Analysis

# Normal
10:45:01 IP client.45678 > mysql.3306: win 65535
# Abnormal
10:45:02 IP client.45678 > mysql.3306: win 0
10:45:02 IP mysql.3306 > client.45678: win 32768 [window probe]

The receiver window drops to zero, triggering zero‑window probes and halting transmission.

Root cause

The application processes the MySQL result set too slowly, filling the receive buffer.
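Zero‑window stalls are easy to quantify once you have the text dump. A minimal sketch (the sample lines are fabricated; in practice, pipe in real `tcpdump -r mysql_slow.pcap -nn` output):

```shell
# Fabricated sample of 'tcpdump -r mysql_slow.pcap -nn' output
dump='10:45:01 IP client.45678 > mysql.3306: Flags [.], ack 1, win 65535
10:45:02 IP client.45678 > mysql.3306: Flags [.], ack 1, win 0
10:45:03 IP client.45678 > mysql.3306: Flags [.], ack 1, win 0'

# Count zero-window advertisements (transmission stalls while win=0)
zero_win=$(printf '%s\n' "$dump" | grep -c 'win 0$' || true)
echo "zero-window advertisements: $zero_win"
```

A nonzero count here shifts suspicion from the network to the receiving application, which is exactly what this case showed.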

Case 3 – Load‑balancer resets (RST packet tracking)

Problem

Backend connections are frequently reset; logs show many “Connection reset by peer”.

Capture

# Capture only RST packets
tcpdump -i eth0 -nn 'tcp[tcpflags] & tcp-rst != 0' -c 100

Analysis

11:20:15 IP 10.0.1.100.80 > 192.168.1.200.12345: Flags [R.], seq 12345, ack 67890
11:20:16 IP 10.0.1.100.80 > 192.168.1.201.12346: Flags [R.], seq 54321, ack 98765

Investigation steps

Identify which side sends the RST (client or server).

Determine when the RST occurs (during data transfer or after connection establishment).

Check sequence numbers to verify legitimacy.

Finding

The load‑balancer health‑check timeout was set too short, causing legitimate connections to be killed.
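To see at a glance which side is doing the resetting, tally RST senders from the capture. A sketch over fabricated sample lines (feed it real output from the RST‑only filter above):

```shell
# Fabricated sample of RST-only tcpdump output
dump='11:20:15 IP 10.0.1.100.80 > 192.168.1.200.12345: Flags [R.], seq 12345
11:20:16 IP 10.0.1.100.80 > 192.168.1.201.12346: Flags [R.], seq 54321
11:20:17 IP 192.168.1.200.12345 > 10.0.1.100.80: Flags [R.], seq 99999'

# Field 3 is the source address.port; count RSTs per sender
top_sender=$(printf '%s\n' "$dump" | awk '{print $3}' | sort | uniq -c \
  | sort -rn | head -1 | awk '{print $2}')
echo "most frequent RST sender: $top_sender"
```

If the top sender is the load balancer's address, the resets are administrative (health checks, idle timeouts) rather than genuine protocol errors.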

Case 4 – HTTP POST failures (application‑layer analysis)

Problem

POST requests succeed only ~70 % of the time; GET requests work fine.

Capture

# Capture HTTP traffic on port 8080
tcpdump -i eth0 -A -s0 -w http_post.pcap port 8080
# Filter POST requests
tcpdump -r http_post.pcap -A | grep -i "post"

Analysis

# Successful POST
POST /api/user HTTP/1.1
Content-Length: 256
Content-Type: application/json
{ "user_id": 123, "name": "test" }

# Failed POST
POST /api/user HTTP/1.1
Content-Length: 512
Content-Type: application/json
{ "user_id": 123, "name": "test" }

Content‑Length does not match the actual payload size.

Root cause

Nginx reverse‑proxy limited the request body (client_max_body_size) and the warning was hidden by the log level.
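A hedged sketch of the kind of nginx fix involved; the `10m` value and log path are illustrative examples, not the article's actual configuration:

```nginx
# Raise the request-body limit (default is 1m; larger bodies get HTTP 413)
client_max_body_size 10m;

# Keep the error log at a level where body-size rejections stay visible
error_log /var/log/nginx/error.log warn;
```

`client_max_body_size` can be set at the `http`, `server`, or `location` level, so check all three when a limit seems to apply inconsistently.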

Case 5 – DNS resolution intermittency

Problem

Service works for a few minutes after start, then DNS resolution fails until a restart.

Capture

# Capture DNS traffic
tcpdump -i eth0 -nn -w dns_issue.pcap port 53
# Show responses
tcpdump -r dns_issue.pcap -nn | grep "A?"

Analysis

# Normal query
12:30:01 IP 192.168.1.100.12345 > 8.8.8.8.53: 12345+ A? api.example.com
12:30:01 IP 8.8.8.8.53 > 192.168.1.100.12345: 12345 1/0/0 A 203.0.113.10

# Abnormal query (no reply)
12:35:01 IP 192.168.1.100.12346 > 8.8.8.8.53: 12346+ A? api.example.com

The DNS server had dual‑stack (IPv4/IPv6) configuration, but IPv6 routing was broken, causing intermittent failures.
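Unanswered queries can be found mechanically by pairing DNS transaction IDs between requests and responses. A minimal sketch over fabricated sample lines (in practice, pipe in real `tcpdump -r dns_issue.pcap -nn` output):

```shell
# Fabricated 'tcpdump -nn port 53' output: two queries, one response
dump='12:30:01 IP 192.168.1.100.12345 > 8.8.8.8.53: 12345+ A? api.example.com. (33)
12:30:01 IP 8.8.8.8.53 > 192.168.1.100.12345: 12345 1/0/0 A 203.0.113.10 (49)
12:35:01 IP 192.168.1.100.12346 > 8.8.8.8.53: 12346+ A? api.example.com. (33)'

# Queries have the resolver as destination (field 5); responses have it
# as source (field 3). Field 6 carries the transaction ID.
unanswered=$(printf '%s\n' "$dump" | awk '
  $5 == "8.8.8.8.53:" { id = $6; sub(/\+$/, "", id); q[id] = 1 }
  $3 == "8.8.8.8.53"  { r[$6] = 1 }
  END { for (i in q) if (!(i in r)) print i }')
echo "unanswered query IDs: $unanswered"
```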

Case 6 – SSL handshake failures

Problem

HTTPS connections intermittently return “ERR_SSL_PROTOCOL_ERROR”.

Capture

# Capture SSL handshake
tcpdump -i eth0 -nn -s0 -w ssl_handshake.pcap port 443 and host web.example.com
# View raw packets
tcpdump -r ssl_handshake.pcap -nn -x

Analysis

# Normal flow
Client Hello -> Server Hello -> Certificate -> Server Hello Done -> Client Key Exchange -> Change Cipher Spec -> Finished

# Abnormal flow (break at Certificate)
Client Hello -> Server Hello -> [connection reset]

The certificate chain was incomplete; an intermediate CA certificate was missing, causing some clients to abort the handshake.
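One offline sanity check for this failure mode is counting the certificates in the chain the server presents. The PEM below is a fabricated stand‑in; in practice, capture the real chain with `openssl s_client -connect web.example.com:443 -showcerts </dev/null`:

```shell
# Fabricated chain containing only a leaf certificate (no intermediate)
chain='-----BEGIN CERTIFICATE-----
MIIB...fabricated-leaf-cert...
-----END CERTIFICATE-----'

cert_count=$(printf '%s\n' "$chain" | grep -c -- '-----BEGIN CERTIFICATE-----')
if [ "$cert_count" -lt 2 ]; then
  echo "WARNING: only $cert_count certificate(s) presented; intermediate may be missing"
fi
```

A server that presents only its leaf certificate works for clients that cache or fetch the intermediate, and fails for those that don't, which is why the symptom is intermittent.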

Case 7 – Microservice communication issues

Problem

Service A calls Service B with 95 % success; the remaining 5 % fail without a clear pattern.

Capture

# Capture traffic between the two services
tcpdump -i any -nn -s0 -w microservice.pcap '(src host serviceA and dst host serviceB) or (src host serviceB and dst host serviceA)'
# Summarize connections
tcpdump -r microservice.pcap -nn | awk '{print $3 " -> " $5}' | sort | uniq -c

Findings

# Normal connection
192.168.1.10.8080 -> 192.168.1.20.9090: established

# Abnormal connection – port reuse issue
192.168.1.10.8080 -> 192.168.1.20.9090: [port reused, sequence numbers scrambled]

Service B’s restart left sockets in TIME_WAIT; subsequent connections reused the same port, causing sequence‑number collisions.
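On Linux, one common mitigation on the connecting side is letting the kernel safely reuse TIME_WAIT sockets for new outbound connections. The snippet below is a hedged sketch, not the article's actual fix; the second setting merely widens the ephemeral port range so collisions become less likely:

```ini
# /etc/sysctl.d/90-timewait.conf (apply with: sysctl --system)
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65000
```

Note that `tcp_tw_reuse` only affects outgoing connections; it does nothing for sockets a restarted server left behind on its own listening side.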

Case 8 – Bandwidth saturation

Problem

Server bandwidth spikes to 95 % without a corresponding traffic increase.

Capture

# List top-talking sources (sample 1000 packets)
tcpdump -i eth0 -nn -q -c 1000 | awk '{print $3}' | sort | uniq -c | sort -nr | head
# Capture heavy‑traffic flow details
tcpdump -i eth0 -nn -s0 -w bandwidth_hog.pcap src host 192.168.1.100

Discovery

Most traffic originates from an internal IP repeatedly sending health‑check probes; the load‑balancer health‑check interval was set to 1 ms.

Optimization

Adjust the health‑check interval; bandwidth usage returns to normal instantly.

Case 9 – Packet corruption

Problem

Clients send correct data, but the server receives truncated or garbled packets.

Capture

# Capture on the gateway between client and server
tcpdump -i eth0 -xx -s0 -w packet_corruption.pcap host client.ip and host server.ip
# Compare the hex dumps captured on each side
tcpdump -r packet_corruption.pcap -nn -xx

Result

# Client (correct)
45 00 05 dc ... [full packet]

# Server (corrupted)
45 00 05 dc ... [packet altered by intermediate device]

A switch firmware bug recalculated checksums incorrectly for a specific packet pattern.

Case 10 – API latency analysis

Problem

API P99 latency reaches 5 seconds while the underlying DB query takes only 50 ms.

Capture

# Capture for timing analysis (timestamps are stored in the pcap)
tcpdump -i eth0 -nn -w latency_analysis.pcap port 8080
# Extract HTTP timestamps
tcpdump -r latency_analysis.pcap -ttt -nn | grep "HTTP"

Decomposition

# TCP connection time: 150 ms
# SSL handshake time: 300 ms
# HTTP processing: 50 ms
# Network transmission: 4500 ms ← problem area

Root cause

The outbound gateway’s QoS policy mistakenly marked API traffic as low priority, causing excessive queuing.
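The gap hunting in this case can be partly automated: with `-ttt`, tcpdump prints the delta since the previous packet as the first field, so the largest gap falls out of a small awk pass. A sketch over fabricated sample lines:

```shell
# Fabricated 'tcpdump -r latency_analysis.pcap -ttt -nn' output; the 4.5 s
# delta on the last line is the suspicious gap.
dump='00:00:00.000150 IP 192.168.1.10.40000 > 10.0.0.5.8080: Flags [S], seq 1
00:00:00.000300 IP 10.0.0.5.8080 > 192.168.1.10.40000: Flags [S.], seq 1, ack 2
00:00:04.500000 IP 10.0.0.5.8080 > 192.168.1.10.40000: Flags [P.], seq 1:100'

max_gap=$(printf '%s\n' "$dump" | awk '{
  split($1, t, /[:.]/)                        # hh:mm:ss.usec
  gap = t[1]*3600 + t[2]*60 + t[3] + t[4]/1000000
  if (gap > max) max = gap
} END { printf "%.3f", max }')
echo "largest inter-packet gap: ${max_gap}s"
```

Whether the gap sits between the request and the first response byte, or mid‑transfer, tells you whether to blame the application or the network path.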

TCPDump practical tips

Universal capture command

tcpdump -i any -nn -s0 -w capture.pcap

Parameters:

-i any – listen on all interfaces

-nn – do not resolve hostnames or ports

-s0 – capture the full packet

-w file – write to file

Filter examples

# Host filter
tcpdump host 192.168.1.100

# Port filter
tcpdump port 80 or port 443

# Protocol filter
tcpdump tcp and not port 22

# SYN flag filter
tcpdump 'tcp[tcpflags] & tcp-syn != 0'

# Combined filter example
tcpdump -i eth0 -nn 'host 192.168.1.100 and (port 80 or port 443) and tcp[tcpflags] & tcp-syn != 0'

Analysis framework

Four‑step troubleshooting:

Phenomenon description – record symptoms and collect logs.

Hypothesis verification – design capture tests based on assumptions.

Data analysis – examine packet captures for abnormal patterns.

Root‑cause confirmation – identify the underlying issue and implement a fix.

Five analysis dimensions:

Connection layer – handshake and teardown.

Transport layer – sequence numbers, ACKs, window changes.

Application layer – HTTP status, SSL handshake.

Time layer – latency distribution, timeout settings.

Statistics layer – retransmission rate, packet loss, connection count.

Automation script

#!/bin/bash
# Quick network diagnosis script; usage: ./netdiag.sh <host>
[ -z "$1" ] && { echo "usage: $0 <host>" >&2; exit 1; }

echo "Starting network diagnosis..."

# Basic connectivity test
ping -c 4 "$1" > /tmp/ping.log 2>&1

# Capture short trace (30 s max)
timeout 30 tcpdump -i any -nn -c 100 -w /tmp/capture.pcap host "$1" 2>/dev/null

# Show results
echo "=== Connectivity Test ==="
cat /tmp/ping.log

echo "=== Packet Statistics ==="
tcpdump -r /tmp/capture.pcap -nn | head -10

Performance‑monitoring integration

# Example Zabbix trigger script
if [ "$NETWORK_ERROR_RATE" -gt 5 ]; then
    tcpdump -i eth0 -G 300 -W 2 -w /var/log/auto_capture_%Y%m%d_%H%M%S.pcap &
    echo "Automatic capture started"
fi

Large‑scale environment tips

Use circular buffers to avoid disk exhaustion.

Set precise filters to reduce CPU load.

Capture on a network‑mirroring port to avoid impacting production.

Establish archiving and cleanup policies for capture files.

Tags: operations, network troubleshooting, packet capture, case studies, tcpdump
Written by Raymond Ops: Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.