
10 Real‑World TCPDump Cases That Uncover Hidden Network Problems

This article walks senior operations engineers through ten authentic production‑level TCPDump case studies, teaching core command options, packet‑analysis heuristics, and a systematic four‑step troubleshooting framework that turns network mysteries into clear, actionable solutions.

MaGe Linux Operations

TCPDump Practical Analysis: 10 Common Network Issues

As a seasoned operations engineer, I have run into countless odd network problems in production. Here are ten real cases that show how TCPDump exposes problems that logs and dashboards miss.

Why TCPDump is an Operations Essential

When alerts wake you at 3 a.m., users complain about latency, or developers blame the network, TCPDump is the reliable companion: it shows what actually crossed the interface, not what each side assumes happened.

What you will learn:

10 real‑world troubleshooting cases

Key TCPDump parameters and usage tips

Golden rules for packet‑analysis

Practical methods for rapid problem location

Case 1: Mysterious Connection Timeout – TCP Handshake Failure

Problem

# User feedback: API request frequently times out
curl: (7) Failed to connect to api.example.com port 443: Connection timed out

Capture

# Capture command
tcpdump -i eth0 -nn -s0 -w timeout.pcap host api.example.com

# Analysis result
10:30:15.123456 IP 192.168.1.100.45678 > 203.0.113.10.443: Flags [S], seq 1000, win 65535
10:30:18.123456 IP 192.168.1.100.45678 > 203.0.113.10.443: Flags [S], seq 1000, win 65535
10:30:24.123456 IP 192.168.1.100.45678 > 203.0.113.10.443: Flags [S], seq 1000, win 65535

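Key finding: Only SYN packets are present, with no SYN‑ACK replies. The client retransmits the same SYN (seq 1000) after 3 s and again after another 6 s, the exponential backoff of a connection attempt that never gets an answer.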

Solution

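Inspect the firewall rules along the path: outbound traffic to port 443 had been blocked by mistake, so SYNs left the host but no SYN‑ACK ever came back. On Linux, tcpdump sees outgoing packets after the host's own netfilter rules, so SYNs visible on eth0 point to a drop beyond the host, such as a network firewall or security group.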

Ops tip: Firewalls and security groups are among the most common culprits; a capture full of unanswered SYNs points at them within minutes.
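The check in this case can be scripted. The sketch below assumes tcpdump's default one‑line IPv4 output (for example from tcpdump -nn -r timeout.pcap) and a capture that includes both directions; unanswered_syns is a hypothetical helper name, not a tcpdump feature. It lists every client‑to‑server flow that sent SYNs but never received a SYN‑ACK.

```shell
#!/bin/sh
# unanswered_syns (hypothetical helper): read `tcpdump -nn` text on stdin and
# list client-to-server flows whose SYNs ("Flags [S],") never got a SYN-ACK
# ("Flags [S.],") back. Assumes tcpdump's default one-line output.
unanswered_syns() {
  awk '
    {
      # Find the "IP" token so extra leading fields (e.g. from -i any) are tolerated
      p = 0
      for (i = 1; i <= NF; i++) if ($i == "IP") { p = i; break }
      if (p == 0) next
      src = $(p + 1); dst = $(p + 3); sub(/:$/, "", dst)
    }
    index($0, "Flags [S],")  { syn[src " > " dst]++ }        # client SYN
    index($0, "Flags [S.],") { answered[dst " > " src] = 1 } # SYN-ACK, keyed client > server
    END {
      for (f in syn)
        if (!(f in answered)) printf "%s  SYNs=%d  no SYN-ACK\n", f, syn[f]
    }
  '
}

# Usage: tcpdump -nn -r timeout.pcap | unanswered_syns
```

Fed the three lines from the capture above, it reports the 192.168.1.100.45678 > 203.0.113.10.443 flow with SYNs=3.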

Case 2: Strange Slow Query – TCP Zero‑Window Stall

Problem

Intermittent slow database queries; CPU and memory look normal, but response time spikes to 30 seconds.

Capture

# Capture MySQL traffic
tcpdump -i any -nn -s0 -w mysql_slow.pcap 'port 3306 and host 192.168.1.50'

# Focus on window size changes
tcpdump -r mysql_slow.pcap -nn | grep "win"

Result

# Normal
10:45:01 IP client.45678 > mysql.3306: win 65535

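# Abnormal: the client advertises a zero window; the server's next segment is a zero-window probe (tcpdump does not label probes)
10:45:02 IP client.45678 > mysql.3306: win 0
10:45:02 IP mysql.3306 > client.45678: win 32768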

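Finding: The receiver (the client) advertises a zero window, so the MySQL server must stop sending and fall back to periodic zero‑window probes until the window reopens.

Root cause: The application reads MySQL result sets too slowly, so its socket receive buffer fills up and the advertised window drops to zero.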
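To pull the zero‑window evidence out of a larger capture, a BPF filter on the raw TCP window field (bytes 14 and 15 of the TCP header) narrows the output, and a short awk pass counts zero‑window advertisements per direction. This is a sketch; zero_window_report is a hypothetical name, and RST segments are excluded because they commonly carry a zero window without meaning anything.

```shell
#!/bin/sh
# zero_window_report (hypothetical helper): count zero-window advertisements
# per direction in `tcpdump -nn` text read from stdin.
# RST segments commonly carry win 0, so they are skipped.
zero_window_report() {
  awk '
    index($0, "Flags [R") { next }
    {
      p = 0; zero = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "IP" && p == 0) p = i
        if ($i == "win" && $(i + 1) ~ /^0,?$/) zero = 1
      }
      if (p == 0 || !zero) next
      dst = $(p + 3); sub(/:$/, "", dst)
      count[$(p + 1) " > " dst]++
    }
    END { for (f in count) printf "%s  zero-window=%d\n", f, count[f] }
  '
}

# Capture side: keep only segments whose raw 16-bit window is zero, excluding RSTs:
#   tcpdump -nn -r mysql_slow.pcap 'tcp[14:2] = 0 and tcp[tcpflags] & tcp-rst = 0' | zero_window_report
```

The direction with the high count is the side whose application is not draining its socket.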

Case 3: Load‑Balancer Killer – RST Packet Tracing

Problem

Backend connections are frequently reset after the load balancer distributes traffic.

Capture

# Capture RST packets only
tcpdump -i eth0 -nn -c 100 'tcp[tcpflags] & tcp-rst != 0'

Analysis

Identify which side initiates the RST.

Determine the timing of the RST (during data transfer or after handshake).

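Check the RST's sequence number against the connection's sequence space; a peer ignores an RST that falls outside its receive window, so an out‑of‑window RST cannot be what closed the connection.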

Final discovery: Health‑check timeout on the load balancer was too short, causing legitimate connections to be killed.
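The first analysis question, which side sends the RST, can be answered by grouping RST segments by sending host; separating bare RSTs ("[R]") from RST+ACK ("[R.]") gives a second dimension to compare. A sketch assuming tcpdump's default one‑line output; rst_by_sender is a hypothetical helper.

```shell
#!/bin/sh
# rst_by_sender (hypothetical helper): summarize RST segments from `tcpdump -nn`
# text on stdin by sending host (port stripped), separating bare RST ("[R]")
# from RST+ACK ("[R.]"), most frequent first.
rst_by_sender() {
  awk '
    index($0, "Flags [R") {
      p = 0
      for (i = 1; i <= NF; i++) if ($i == "IP") { p = i; break }
      if (p == 0) next
      host = $(p + 1); sub(/\.[0-9]+$/, "", host)   # drop the port
      kind = index($0, "Flags [R.]") ? "RST+ACK" : "RST"
      n[host " " kind]++
    }
    END { for (k in n) print n[k], k }
  ' | sort -rn
}

# Usage: tcpdump -i eth0 -nn -c 100 'tcp[tcpflags] & tcp-rst != 0' | rst_by_sender
```

If the load balancer's address dominates the list, the resets originate there rather than on the backends.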

Case 4: Vanishing HTTP POST Requests – Application‑Layer Analysis

Problem

POST success rate is only 70% while GET works fine; developers claim the code is correct.

Deep Capture

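# Capture HTTP traffic (-A only affects printing, so it belongs on the read step)
tcpdump -i eth0 -nn -s0 -w http_post.pcap port 8080
# Show POST requests together with the headers that follow them
tcpdump -r http_post.pcap -A | grep -A 8 "POST /"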

Findings

# Successful POST
POST /api/user HTTP/1.1
Content-Length: 256
Content-Type: application/json
{ "user_id": 123, "name": "test" }

# Failed POST
POST /api/user HTTP/1.1
Content-Length: 512
Content-Type: application/json
{ "user_id": 123, "name": "test" }

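Root cause: Nginx's client_max_body_size limit rejected the larger request bodies (Nginx answers such requests with 413 Request Entity Too Large), so the declared Content‑Length no longer matched the body that reached the application, and an overly high error log level hid the rejections.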
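A sketch for spotting the oversized requests: pair each POST request line in the ASCII dump with the Content‑Length header that follows it, and flag values above a limit passed as an argument, for example the byte equivalent of the client_max_body_size setting (Nginx's default is 1m, that is 1048576 bytes). post_sizes is a hypothetical helper, and it only works for plaintext HTTP where the headers are visible in the dump.

```shell
#!/bin/sh
# post_sizes LIMIT (hypothetical helper): read `tcpdump -A` text on stdin, pair
# each "POST /path" request line with the next Content-Length header, and mark
# requests whose declared body size exceeds LIMIT bytes.
post_sizes() {
  awk -v limit="$1" '
    { sub(/\r$/, "") }                                # tolerate CRLF line endings
    index($0, "POST /") {
      line = substr($0, index($0, "POST /"))          # skip header bytes shown as dots
      split(line, parts, " "); path = parts[2]
      next
    }
    path != "" && tolower($1) == "content-length:" {
      size = $2 + 0
      flag = (size > limit + 0) ? " OVER_LIMIT" : ""
      printf "%s %d%s\n", path, size, flag
      path = ""
    }
  '
}

# Usage: tcpdump -r http_post.pcap -A | post_sizes 1048576
```

Every OVER_LIMIT line is a request Nginx would refuse before the application ever sees it.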

Case 5: DNS Silent Failure

Problem

Application works for a few minutes after start, then domain resolution fails until a restart.

DNS Capture

# Capture DNS traffic
tcpdump -i eth0 -nn -s0 -w dns_issue.pcap port 53
# List the A-record queries (response lines do not contain "A?")
tcpdump -r dns_issue.pcap -nn | grep "A?"

Key discovery

# Normal query
12:30:01 IP 192.168.1.100.12345 > 8.8.8.8.53: 12345+ A? api.example.com
12:30:01 IP 8.8.8.8.53 > 192.168.1.100.12345: 12345 1/0/0 A 203.0.113.10

# Abnormal query – no reply
12:35:01 IP 192.168.1.100.12346 > 8.8.8.8.53: 12346+ A? api.example.com

Root cause: Dual‑stack DNS server had broken IPv6 routing; the client randomly chose the IPv6 server, leading to intermittent failures.
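Matching queries to responses by DNS transaction ID makes silent failures stand out. Because the port 53 filter also captures IPv6 traffic, the same pass shows whether unanswered queries cluster on one resolver address. A sketch assuming tcpdump's default one‑line output; unanswered_dns is a hypothetical helper.

```shell
#!/bin/sh
# unanswered_dns (hypothetical helper): read `tcpdump -nn` text for port 53
# on stdin and print queries whose transaction ID never got a response.
# Queries are matched to responses by client endpoint + DNS ID.
unanswered_dns() {
  awk '
    {
      p = 0
      for (i = 1; i <= NF; i++) if ($i == "IP" || $i == "IP6") { p = i; break }
      if (p == 0) next
      src = $(p + 1); dst = $(p + 3); sub(/:$/, "", dst)
      id = $(p + 4); gsub(/[^0-9]/, "", id)     # "12345+" -> "12345"
    }
    dst ~ /\.53$/ {                               # query: client -> server port 53
      q[src "|" id] = $1 " " src " > " dst " " $(p + 5) " " $(p + 6)
      next
    }
    src ~ /\.53$/ { delete q[dst "|" id] }        # response: server port 53 -> client
    END { for (k in q) print "no reply: " q[k] }
  '
}

# Usage: tcpdump -r dns_issue.pcap -nn | unanswered_dns
```

With the three lines shown above, it prints only the 12:35:01 query for api.example.com.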

Case 6: SSL Handshake Dark Moment

Problem

HTTPS service intermittently reports "ERR_SSL_PROTOCOL_ERROR".

TLS Capture

# Capture the TLS handshake
tcpdump -i eth0 -nn -s0 -w ssl_handshake.pcap 'port 443 and host web.example.com'
# View raw packets in hex (tcpdump does not decode TLS handshake messages; Wireshark or tshark does)
tcpdump -r ssl_handshake.pcap -nn -x

Analysis

# Normal flow
Client Hello -> Server Hello -> Certificate -> Server Hello Done -> Client Key Exchange -> Change Cipher Spec -> Finished

# Failure at Certificate stage
Client Hello -> Server Hello -> [connection dropped]

Deep insight: The certificate chain was incomplete; missing intermediate CA caused some clients to abort the handshake.
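tcpdump shows where the handshake stops, but it does not decode certificates. One way to confirm an incomplete chain, swapping in openssl s_client rather than tcpdump, is to count how many certificates the server actually sends. The sketch assumes output from openssl s_client -showcerts; chain_length is a hypothetical helper, and a count of 1 is only a hint, since a self‑signed certificate is legitimately sent alone.

```shell
#!/bin/sh
# chain_length (hypothetical helper): count the certificates a server sent,
# reading `openssl s_client -showcerts` output on stdin. A publicly trusted
# leaf is normally issued by an intermediate CA, so a count of 1 is a hint
# (not proof) that the intermediate is missing from the server configuration.
chain_length() {
  n=$(grep -c -- '-----BEGIN CERTIFICATE-----')
  if [ "$n" -le 1 ]; then
    echo "certs sent: $n (possible missing intermediate)"
  else
    echo "certs sent: $n"
  fi
}

# Usage:
#   openssl s_client -connect web.example.com:443 -servername web.example.com \
#     -showcerts </dev/null 2>/dev/null | chain_length
# Related capture-side filter: keep only TLS handshake records (content type 0x16)
#   tcpdump -i eth0 -nn -s0 'tcp port 443 and tcp[((tcp[12] & 0xf0) >> 2)] = 0x16'
```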

Case 7: Microservice Communication Pitfall

Problem

Service A calls Service B with a 95% success rate; the remaining 5% of calls fail with no visible pattern.

Distributed Capture

# Capture both directions between the two services
tcpdump -i any -nn -s0 -w microservice.pcap 'host serviceA and host serviceB'
# Count packets per direction (src -> dst)
tcpdump -r microservice.pcap -nn | awk '{print $3 " -> " $5}' | sort | uniq -c

Finding

# Normal connection
192.168.1.10.8080 -> 192.168.1.20.9090: established

# Abnormal – port reuse issue
192.168.1.10.8080 -> 192.168.1.20.9090: [port reused, sequence numbers mixed]

Root cause: Service B restarted, leaving sockets in TIME_WAIT; aggressive port reuse caused sequence‑number collisions.
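Port reuse can be told apart from ordinary SYN retransmission by the initial sequence number (ISN): a retransmitted SYN repeats its ISN, while a new connection on the same 4‑tuple picks a new one. A sketch assuming default one‑line output; syn_isn_reuse is hypothetical. It searches for the IP token instead of using fixed field numbers because, on newer tcpdump versions, captures taken with -i any may print the interface and direction before it, which also shifts the $3/$5 fields used in the awk command above.

```shell
#!/bin/sh
# syn_isn_reuse (hypothetical helper): from `tcpdump -nn` text on stdin,
# report 4-tuples that carried SYNs with more than one initial sequence
# number. A retransmitted SYN repeats its ISN; a different ISN on the same
# tuple means a new connection reused the port pair.
syn_isn_reuse() {
  awk '
    index($0, "Flags [S],") {
      p = 0; seq = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "IP" && p == 0) p = i
        if ($i == "seq" && seq == "") { seq = $(i + 1); sub(/,$/, "", seq) }
      }
      if (p == 0 || seq == "") next
      dst = $(p + 3); sub(/:$/, "", dst)
      tuple = $(p + 1) " > " dst
      if (!((tuple, seq) in seen)) { seen[tuple, seq] = 1; isns[tuple]++ }
    }
    END { for (t in isns) if (isns[t] > 1) printf "%s  distinct ISNs=%d\n", t, isns[t] }
  '
}

# Usage: tcpdump -r microservice.pcap -nn | syn_isn_reuse
```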

Case 8: Bandwidth Saturation Mystery

Alert

Server bandwidth spikes to 95% even though user traffic has not grown.

Traffic Analysis

# Rank source endpoints by packet count (first 1000 packets)
tcpdump -i eth0 -nn -q -c 1000 | awk '{print $3}' | sort | uniq -c | sort -nr
# Capture the heavy‑traffic source
tcpdump -i eth0 -nn -s0 -w bandwidth_hog.pcap src 192.168.1.100

Discovery: Repeated health‑check requests from an internal IP were configured with a 1 ms interval, consuming most of the bandwidth.

Optimization: Adjust health‑check interval; bandwidth returns to normal instantly.
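Ranking by packet count can understate a few large flows; summing the length field per flow ranks by volume instead. A sketch assuming default (non ‑q) one‑line output, where length is the transport payload size rather than the on‑wire frame size; bytes_by_flow is a hypothetical helper.

```shell
#!/bin/sh
# bytes_by_flow (hypothetical helper): rank flows in `tcpdump -nn` text on stdin
# by the sum of the number after each line's last "length" token (transport
# payload bytes, not on-wire size). Expects default output, not -q.
bytes_by_flow() {
  awk '
    {
      p = 0; len = ""
      for (i = 1; i <= NF; i++) {
        if (($i == "IP" || $i == "IP6") && p == 0) p = i
        if ($i == "length") { len = $(i + 1); gsub(/[^0-9]/, "", len) }
      }
      if (p == 0 || len == "") next
      dst = $(p + 3); sub(/:$/, "", dst)
      bytes[$(p + 1) " > " dst] += len
    }
    END { for (f in bytes) print bytes[f], f }
  ' | sort -rn
}

# Usage: tcpdump -i eth0 -nn -c 10000 | bytes_by_flow | head
```

A health check hammering at a 1 ms interval shows up as a single flow near the top of this list.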

Case 9: Packet Mutation Mystery

Problem

Client sends correct data, but server receives truncated or garbled packets.

Mid‑point Capture

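# Capture at the gateway (hex options only matter when reading)
tcpdump -i eth0 -nn -s0 -w packet_corruption.pcap host client.ip and host server.ip
# Flag packets whose checksums do not verify (-vv reports "incorrect" or "bad ... cksum")
tcpdump -r packet_corruption.pcap -nn -vv | grep -Ei 'incorrect|bad.*cksum'
# Dump full packets in hex to compare with a capture taken on the client
tcpdump -r packet_corruption.pcap -nn -xx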

Result

# Client (correct)
45 00 05 dc ... [full packet]

# Server (corrupted)
45 00 05 dc ... [modified by network device]

Root cause: A switch firmware bug recalculated checksums incorrectly for a specific packet pattern.
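Comparing raw hex from two capture points is noisy, because MAC addresses, the TTL and header checksums legitimately change at every routed hop. The sketch below extracts only the TCP payload from tcpdump -x output (which starts at the IP header and has no ASCII column) so captures from the client and the gateway can be diffed. It assumes IPv4/TCP packets captured in full and both captures holding the same packets in the same order; payload_hex and the file name client.pcap are hypothetical.

```shell
#!/bin/sh
# payload_hex (hypothetical helper): read `tcpdump -nn -x` text on stdin
# (IPv4/TCP, full packets, no -X ASCII column) and print each packet's TCP
# payload as one hex line, skipping IP/TCP headers whose TTL and checksum
# fields legitimately differ between capture points.
payload_hex() {
  awk '
    function nib(c) { return index("0123456789abcdef", tolower(c)) - 1 }
    function flush(   ihl, doff, total, start) {
      if (hex == "") return
      ihl   = nib(substr(hex, 2, 1)) * 4                 # IPv4 header length (bytes)
      total = nib(substr(hex, 5, 1)) * 4096 + nib(substr(hex, 6, 1)) * 256
      total += nib(substr(hex, 7, 1)) * 16 + nib(substr(hex, 8, 1))  # IP total length
      doff  = nib(substr(hex, ihl * 2 + 25, 1)) * 4      # TCP data offset (bytes)
      start = (ihl + doff) * 2 + 1                       # first payload hex digit
      print substr(hex, start, total * 2 - start + 1)    # trims Ethernet padding
      hex = ""
    }
    /^[ \t]+0x[0-9a-f]+:/ { for (i = 2; i <= NF; i++) hex = hex $i; next }
    { flush() }
    END { flush() }
  '
}

# Usage:
#   tcpdump -r client.pcap -nn -x tcp | payload_hex > client.hex
#   tcpdump -r packet_corruption.pcap -nn -x tcp | payload_hex > gateway.hex
#   diff client.hex gateway.hex
```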

Case 10: Latency Bottleneck – P99 API Delay

Performance Issue

API P99 latency reaches 5 s while DB query takes only 50 ms.

Precise Timestamp Capture

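# Capture with nanosecond timestamps (requires libpcap and OS support)
tcpdump -i eth0 -nn --time-stamp-precision=nano -w latency_analysis.pcap port 8080
# Print the time since the previous packet (-ttt) to find where the time goes
tcpdump -r latency_analysis.pcap --time-stamp-precision=nano -ttt -nn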

Delay Breakdown

# TCP connection: 150 ms
# SSL handshake: 300 ms
# HTTP processing: 50 ms
# Network transmission: 4500 ms ← problem area!

Final root cause: Outbound gateway QoS mistakenly marked API traffic as low priority, inflating network transmission time.
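A delay breakdown like the one above can be started from -ttt output by listing every packet that arrived long after its predecessor. A sketch; slow_gaps is a hypothetical helper, it assumes the HH:MM:SS.fraction delta format that recent tcpdump versions print for -ttt, and it works best on output already filtered to one connection, because -ttt measures the gap to the previous packet of any flow.

```shell
#!/bin/sh
# slow_gaps THRESHOLD_MS (hypothetical helper): read `tcpdump -ttt -nn` text
# on stdin and print packets that arrived more than THRESHOLD_MS after the
# previous packet. Assumes the HH:MM:SS.frac delta format of recent tcpdump.
slow_gaps() {
  awk -v limit="$1" '
    split($1, t, ":") == 3 {
      ms = (t[1] * 3600 + t[2] * 60 + t[3]) * 1000
      if (ms > limit + 0) printf "%.1f ms  %s\n", ms, $0
    }
  '
}

# Usage: tcpdump -r latency_analysis.pcap -ttt -nn 'port 8080' | slow_gaps 1000
```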

TCPDump Master Techniques

Golden Parameter Set

# Universal capture command
tcpdump -i any -nn -s0 -w capture.pcap
# Parameter meanings
-i any   # listen on all interfaces
-nn      # do not resolve hostnames or ports
-s0      # capture full packet
-w file  # write to file
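The universal command writes a single file that grows without bound. For captures left running while waiting for an intermittent fault, tcpdump's -C option (start a new file every N million bytes) and -W option (keep at most N files, overwriting the oldest) turn the output into a ring buffer. A sketch wrapper; the function name capture_ring and the DRY_RUN switch are hypothetical, while the tcpdump flags are real.

```shell
#!/bin/sh
# capture_ring (hypothetical wrapper): the golden parameter set plus a ring
# buffer, so a long-running capture cannot fill the disk.
#   -C SIZE   start a new file every SIZE million bytes
#   -W COUNT  keep at most COUNT files, overwriting the oldest
# Arguments: IFACE FILE SIZE COUNT [BPF filter words...]
# Set DRY_RUN=1 to print the command instead of running it.
capture_ring() {
  iface=$1; file=$2; size=$3; count=$4; shift 4
  set -- tcpdump -i "$iface" -nn -s0 -C "$size" -W "$count" -w "$file" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example: 10 files of about 100 MB each on all interfaces, HTTPS only
#   capture_ring any capture.pcap 100 10 port 443
```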

Filter Magic

# Host filter
tcpdump host 192.168.1.100
# Port filter
tcpdump port 80 or port 443
# Protocol filter
tcpdump tcp and not port 22
# Flag filter
tcpdump 'tcp[tcpflags] & tcp-syn != 0'
# Combined example
tcpdump -i eth0 -nn 'host 192.168.1.100 and (port 80 or port 443) and tcp[tcpflags] & tcp-syn != 0'
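The flag filters read byte 13 of the TCP header, and names such as tcp-syn are bit masks (tcp-fin 0x01, tcp-syn 0x02, tcp-rst 0x04, tcp-push 0x08, tcp-ack 0x10, tcp-urg 0x20). A shell sketch of the same arithmetic shows why the combined example also matches SYN‑ACK replies, and how tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn keeps only initial SYNs. The function names are hypothetical.

```shell
#!/bin/sh
# Bit masks behind tcpdump's tcp[tcpflags] names (TCP header byte 13).
TCP_FIN=0x01 TCP_SYN=0x02 TCP_RST=0x04 TCP_PUSH=0x08 TCP_ACK=0x10 TCP_URG=0x20

# matches_any FLAGS MASK: same test as 'tcp[tcpflags] & MASK != 0'
matches_any() { [ $(( $1 & $2 )) -ne 0 ]; }

# is_initial_syn FLAGS: same test as 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn',
# i.e. SYN set and ACK clear, so SYN-ACK replies are excluded
is_initial_syn() { [ $(( $1 & ($TCP_SYN | $TCP_ACK) )) -eq $(( $TCP_SYN )) ]; }

# A SYN-ACK carries 0x12 (SYN | ACK):
#   matches_any 0x12 "$TCP_SYN"   -> true  (matched by the SYN filters above)
#   is_initial_syn 0x12           -> false (dropped by the SYN-only filter)
#   is_initial_syn 0x02           -> true
```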

Analysis Framework – Four Steps

Phenomenon Description – Accurately describe the issue and collect logs.

Hypothesis Verification – Propose hypotheses based on experience and design capture tests.

Data Analysis – Dive into packet data to spot abnormal patterns.

Root‑Cause Confirmation – Identify the true cause and devise a fix.

Five Analysis Dimensions

Connection Layer – Handshake and teardown status.

Transport Layer – Sequence numbers, ACKs, window changes.

Application Layer – HTTP codes, SSL handshake details.

Timing Layer – Latency distribution, timeout settings.

Statistics Layer – Retransmission rate, packet loss, connection count.
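For the statistics layer, a rough retransmission rate can be pulled from text output by counting data segments whose sequence range repeats in the same direction. A heuristic sketch, not an exact TCP analysis: retrans_rate is hypothetical, it ignores partially overlapping retransmissions, and packets captured on several interfaces with -i any can inflate the count.

```shell
#!/bin/sh
# retrans_rate (hypothetical helper): estimate retransmissions per flow from
# `tcpdump -nn` text on stdin by counting data segments whose "seq A:B" range
# was already seen in the same direction. Rough heuristic: it ignores
# partial overlaps and needs both copies of a segment to be in the capture.
retrans_rate() {
  awk '
    {
      p = 0; range = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "IP" && p == 0) p = i
        if ($i == "seq" && $(i + 1) ~ /^[0-9]+:[0-9]+,?$/) { range = $(i + 1); sub(/,$/, "", range) }
      }
      if (p == 0 || range == "") next
      dst = $(p + 3); sub(/:$/, "", dst)
      flow = $(p + 1) " > " dst
      total[flow]++
      if ((flow, range) in seen) retx[flow]++
      else seen[flow, range] = 1
    }
    END {
      for (f in total)
        printf "%s  segments=%d  retrans=%d  (%.1f%%)\n", f, total[f], retx[f] + 0, 100 * (retx[f] + 0) / total[f]
    }
  '
}

# Usage: tcpdump -r capture.pcap -nn | retrans_rate
```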

Mastering these techniques turns every packet into a clue, enabling rapid, data‑driven network troubleshooting.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Operations, network troubleshooting, Linux, packet analysis, tcpdump
Written by MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
