Master Linux Network Troubleshooting with tcpdump, ss, and iptables
A comprehensive guide for ops engineers that explains how to use tcpdump, ss, and iptables to diagnose and resolve common Linux networking issues, covering tool basics, practical scenarios, detailed command examples, scripts, best practices, and monitoring strategies.
Overview
This guide shows how three Linux networking tools (tcpdump, ss, and iptables) together cover the majority of production network incidents. By following a systematic workflow you can quickly identify whether a problem lies in the physical layer, the transport layer, or the firewall.
Tool Characteristics
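tcpdump : Packet capture through libpcap with in-kernel BPF filtering. It needs nothing beyond libpcap and works on virtually every distribution. With a narrow filter the overhead is low (reported as under 3 % CPU on 10 Gbps NICs, though this depends heavily on traffic volume and filter complexity).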
ss : Replaces netstat by reading the kernel netlink interface. It is 10‑30× faster on high‑connection servers and provides detailed socket statistics.
iptables : Classic Linux firewall with four commonly used tables (raw, mangle, nat, filter; a fifth table, security, is rarely touched) and five built-in chains (PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING). Still the default firewall on many production systems even though nftables is the future.
Typical Scenarios
API (service interface) timeout – use tcpdump to capture the three-way handshake and measure latency.
Connection‑count explosion – use ss to list socket states (ESTABLISHED, TIME‑WAIT, CLOSE‑WAIT).
Service unreachable after deployment – verify firewall rules with iptables.
Environment Requirements
OS: CentOS 7+, Ubuntu 18.04+, Debian 10+ (kernel ≥ 4.9 for full ss filtering).
tcpdump ≥ 4.9 (check with tcpdump --version).
iproute2 (provides ss) ≥ 4.15 (check with ss --version).
iptables ≥ 1.6 (check with iptables --version; CentOS 8+ uses nftables underneath, but the compatibility layer works).
Hardware: ≥2 GB RAM, ≥10 GB free disk (large captures can fill disk quickly).
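The minimum versions listed above can be checked with a small comparison helper. The following is a minimal sketch, assuming GNU `sort -V` (version sort) is available; the function name `version_ge` is illustrative and not part of any of the three tools.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumption: GNU coreutils `sort -V` is installed (true on the distributions listed above).
version_ge() {
    # After a version sort, the smaller version comes first;
    # A >= B exactly when B is that first line.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (run manually): compare the running kernel against the 4.9 minimum.
# version_ge "$(uname -r | cut -d- -f1)" 4.9 && echo "kernel OK"
```

The same helper can be applied to whatever version string you extract from tcpdump, ss, or iptables; parsing their version output is left out here because the exact format varies between releases.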
Step‑by‑Step Procedure
Preparation
# Verify kernel version
uname -r
# Show OS release
cat /etc/os-release
# List network interfaces
ip addr show
# Check available disk space for captures
df -h /tmp
# Show memory usage
free -h
Install Dependencies
# CentOS / RHEL
sudo yum install -y tcpdump iproute iptables-services net-tools bind-utils mtr nc
# Ubuntu / Debian
sudo apt update
sudo apt install -y tcpdump iproute2 iptables dnsutils mtr-tiny netcat-openbsd
# Verify installation
tcpdump --version
ss --version
iptables --version
Common Pitfall (CentOS 8+)
CentOS 8+ uses firewalld (nftables backend). Mixing iptables commands with firewalld can cause rule conflicts. The safest approach is to disable firewalld and enable the iptables‑services package:
# Stop and disable firewalld
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask firewalld
# Install and start iptables service
sudo yum install -y iptables-services
sudo systemctl enable iptables
sudo systemctl start iptables
Core Configuration
tcpdump Practical Usage
Basic syntax: tcpdump [options] [filter]. Frequently used options:
-i eth0 – select the interface (without it tcpdump picks a default interface, so specify it explicitly).
-n – no hostname resolution (faster).
-nn – no hostname or port-name resolution (most precise).
-c 100 – stop after 100 packets (prevents disk exhaustion).
-w file.pcap – write the raw capture to a file for later analysis.
-A – print the payload as ASCII (useful for HTTP).
-X – hex + ASCII view.
-tttt – full date-and-time timestamps (needed for latency measurement).
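The options above combine naturally into a capture that is bounded by both packet count and wall-clock time. The wrapper below is a sketch only: the function name `bounded_capture` and the `DRY_RUN` variable are hypothetical conveniences, and it assumes the coreutils `timeout` command is installed.

```shell
# bounded_capture IFACE SECONDS COUNT OUTFILE [FILTER...]
# Runs tcpdump limited by a time budget (timeout) and a packet budget (-c).
# With DRY_RUN=1 the command line is printed instead of executed.
bounded_capture() {
    local iface="$1" secs="$2" count="$3" out="$4"
    shift 4
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "timeout $secs tcpdump -i $iface -nn -c $count -w $out $*"
    else
        # Remaining arguments are passed through as the capture filter.
        timeout "$secs" tcpdump -i "$iface" -nn -c "$count" -w "$out" "$@"
    fi
}
```

For example, `bounded_capture eth0 300 10000 /tmp/debug.pcap host 10.0.1.100` stops after 10 000 packets or five minutes, whichever comes first. The capture itself needs root privileges.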
Common filters:
# Capture traffic to a specific host
tcpdump -i eth0 -nn host 10.0.1.100
# Capture only initial SYN packets (new connections); SYN-ACK replies are excluded
tcpdump -i eth0 -nn 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn'
# Capture HTTP GET requests
tcpdump -i eth0 -nn -A -s 0 'tcp dst port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'
# Capture DNS queries
tcpdump -i eth0 -nn port 53
ss Detailed Usage
Key commands:
# List TCP connections (add -a to include listening sockets as well)
ss -tn
# Show listening sockets with process info
ss -tlnp
# Summary statistics
ss -s
# Filter by state
ss -tn state established
ss -tn state time-wait
ss -tn state close-wait
# Filter by port or address
ss -tn 'dport = :80'
ss -tn 'src 10.0.1.0/24'
# Show internal TCP details (RTT, cwnd, etc.)
ss -ti dst 10.0.2.100
Important options:
-t – TCP only.
-u – UDP only.
-l – listening sockets.
-n – numeric output.
-p – show the owning process (requires root to see other users' processes).
-e – extended info (UID, inode).
-i – internal TCP info (RTT, congestion window).
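When `ss -s` is too coarse, the per-state breakdown can be derived from the full socket list. A minimal sketch, assuming the default `ss -tan` layout in which the first column is the state; the function name `count_tcp_states` is illustrative.

```shell
# count_tcp_states: read `ss -tan` output on stdin and print "COUNT STATE"
# for every TCP state, highest count first. The header line is skipped.
count_tcp_states() {
    awk 'NR > 1 { c[$1]++ } END { for (s in c) print c[s], s }' | sort -rn
}

# Example (run manually):
# ss -tan | count_tcp_states
```

Reading from stdin keeps the helper usable on saved output from another host as well as on a live system.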
iptables Practical Management
Typical workflow:
# Reset the INPUT policy to ACCEPT before flushing so a remote session is not cut off
iptables -P INPUT ACCEPT
# Flush existing rules
iptables -F
iptables -X
iptables -Z
iptables -t nat -F
iptables -t nat -X
# Allow loopback and established connections
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Example: allow SSH (port 22) with rate limiting (sources reaching 10 new connections within 60 s are dropped)
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set --name SSH
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 10 --name SSH -j DROP
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Example: allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Reject everything else with an explicit ICMP error
iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited
# Set default policies last, once the ACCEPT rules are in place
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
Key notes:
Place the ESTABLISHED,RELATED rule at the top of the INPUT chain.
Use ipset for large black‑/white‑lists to keep rule count low.
Always back up rules before changes and schedule an automatic rollback. The at command reads the job from standard input, e.g., echo "iptables-restore < /tmp/backup.rules" | at now + 10 minutes.
Real‑World Cases
Case 1 – MySQL Connection Timeout
Symptoms: 5 % of requests to a MySQL‑backed service exceed 3 s; application logs only show “connection timeout”.
# Capture traffic between app server (10.0.3.20) and MySQL (10.0.2.100:3306)
tcpdump -i eth0 -nn -c 10000 -w /tmp/mysql_debug.pcap host 10.0.2.100 and port 3306
# Analyse handshake latency: compare the timestamps of each SYN and its SYN-ACK
tcpdump -nn -tttt -r /tmp/mysql_debug.pcap 'tcp[tcpflags] & tcp-syn != 0' | head -50
# Count client endpoints that sent more than one SYN (i.e., retransmitted SYNs)
tcpdump -nn -r /tmp/mysql_debug.pcap 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn' | awk '{print $3}' | sort | uniq -d | wc -l
# Check listen queue on MySQL host
ss -tlnp | grep 3306
# On a listening socket, Recv-Q above Send-Q (e.g., 129 > 128) indicates accept-queue overflow
Resolution: increase kernel and application backlog.
# Increase system‑wide backlog
sysctl -w net.core.somaxconn=65535
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
echo 'net.ipv4.tcp_max_syn_backlog = 65535' >> /etc/sysctl.conf
sysctl -p
# Adjust application listen backlog (MySQL itself here; likewise for e.g. Nginx, Tomcat, Go)
# MySQL: back_log = 65535 under [mysqld] in my.cnf
# Nginx: listen 3306 backlog=65535;
# Tomcat: server.tomcat.accept-count=65535
Case 2 – Connection‑Count Surge
Symptoms: a web server’s TCP connections jump from ~2 000 to 60 000 within minutes.
# Quick state summary
ss -s
# With a state filter, ss omits the State column, so the peer address:port is field 4
# Identify TIME-WAIT distribution by destination (peer) port
ss -tn state time-wait | awk 'NR>1 {print $4}' | cut -d: -f2 | sort | uniq -c | sort -rn | head -5
# Identify top remote IPs causing the surge
ss -tn state time-wait | awk 'NR>1 {print $4}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -5
# Kernel tweaks: reuse TIME-WAIT sockets for new outbound connections and cap their number
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_max_tw_buckets=20000
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_tw_buckets = 20000' >> /etc/sysctl.conf
sysctl -p
# Fix application bug (connection pool missing) and redeploy
Best Practices & Caveats
tcpdump Capture Guidelines
Always limit packet count (-c) or duration (e.g., timeout 300 tcpdump …) to avoid disk exhaustion.
Capture only the needed host/port to reduce CPU load.
Store files in a dedicated capture directory (or in /tmp when it is a separate filesystem), not on the root or data partition.
Delete captures promptly after analysis.
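One way to enforce the disk-space guideline is to check free space before a capture starts. This is a sketch under the assumption that POSIX `df -Pk` output is available; the helper names `free_kb` and `has_free_kb` and any threshold you pass are illustrative choices, not recommendations from the tools themselves.

```shell
# free_kb DIR: print the free space, in KiB, of the filesystem holding DIR.
free_kb() {
    # -P forces one line per filesystem; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_free_kb DIR MIN_KB: succeed when at least MIN_KB KiB are free.
has_free_kb() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example (run manually): refuse to capture with less than ~2 GiB free.
# has_free_kb /tmp 2097152 || { echo "not enough space for a capture" >&2; exit 1; }
```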
iptables Rule Optimization
Place high-frequency matches (ESTABLISHED,RELATED) at the top of the INPUT chain.
Consolidate large IP blocks with ipset:
# Create a set
ipset create blacklist hash:ip maxelem 100000
# Add IPs
ipset add blacklist 1.2.3.4
# Single rule to drop the set
iptables -I INPUT 2 -m set --match-set blacklist src -j DROP
Annotate rules with comments (-m comment --comment "…") for future maintenance.
Never change the default policy to DROP before adding the necessary ACCEPT rules; always add ACCEPT rules first, then set the policy.
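For the ipset guidance above, adding addresses one at a time becomes slow for large lists; `ipset restore` accepts a batch of commands on standard input. The generator below is a sketch: the function name `gen_ipset_restore` and the input file `ips.txt` are hypothetical, and the set type and `maxelem` value simply mirror the example set created above.

```shell
# gen_ipset_restore SETNAME: read one IPv4 address per line on stdin and
# print commands in `ipset restore` format. Blank lines and lines starting
# with '#' are skipped.
gen_ipset_restore() {
    local name="$1"
    printf 'create %s hash:ip maxelem 100000\n' "$name"
    awk -v name="$name" '
        /^[ \t]*(#|$)/ { next }          # skip comments and empty lines
        { print "add", name, $1 }'
}

# Example (run manually, as root); -exist ignores "already exists" errors:
# gen_ipset_restore blacklist < ips.txt | ipset -exist restore
```

The helper does not validate the addresses; malformed entries will be rejected by ipset itself when the batch is loaded.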
Firewall Change Safety
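Back up the current rules:
BACKUP=/tmp/iptables-backup-$(date +%Y%m%d_%H%M%S).rules
iptables-save > "$BACKUP"
Schedule an automatic rollback to that same file (at reads the job from standard input; cancel it with atrm once the change is verified):
echo "iptables-restore < $BACKUP" | at now + 10 minutes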
Keep two SSH sessions open when editing remotely.
Test changes locally before applying globally.
Performance Monitoring
Key Metrics
TCP ESTABLISHED connections (normal < 5 000, warning > 10 000).
TCP TIME‑WAIT (normal < 5 000, warning > 20 000).
CLOSE‑WAIT (should stay near zero; growth indicates application leak).
conntrack usage (keep below 70 %; alert at 85 %).
Interface drop counters (the counters are cumulative, so a steadily increasing value is the red flag).
TCP retransmission rate (normal < 0.1 %; alert > 1 %).
Listen‑queue overflow (should be zero).
Network bandwidth utilization (warn at 80 %).
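The retransmission-rate metric above can be derived from the Tcp counters in /proc/net/snmp. The following is a sketch that looks columns up by header name rather than by position; the function name `tcp_retrans_pct` is illustrative. Note the assumption baked in: the kernel counters are cumulative since boot, so this prints a lifetime average, and a real alert would compare two samples taken some interval apart.

```shell
# tcp_retrans_pct: read /proc/net/snmp-formatted text on stdin and print
# RetransSegs as a percentage of OutSegs (cumulative since boot).
tcp_retrans_pct() {
    awk '
        # First "Tcp:" line is the header: remember each column position.
        $1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        # Second "Tcp:" line holds the values.
        $1 == "Tcp:" {
            out = $(col["OutSegs"]); re = $(col["RetransSegs"])
            if (out > 0) printf "%.2f\n", re * 100 / out
        }'
}

# Example (run manually):
# tcp_retrans_pct < /proc/net/snmp
```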
Prometheus Alert Rules (excerpt)
groups:
  - name: network_alerts
    rules:
      - alert: TcpConnectionsHigh
        expr: node_netstat_Tcp_CurrEstab > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High TCP ESTABLISHED connections on {{ $labels.instance }}"
          description: "Current count {{ $value }} exceeds 10 000."
      - alert: TcpTimeWaitHigh
        expr: node_sockstat_TCP_tw > 20000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Excessive TIME_WAIT sockets on {{ $labels.instance }}"
      - alert: ConntrackTableNearFull
        expr: (node_nf_conntrack_entries / node_nf_conntrack_entries_limit) * 100 > 85
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Conntrack table near capacity on {{ $labels.instance }}"
          description: 'Usage {{ $value | printf "%.1f" }}%'
Backup & Restore
Automated iptables Backup (daily)
#!/bin/bash
BACKUP_DIR="/opt/backup/iptables"
KEEP_DAYS=30
DATE=$(date +%Y%m%d_%H%M%S)
HOSTNAME=$(hostname)
mkdir -p "$BACKUP_DIR"
iptables-save > "${BACKUP_DIR}/${HOSTNAME}_iptables_${DATE}.rules"
# Save sysctl network settings
sysctl -a 2>/dev/null | grep -E '^net\.' > "${BACKUP_DIR}/${HOSTNAME}_sysctl_${DATE}.conf"
# Backup ipset if present
if command -v ipset >/dev/null 2>&1; then
    ipset save > "${BACKUP_DIR}/${HOSTNAME}_ipset_${DATE}.rules"
fi
# Remove old backups
find "$BACKUP_DIR" -name "*.rules" -mtime +$KEEP_DAYS -delete
find "$BACKUP_DIR" -name "*.conf" -mtime +$KEEP_DAYS -delete
echo "[${DATE}] iptables backup completed: ${BACKUP_DIR}/${HOSTNAME}_iptables_${DATE}.rules"
Restore Procedure
Inspect the backup file to ensure it contains the expected ACCEPT rules (especially SSH on port 22).
Restore with iptables-restore < /path/to/backup.rules.
Verify the rule set: iptables -L -n --line-numbers.
Test remote SSH and application ports from an external host.
Persist the restored rules (CentOS: service iptables save; Ubuntu: netfilter-persistent save).
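The first step of the restore procedure, inspecting the backup for the expected ACCEPT rules, can be partly automated. This is a rough sketch assuming the file is in iptables-save format with single-port `--dport` rules on the INPUT chain; the function name `rules_allow_tcp_port` is hypothetical, and multiport rules or rules in user-defined chains are not detected.

```shell
# rules_allow_tcp_port FILE PORT: succeed when FILE (iptables-save format)
# contains an INPUT-chain ACCEPT rule for the given destination port.
rules_allow_tcp_port() {
    # The space after the port number prevents 22 from matching 2222.
    grep -Eq -- "^-A INPUT .*--dport $2( .*)? -j ACCEPT" "$1"
}

# Example (run manually) before restoring on a remote host:
# rules_allow_tcp_port /path/to/backup.rules 22 || echo "WARNING: no SSH ACCEPT rule" >&2
```

A passing check is not a guarantee that SSH will work after the restore (an earlier DROP rule could still match first), so keep the manual inspection and the second SSH session.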
Conclusion
Mastering tcpdump, ss, and iptables gives operations engineers a powerful three‑tool arsenal that covers the majority of Linux networking incidents. By following the systematic workflow, applying best‑practice rule ordering, tuning kernel parameters, and automating backups and monitoring, teams can diagnose issues quickly, avoid accidental lock‑outs, and maintain high‑availability services.
Next steps for deeper expertise include learning eBPF/BCC tools (e.g., tcplife, tcpretrans), migrating to nftables, and using Wireshark for detailed packet‑level analysis.
