
Master Linux Network Troubleshooting with tcpdump, ss, and iptables

A comprehensive guide for ops engineers that explains how to use tcpdump, ss, and iptables to diagnose and resolve common Linux networking issues, covering tool basics, practical scenarios, detailed command examples, scripts, best practices, and monitoring strategies.

Ops Community

Overview

This guide shows how three Linux networking tools, tcpdump, ss, and iptables, together cover the majority of production network incidents. By following a systematic workflow you can quickly tell whether a problem lies on the network path (packets lost, delayed, or retransmitted), in the transport layer (socket states and queues), or in the firewall.

Tool Characteristics

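tcpdump: Packet capture built on libpcap. Capture filters are compiled to BPF and evaluated in the kernel, so only matching packets are copied to user space. It is packaged for virtually every distribution and adds little CPU load with a selective filter (typically under 3 % on a 10 Gbps NIC; the cost grows with the number of packets that pass the filter).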

ss: Replaces netstat and queries socket information from the kernel over netlink instead of parsing /proc/net files, which makes it much faster on servers with many connections (typically 10‑30×). It also provides detailed per‑socket TCP statistics.

iptables: Classic Linux firewall built on netfilter, with four commonly used tables (raw, mangle, nat, filter; a fifth, security, serves SELinux rules) and five built‑in chains (PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING). It remains the firewall on many production systems, although nftables is its designated successor.

Typical Scenarios

API (service call) timeout – use tcpdump to capture the TCP three‑way handshake and measure latency.

Connection‑count explosion – use ss to list socket states (ESTABLISHED, TIME‑WAIT, CLOSE‑WAIT).

Service unreachable after deployment – verify firewall rules with iptables.

Environment Requirements

OS: CentOS 7+, Ubuntu 18.04+, Debian 10+ (kernel ≥ 4.9 for full ss filtering; CentOS 7 ships a 3.10 kernel, so some filters may be unavailable there).

tcpdump ≥ 4.9 (check with tcpdump --version).

iproute2 (provides ss) ≥ 4.15 (check with ss --version).

iptables ≥ 1.6 (on CentOS 8+ the iptables command drives nftables underneath, but the compatibility layer works).

Hardware: ≥2 GB RAM, ≥10 GB free disk (large captures can fill disk quickly).
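
Checking these minimums by eye is error-prone, and a plain string comparison gets it wrong (it ranks 4.15 below 4.9). The sketch below is one way to automate the check; the helper names version_ge, first_version, and check_min_version are hypothetical, and it assumes GNU coreutils sort -V and grep -o are available:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking tool versions against a minimum.

# Succeed if version $1 >= version $2. GNU `sort -V` compares each dotted
# component numerically, so 4.15 correctly sorts after 4.9.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the first dotted version number (e.g. 4.99.1) from stdin.
first_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Print OK or CHECK for one tool; an empty version is reported as unknown.
check_min_version() {
  local name=$1 have=$2 need=$3
  if [ -n "$have" ] && version_ge "$have" "$need"; then
    echo "OK: $name $have (need >= $need)"
  else
    echo "CHECK: $name ${have:-unknown} (need >= $need)"
  fi
}

# Example usage:
#   check_min_version tcpdump  "$(tcpdump --version 2>&1 | first_version)" 4.9
#   check_min_version ss       "$(ss --version | first_version)" 4.15
#   check_min_version iptables "$(iptables --version | first_version)" 1.6
```

Older iproute2 builds report a snapshot tag (such as ss190107) instead of a dotted version; the helper then prints "unknown", and the package version (rpm -q iproute or dpkg -l iproute2) is the thing to check.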

Step‑by‑Step Procedure

Preparation

# Verify kernel version
uname -r
# Show OS release
cat /etc/os-release
# List network interfaces
ip addr show
# Check available disk space for captures
df -h /tmp
# Show memory usage
free -h

Install Dependencies

# CentOS / RHEL
sudo yum install -y tcpdump iproute iptables-services net-tools bind-utils mtr nc
# Ubuntu / Debian
sudo apt update
sudo apt install -y tcpdump iproute2 iptables dnsutils mtr-tiny netcat-openbsd
# Verify installation
tcpdump --version
ss --version
iptables --version

Common Pitfall (CentOS 8+)

CentOS 8+ ships firewalld (nftables backend) enabled by default. Mixing direct iptables commands with firewalld can cause rule conflicts. If you want to manage the firewall directly with iptables, disable firewalld and switch to the iptables‑services package:

# Stop and disable firewalld
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask firewalld
# Install and start iptables service
sudo yum install -y iptables-services
sudo systemctl enable iptables
sudo systemctl start iptables

Core Configuration

tcpdump Practical Usage

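Basic syntax: tcpdump [options] [filter]. Frequently used options:

-i eth0 – capture on a specific interface (without -i, tcpdump picks the first configured, up, non‑loopback interface; -i any captures on all of them).

-n – do not resolve hostnames (faster, and avoids generating extra DNS traffic).

-nn – do not resolve hostnames or port/protocol names (most precise output).

-c 100 – stop after 100 packets (prevents runaway captures and disk exhaustion).

-w file.pcap – write raw packets to a file for later analysis with tcpdump -r or Wireshark.

-s 0 – capture whole packets instead of a truncated prefix (already the default on current tcpdump versions).

-A – print the payload as ASCII (useful for plain‑text HTTP).

-X – print the payload in hex and ASCII.

-tttt – print a full date and time on every line (useful for correlating with logs and measuring latency).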

Common filters:

# Capture traffic to and from a specific host
tcpdump -i eth0 -nn host 10.0.1.100
# Capture SYN and SYN-ACK packets (connection setup)
tcpdump -i eth0 -nn 'tcp[tcpflags] & tcp-syn != 0'
# Capture HTTP GET requests (payload starts with "GET ")
tcpdump -i eth0 -nn -A -s 0 'tcp dst port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'
# Capture DNS traffic (queries and responses)
tcpdump -i eth0 -nn port 53
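
The HTTP filter works because byte 12 of the TCP header carries the data offset (the header length in 32‑bit words) in its high nibble. Masking with 0xf0 and shifting right by 2 is the same as shifting right by 4 and multiplying by 4, so the expression yields the header length in bytes and tcp[...:4] reads the first four payload bytes; 0x47455420 is simply "GET " in ASCII. A small sketch with hypothetical helper names that reproduces both calculations, for example to build the same kind of filter for another method:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the BPF arithmetic used in the filter above.

# TCP header length in bytes from header byte 12, exactly as
# ((tcp[12:1] & 0xf0) >> 2) computes it. The low nibble (reserved/NS bits) is ignored.
tcp_header_len() {
  local byte=$(( $1 ))
  echo $(( (byte & 0xf0) >> 2 ))
}

# Encode exactly 4 characters as the 32-bit hex constant that tcp[offset:4]
# is compared against (bytes in the order they appear on the wire).
ascii_to_hex32() {
  [ "${#1}" -eq 4 ] || { echo "need exactly 4 characters" >&2; return 1; }
  printf '0x%s\n' "$(printf '%s' "$1" | od -An -tx1 | tr -d ' \n')"
}

# Build a capture filter for requests whose payload starts with PREFIX.
http_method_filter() {
  local port=$1 prefix=$2 hex
  hex=$(ascii_to_hex32 "$prefix") || return 1
  echo "tcp dst port $port and tcp[((tcp[12:1] & 0xf0) >> 2):4] = $hex"
}

# Example usage:
#   tcpdump -i eth0 -nn -A "$(http_method_filter 8080 'POST')"
```

A header byte of 0x50 means 5 words, i.e. the minimal 20‑byte header; options push it higher (0x80 gives 32 bytes), which is why the filter computes the offset instead of hard‑coding 20.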

ss Detailed Usage

Key commands:

# List all TCP sockets (-a adds listening sockets; without it ss shows only non-listening ones)
ss -tan
# Show listening sockets with process info
ss -tlnp
# Summary statistics
ss -s
# Filter by state
ss -tn state established
ss -tn state time-wait
ss -tn state close-wait
# Filter by port or address
ss -tn 'dport = :80'
ss -tn 'src 10.0.1.0/24'
# Show internal TCP details (RTT, cwnd, etc.)
ss -ti dst 10.0.2.100

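Important options:

-t – TCP sockets only.

-u – UDP sockets only.

-l – listening sockets only.

-a – all sockets, listening and non‑listening.

-n – numeric addresses and ports (no name resolution).

-p – show the owning process (root is required to see other users' processes).

-e – extended info (UID, inode).

-i – internal TCP info (RTT, congestion window).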
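
For a quick per‑state breakdown (to spot a growing CLOSE‑WAIT count, for instance), counting the first column of ss -tan output is enough. A minimal sketch with a hypothetical function name, assuming the default column layout in which State comes first when no state filter is given:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count TCP sockets per state from `ss -tan` output on stdin.
# Column 1 is State because no `state` filter is passed to ss; line 1 is the header.
ss_state_counts() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' | sort -rn
}

# Example usage:
#   ss -tan | ss_state_counts
```

The largest state is printed first; a CLOSE‑WAIT entry that keeps growing between runs points at an application that is not closing its sockets.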

iptables Practical Management

Typical workflow:

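# Back up the current rules first
iptables-save > /tmp/iptables-backup.rules
# Reset policies to ACCEPT so that flushing cannot lock you out
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
# Flush existing rules, delete user-defined chains, zero counters
iptables -F
iptables -X
iptables -Z
iptables -t nat -F
iptables -t nat -X
# Allow loopback and established connections
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Example: allow SSH (port 22), rate limiting new connections per source (about 10 per 60 s)
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --set --name SSH
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --update --seconds 60 --hitcount 10 --name SSH -j DROP
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Example: allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Reject everything else with an ICMP error (clients fail fast instead of timing out)
iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited
# Set the default policies last, once the ACCEPT rules are in place
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT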

Key notes:

Place the ESTABLISHED,RELATED rule at the top of the INPUT chain.

Use ipset for large black‑/white‑lists to keep rule count low.

Always back up the rules before making changes and schedule an automatic rollback. at reads the job from standard input, so pipe the command into it: echo "iptables-restore < /tmp/backup.rules" | at now + 10 minutes.
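
Typing rules one at a time over SSH leaves windows in which a half‑built chain can cut you off. An alternative is to write the complete ruleset in iptables-restore format and load it in one atomic step. The sketch below (render_ruleset is a hypothetical name) renders the filter‑table policy from the workflow above, with the TCP ports to open passed as arguments:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a filter-table ruleset in iptables-restore format.
# Usage: render_ruleset [TCP_PORT...]   (SSH/22 is always included, rate limited)
render_ruleset() {
  local port
  # Validate every port before printing anything.
  for port in "$@"; do
    case $port in
      ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
  done
  cat <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --set --name SSH
-A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --update --seconds 60 --hitcount 10 --name SSH -j DROP
-A INPUT -p tcp --dport 22 -j ACCEPT
EOF
  for port in "$@"; do
    # SSH is already covered by the rate-limited rules above.
    if [ "$port" != 22 ]; then
      printf '%s\n' "-A INPUT -p tcp --dport $port -j ACCEPT"
    fi
  done
  printf '%s\n' "-A INPUT -j REJECT --reject-with icmp-host-prohibited" "COMMIT"
}
```

Usage: render_ruleset 80 443 > /tmp/new.rules, then iptables-restore --test < /tmp/new.rules && iptables-restore < /tmp/new.rules. Because the DROP policies and the ACCEPT rules are committed together, there is no moment in which the policy is DROP but SSH is not yet allowed; iptables-restore replaces only the tables named in the file (here filter), so nat rules are left alone.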

Real‑World Cases

Case 1 – MySQL Connection Timeout

Symptoms: 5 % of requests to a MySQL‑backed service exceed 3 s; application logs only show “connection timeout”.

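# Capture traffic between the app server (10.0.3.20) and MySQL (10.0.2.100:3306), at most 10000 packets
tcpdump -i eth0 -nn -c 10000 -w /tmp/mysql_debug.pcap host 10.0.2.100 and port 3306
# Print SYN and SYN-ACK packets with full timestamps to measure handshake latency
tcpdump -nn -tttt -r /tmp/mysql_debug.pcap 'tcp[tcpflags] & tcp-syn != 0' | head -50
# Count SYN retransmissions: client ip.port pairs that sent the initial SYN more than once
tcpdump -nn -r /tmp/mysql_debug.pcap 'tcp[tcpflags] & (tcp-syn|tcp-ack) = tcp-syn' | awk '{print $3}' | sort | uniq -c | awk '$1 > 1'
# On the MySQL host: check the listen queue
ss -tlnp | grep 3306
# For a LISTEN socket, Recv-Q is the current accept-queue length and Send-Q the backlog limit;
# Recv-Q at or above Send-Q (e.g., 129 vs 128) means the queue is overflowing
# Confirm with the kernel's cumulative listen-overflow counters
nstat -az TcpExtListenOverflows TcpExtListenDrops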
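
To scan every listener at once instead of grepping for a single port, the Recv‑Q/Send‑Q comparison can be automated. A sketch with a hypothetical function name, assuming the default ss -ltn column layout (State, Recv‑Q, Send‑Q, Local Address:Port, ...); it exits non‑zero when any accept queue is full, so it can also feed a cron job or a monitoring check:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltn` output on stdin and report LISTEN sockets
# whose accept queue (Recv-Q) has reached the backlog limit (Send-Q).
# Exit status is 1 if at least one full queue was found, 0 otherwise.
check_accept_queues() {
  awk 'NR > 1 && $1 == "LISTEN" && $2 + 0 >= $3 + 0 {
         printf "FULL %s queue=%d backlog=%d\n", $4, $2, $3
         found = 1
       }
       END { exit (found ? 1 : 0) }'
}

# Example usage:
#   ss -ltn | check_accept_queues || echo "accept queue overflow detected"
```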

Resolution: increase kernel and application backlog.

# Increase system‑wide backlog
sysctl -w net.core.somaxconn=65535
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
echo 'net.ipv4.tcp_max_syn_backlog = 65535' >> /etc/sysctl.conf
sysctl -p
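# Raise the listening application's own backlog as well; the kernel caps it at net.core.somaxconn
# MySQL (my.cnf, [mysqld] section; requires a restart): back_log = 65535
# Nginx:  listen 3306 backlog=65535;
# Tomcat (Spring Boot): server.tomcat.accept-count=65535
# Go: net.Listen takes its backlog from net.core.somaxconn automatically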
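
Both layers matter because the kernel silently lowers the backlog an application passes to listen() to net.core.somaxconn, so raising only one of them changes nothing. A sketch with a hypothetical helper that computes the value a listener will actually get (once the application has re‑created its listening socket, this is what ss -ltn shows as Send‑Q):

```shell
#!/usr/bin/env bash
# Hypothetical helper: effective accept-queue size = min(app backlog, somaxconn).
# The second argument defaults to the running kernel's net.core.somaxconn.
effective_backlog() {
  local app=$1
  local somaxconn=${2:-$(cat /proc/sys/net/core/somaxconn)}
  if [ "$app" -lt "$somaxconn" ]; then
    echo "$app"
  else
    echo "$somaxconn"
  fi
}

# Example usage:
#   effective_backlog 65535        # app asks for 65535; prints what the kernel allows
```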

Case 2 – Connection‑Count Surge

Symptoms: a web server’s TCP connections jump from ~2 000 to 60 000 within minutes.

# Quick state summary
ss -s
# TIME-WAIT distribution by peer (destination) port; with a state filter ss omits the State column, so the peer is column 4
ss -tn state time-wait | awk 'NR > 1 {n = split($4, a, ":"); print a[n]}' | sort | uniq -c | sort -rn | head -5
# Top remote IPs behind the surge (strip the trailing :port; also works for IPv6)
ss -tn state time-wait | awk 'NR > 1 {sub(/:[^:]*$/, "", $4); print $4}' | sort | uniq -c | sort -rn | head -5
# Mitigation: allow reuse of TIME-WAIT sockets for new outbound connections (requires TCP timestamps)
# and cap the TIME-WAIT table (sockets beyond the cap are destroyed immediately)
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_max_tw_buckets=20000
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_tw_buckets = 20000' >> /etc/sysctl.conf
sysctl -p
# Root cause: the application had no connection pool; fix it and redeploy

Best Practices & Caveats

tcpdump Capture Guidelines

Always limit the packet count (-c) or the capture duration (e.g., timeout 300 tcpdump …) to avoid exhausting disk space.

Capture only the needed host/port to reduce CPU load.

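Write captures to a dedicated directory on a partition with enough free space, not on the root or data partition. /tmp often lives on the root filesystem (or on a RAM‑backed tmpfs), so check it with df -h before using it.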

Delete captures promptly after analysis.
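
These guidelines can be combined into a small wrapper that refuses to start when the target filesystem is low on space and stops at whichever limit (packet count or duration) is reached first. A sketch with hypothetical names (bounded_capture, MIN_FREE_MB, TCPDUMP), assuming GNU df and coreutils timeout:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper.
# Usage: bounded_capture IFACE MAX_PACKETS MAX_SECONDS OUTFILE [FILTER...]
# MIN_FREE_MB (default 1024) is the free space required in OUTFILE's directory;
# TCPDUMP can point at a different binary (default: tcpdump from PATH).
bounded_capture() {
  [ "$#" -ge 4 ] || { echo "usage: bounded_capture IFACE COUNT SECONDS FILE [FILTER...]" >&2; return 2; }
  local iface=$1 count=$2 secs=$3 file=$4 dir free_mb
  shift 4
  dir=$(dirname "$file")
  # Column 4 of POSIX-format `df -Pm` output is the available space in MB.
  free_mb=$(df -Pm "$dir" | awk 'NR == 2 { print $4 }')
  if [ "${free_mb:-0}" -lt "${MIN_FREE_MB:-1024}" ]; then
    echo "only ${free_mb:-0} MB free in $dir; not starting capture" >&2
    return 1
  fi
  # -c caps the packet count and timeout caps the duration. timeout exits with
  # status 124 when the time limit, not the packet limit, ended the capture.
  timeout "$secs" "${TCPDUMP:-tcpdump}" -i "$iface" -nn -c "$count" -w "$file" "$@"
}

# Example usage:
#   bounded_capture eth0 100000 300 /data/captures/dns.pcap port 53
```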

iptables Rule Optimization

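Place high‑frequency matches (ESTABLISHED,RELATED) at the top of the INPUT chain; most packets belong to existing connections, so matching them early saves rule traversal for the bulk of the traffic.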

Consolidate large IP blocks with ipset:

# Create a set
ipset create blacklist hash:ip maxelem 100000
# Add IPs
ipset add blacklist 1.2.3.4
# Single rule to drop the set
iptables -I INPUT 2 -m set --match-set blacklist src -j DROP

Annotate rules with comments (-m comment --comment "…") for future maintenance.

Never switch the default policy to DROP before the necessary ACCEPT rules are in place; add the ACCEPT rules first, then set the policy.
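
For lists with thousands of entries, running ipset add once per address is slow because every call is a separate process. ipset can instead load a whole batch through ipset restore. A sketch (render_ipset_restore and the input file name are hypothetical) that turns a plain list of IPv4 addresses, one per line, into restore format, dropping blank lines and # comments:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an IP list on stdin into `ipset restore` input.
# Usage: render_ipset_restore SETNAME < blocked_ips.txt
# The regex checks the dotted-quad shape only; ipset itself rejects octets > 255.
render_ipset_restore() {
  local set=$1
  echo "create $set hash:ip maxelem 100000"
  sed -e 's/#.*//' -e 's/[[:space:]]//g' |
    grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' |
    sed "s/^/add $set /"
}

# Example usage (-exist ignores a set or entry that already exists):
#   render_ipset_restore blacklist < blocked_ips.txt | ipset -exist restore
```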

Firewall Change Safety

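Back up the current rules to a timestamped file:

BACKUP=/tmp/iptables-backup-$(date +%Y%m%d_%H%M%S).rules
iptables-save > "$BACKUP"

Schedule an automatic rollback before applying changes (the atd service must be running). at reads the job from standard input; list pending jobs with atq and cancel the rollback with atrm once the new rules are confirmed:

echo "iptables-restore < $BACKUP" | at now + 10 minutes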

Keep a second SSH session open when editing remotely, so you still have a shell if a mistake blocks new logins.

Test changes on a non‑production host before rolling them out to all servers.
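
The backup and rollback steps can be wrapped in one helper that prints the backup path, so you know which file the pending job will restore. A sketch with hypothetical names, assuming atd is running; IPTABLES_SAVE, AT_CMD, and BACKUP_DIR are overridable only so the function can be dry‑run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the current ruleset and schedule its restoration
# in N minutes (default 10). Prints the backup file path on success.
backup_and_schedule_rollback() {
  local minutes=${1:-10} backup
  backup="${BACKUP_DIR:-/tmp}/iptables-backup-$(date +%Y%m%d_%H%M%S).rules"
  "${IPTABLES_SAVE:-iptables-save}" > "$backup" || return 1
  # Never schedule a rollback to an empty ruleset.
  [ -s "$backup" ] || { echo "backup $backup is empty" >&2; return 1; }
  # at(1) reads the job from stdin; the command is not passed as an argument.
  echo "iptables-restore < $backup" | "${AT_CMD:-at}" now + "$minutes" minutes || return 1
  echo "$backup"
}

# Example usage:
#   backup=$(backup_and_schedule_rollback 10)
#   ... apply the new rules, verify SSH and application ports ...
#   atq             # find the pending job number
#   atrm <job>      # cancel the rollback once everything works
```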

Performance Monitoring

Key Metrics

The thresholds below are starting points; tune them to each host's normal baseline.

TCP ESTABLISHED connections (normal < 5 000, warning > 10 000).

TCP TIME‑WAIT (normal < 5 000, warning > 20 000).

CLOSE‑WAIT (should stay near zero; growth indicates application leak).

conntrack usage (keep below 70 %; alert at 85 %).

Interface drop counters (the dropped fields in ip -s link output; a steadily increasing value is a red flag).

TCP retransmission rate (normal < 0.1 %; alert > 1 %).

Listen‑queue overflow (should be zero).

Network bandwidth utilization (warn at 80 %).
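
Two of these metrics are ratios that are easy to compute from kernel counters when no exporter is installed. A sketch with hypothetical function names: the retransmission rate comes from the OutSegs and RetransSegs fields of the Tcp lines in /proc/net/snmp, and conntrack usage from nf_conntrack_count and nf_conntrack_max. The /proc/net/snmp counters are cumulative since boot, so for a current rate take two samples and divide the differences:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for two of the metrics above.

# Retransmitted segments as a percentage of sent segments, read from
# /proc/net/snmp (or a saved copy). The first "Tcp:" line holds the field
# names and the second the values; both counters are cumulative since boot.
tcp_retrans_pct() {
  awk '
    $1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
    $1 == "Tcp:" {
      out = $(col["OutSegs"]); re = $(col["RetransSegs"])
      printf "%.2f\n", (out > 0 ? re * 100 / out : 0)
      exit
    }
  ' "${1:-/proc/net/snmp}"
}

# Conntrack table usage in whole percent; arguments default to the live values.
conntrack_usage_pct() {
  local dir=/proc/sys/net/netfilter
  local count=${1:-$(cat "$dir/nf_conntrack_count")}
  local max=${2:-$(cat "$dir/nf_conntrack_max")}
  if [ "${max:-0}" -le 0 ]; then
    echo "nf_conntrack_max unavailable" >&2
    return 1
  fi
  echo $(( count * 100 / max ))
}

# Example usage:
#   tcp_retrans_pct          # e.g. compare against the 0.1 % / 1 % thresholds
#   conntrack_usage_pct      # e.g. compare against the 70 % / 85 % thresholds
```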

Prometheus Alert Rules (excerpt)

groups:
- name: network_alerts
  rules:
  - alert: TcpConnectionsHigh
    expr: node_netstat_Tcp_CurrEstab > 10000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High TCP ESTABLISHED connections on {{ $labels.instance }}"
      description: "Current count {{ $value }} exceeds 10 000."
  - alert: TcpTimeWaitHigh
    expr: node_sockstat_TCP_tw > 20000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Excessive TIME_WAIT sockets on {{ $labels.instance }}"
  - alert: ConntrackTableNearFull
    expr: (node_nf_conntrack_entries / node_nf_conntrack_entries_limit) * 100 > 85
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Conntrack table near capacity on {{ $labels.instance }}"
      description: 'Usage {{ $value | printf "%.1f" }}%'

Backup & Restore

Automated iptables Backup (daily)

#!/bin/bash
BACKUP_DIR="/opt/backup/iptables"
KEEP_DAYS=30
DATE=$(date +%Y%m%d_%H%M%S)
HOSTNAME=$(hostname)
mkdir -p "$BACKUP_DIR"
iptables-save > "${BACKUP_DIR}/${HOSTNAME}_iptables_${DATE}.rules"
# Save sysctl network settings
sysctl -a 2>/dev/null | grep -E '^net\.' > "${BACKUP_DIR}/${HOSTNAME}_sysctl_${DATE}.conf"
# Backup ipset if present
if command -v ipset >/dev/null 2>&1; then
  ipset save > "${BACKUP_DIR}/${HOSTNAME}_ipset_${DATE}.rules"
fi
# Remove old backups
find "$BACKUP_DIR" -name "*.rules" -mtime +$KEEP_DAYS -delete
find "$BACKUP_DIR" -name "*.conf" -mtime +$KEEP_DAYS -delete

echo "[${DATE}] iptables backup completed: ${BACKUP_DIR}/${HOSTNAME}_iptables_${DATE}.rules"

Restore Procedure

Inspect the backup file to ensure it contains the expected ACCEPT rules (especially SSH on port 22).

Restore with iptables-restore < /path/to/backup.rules.

Verify the rule set: iptables -L -n --line-numbers.

Test remote SSH and application ports from an external host.

Persist the restored rules (CentOS: service iptables save; Ubuntu: netfilter-persistent save, provided by the iptables-persistent package).
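
The first restore step, confirming that the backup still opens SSH, can be scripted so a bad file is caught before it is loaded. A sketch with a hypothetical function name; it only recognises single‑port rules in the form iptables-save prints them (for example -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT), so multiport or ipset‑based rules still need a manual look:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an iptables-save file contains a *filter
# section, a COMMIT line, and an INPUT rule that accepts the given TCP port.
backup_allows_port() {
  local file=$1 port=$2
  grep -qx '\*filter' "$file" || { echo "$file: no *filter section" >&2; return 1; }
  grep -qx 'COMMIT' "$file" || { echo "$file: no COMMIT line" >&2; return 1; }
  grep -Eq -- "^-A INPUT .*--dport ${port}( .*)? -j ACCEPT( |\$)" "$file" || {
    echo "$file: no INPUT ACCEPT rule for tcp/$port" >&2
    return 1
  }
}

# Example usage:
#   backup_allows_port /path/to/backup.rules 22 &&
#     iptables-restore --test < /path/to/backup.rules &&
#     iptables-restore < /path/to/backup.rules
```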

Conclusion

Mastering tcpdump, ss, and iptables gives operations engineers a powerful three‑tool arsenal that covers the majority of Linux networking incidents. By following the systematic workflow, applying best‑practice rule ordering, tuning kernel parameters, and automating backups and monitoring, teams can diagnose issues quickly, avoid accidental lock‑outs, and maintain high‑availability services.

Next steps for deeper expertise include learning eBPF/BCC tools (e.g., tcplife, tcpretrans), migrating to nftables, and using Wireshark for detailed packet‑level analysis.
