Don’t Claim You Can Troubleshoot Networks Until You Understand Packet Loss
This article explains what network packet loss is, its common causes—from hardware faults to congestion and misconfiguration—and provides a step‑by‑step, production‑ready methodology for diagnosing and resolving loss using tools such as ping, traceroute, Wireshark and tcpdump.
1. Understanding Network Packet Loss
Network latency, timeouts, and intermittent disconnections are often caused by packet loss, where data packets fail to reach their destination. Packet loss is like a parcel going missing somewhere in a delivery chain. To see where a Linux host can drop packets, follow the receive path: physical NIC → DMA into a ring buffer → the kernel reads the buffer and processes the IP and TCP/UDP layers → delivery to the socket buffer.
Linux Receive Flow
Network frames arrive at the NIC.
The NIC copies frames via DMA into a ring buffer that its driver has set up, without CPU involvement in the copy.
The kernel reads the ring buffer, creates an skb structure, and processes the packet through the protocol stack.
The processed packet is placed in the application’s socket buffer.
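Drops along this receive path show up in the kernel's per-interface counters. The sketch below parses the receive columns of /proc/net/dev. The function name parse_rx_counters and the SAMPLE text are illustrative assumptions, not part of the original article; the column order follows the header that /proc/net/dev prints.

```python
from pathlib import Path

# Receive-side column names, in the order /proc/net/dev lists them.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo",
             "frame", "compressed", "multicast")

# Made-up sample used when /proc/net/dev is unavailable (e.g. not on Linux).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1234 10 0 0 0 0 0 0 1234 10 0 0 0 0 0 0
  eth0: 987654 1000 0 5 2 0 0 0 123456 900 0 0 0 0 0 0
"""


def parse_rx_counters(text):
    """Return {interface: {field: int}} for the receive columns."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        counters[name.strip()] = dict(zip(RX_FIELDS, numbers[:len(RX_FIELDS)]))
    return counters


if __name__ == "__main__":
    path = Path("/proc/net/dev")
    text = path.read_text() if path.exists() else SAMPLE
    for iface, rx in parse_rx_counters(text).items():
        print(iface, "errs:", rx["errs"], "drop:", rx["drop"], "fifo:", rx["fifo"])
```

Counters that keep growing between two readings are the signal to look for. A single non-zero value may simply date from boot time.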
Linux Send Flow
When an application calls sendmsg, the kernel allocates an skb, fills TCP/UDP headers, performs routing, applies netfilter rules, and finally hands the packet to the NIC’s transmit ring buffer. The NIC uses DMA to send the frame, generates a hardware interrupt on completion, and a soft‑interrupt cleans up the ring buffer.
2. Common Causes of Packet Loss
2.1 Hardware Issues
Faulty NICs, aging cables, or overheated routers and switches can drop packets. Typical examples are an old PC with a failing NIC, or an office where a cable damaged during renovation caused loss. Detection tools: the operating system's device manager, cable testers, Cisco Network Assistant.
2.2 Network Congestion
When traffic exceeds the available bandwidth, buffers overflow and packets are discarded. Typical scenarios: multiple users downloading large files, video conferences, or multiplayer games sharing limited uplink capacity. Monitoring tools: iftop, SolarWinds. Sustained bandwidth utilization above 80% indicates congestion.
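The 80% rule of thumb can be checked from two readings of an interface byte counter. This is a minimal sketch: the function names and the example numbers are assumptions for illustration, and the link speed must be supplied by the caller.

```python
def link_utilization(bytes_start, bytes_end, interval_s, link_mbps):
    """Fraction of link capacity used between two byte-counter readings."""
    if interval_s <= 0 or link_mbps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8
    capacity_bits = interval_s * link_mbps * 1_000_000
    return bits_transferred / capacity_bits


def is_congested(utilization, threshold=0.80):
    """Apply the 80% threshold mentioned in the text."""
    return utilization > threshold


if __name__ == "__main__":
    # Hypothetical readings: 112.5 MB transferred in 10 s on a 100 Mbps link.
    used = link_utilization(0, 112_500_000, 10, 100)
    print(f"utilization: {used:.0%}, congested: {is_congested(used)}")
```

A single sample can be misleading, so in practice the check would be repeated over several intervals before concluding the link is congested.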
2.3 Software & Configuration
Misconfigured firewalls, mismatched MTU values, or wrong IP/subnet settings can cause loss. Examples: a firewall rule that blocks external sites, or a PPPoE link with an MTU of 1492 carrying packets sized for the default 1500, so oversized packets must be fragmented or are dropped.
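The MTU mismatch can be made concrete with a little arithmetic. The sketch below assumes IPv4 without header options (20-byte header) and an 8-byte ICMP header; the function names are illustrative.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def needs_fragmentation(packet_bytes, path_mtu):
    """True if an IP packet of this size exceeds the path MTU."""
    return packet_bytes > path_mtu


if __name__ == "__main__":
    for mtu in (1500, 1492):
        print(f"MTU {mtu}: max ping payload {max_ping_payload(mtu)} bytes")
    # A full 1500-byte packet does not fit through a 1492-byte PPPoE link.
    print(needs_fragmentation(1500, 1492))
```

To probe a path by hand, send a ping with the don't-fragment flag and a payload of this size, for example `ping -M do -s 1464 <host>` on Linux or `ping -f -l 1464 <host>` on Windows. If 1464 bytes gets through and 1472 does not, the path MTU is most likely 1492.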
2.4 External Factors
Wireless interference from microwaves, Bluetooth devices, or other APs can cause loss, as can DDoS attacks that flood the network. Detection: Wi‑Fi Analyzer for signal interference; IDS/IPS for attack traffic.
3. Detecting Packet Loss
3.1 Common Tools
ping : Sends ICMP Echo Requests, measures round‑trip time and loss rate.
traceroute / tracert : Traces the path using TTL, identifies the hop where loss begins.
Wireshark : Captures and decodes packets across all layers.
tcpdump : Lightweight command‑line capture for Linux/Unix.
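When ping runs from a script, the loss rate can be recomputed from its summary line. The sketch below assumes the summary formats printed by common Linux, macOS, and Windows ping implementations. The function name loss_percent is illustrative, and other ping variants or localized output would need their own patterns.

```python
import re

# Summary-line patterns (assumed formats; localized output will not match).
_UNIX_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
_WINDOWS_SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+)")


def loss_percent(ping_output):
    """Return packet loss in percent, computed from a ping summary line."""
    for pattern in (_UNIX_SUMMARY, _WINDOWS_SUMMARY):
        match = pattern.search(ping_output)
        if match:
            sent, received = int(match.group(1)), int(match.group(2))
            if sent == 0:
                raise ValueError("no packets were sent")
            return 100.0 * (sent - received) / sent
    raise ValueError("no ping summary line found")


if __name__ == "__main__":
    linux = "4 packets transmitted, 3 received, 25% packet loss, time 3004ms"
    windows = "Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),"
    print(loss_percent(linux), loss_percent(windows))
```

A handful of probes gives only a coarse estimate. Sending a few hundred packets makes a figure such as 1% loss meaningful.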
3.2 Example Commands
Pinging www.a.shifen.com [14.215.177.39] with 32 bytes of data:
Reply from 14.215.177.39: bytes=32 time=30ms TTL=51
Tracing route to www.a.shifen.com [14.215.177.39]
  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    10 ms     9 ms     8 ms  10.1.1.1
3.3 Capture Preparation
Select the correct network interface (e.g., eth0 for wired, wlan0 for wireless). Capturing requires root or administrator privileges: run sudo tcpdump -i eth0 on Linux, or run Wireshark as administrator on Windows.
3.4 Capture Procedure
In Wireshark, choose the interface, start capture, and apply filters such as http, ip.addr == 192.168.1.100, or tcp.port == 80 and ip.addr == 192.168.1.100 to focus on relevant traffic. Stop capture to analyze timestamps, source/destination, protocol, and packet length.
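In a capture, packet loss usually shows up as TCP retransmissions. As a simplified sketch of the idea, the code below flags data segments whose flow and sequence number have been seen before. The Segment record and the sample values are assumptions for illustration; Wireshark's own analysis (display filter tcp.analysis.retransmission) applies more rules than this.

```python
from collections import namedtuple

# Minimal per-segment record, as might be extracted from a capture (hypothetical).
Segment = namedtuple("Segment", "time src dst sport dport seq length")


def find_retransmissions(segments):
    """Return (first, repeat) pairs for data segments seen more than once."""
    first_seen = {}
    repeats = []
    for seg in segments:
        if seg.length == 0:  # pure ACKs carry no data, so skip them
            continue
        key = (seg.src, seg.dst, seg.sport, seg.dport, seg.seq)
        if key in first_seen:
            repeats.append((first_seen[key], seg))
        else:
            first_seen[key] = seg
    return repeats


if __name__ == "__main__":
    capture = [
        Segment(0.00, "192.168.1.100", "14.215.177.39", 50000, 80, 1, 500),
        Segment(0.05, "192.168.1.100", "14.215.177.39", 50000, 80, 501, 500),
        Segment(0.35, "192.168.1.100", "14.215.177.39", 50000, 80, 501, 500),
    ]
    for first, repeat in find_retransmissions(capture):
        print(f"seq {repeat.seq} resent after {repeat.time - first.time:.2f}s")
```

The gap between the original and the repeat approximates the sender's retransmission timeout, which helps distinguish real loss from duplicated capture points.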
4. Locating the Root Cause
4.1 Hardware Inspection
Check NIC status with ip addr and ethtool eth0, ping the loopback address to confirm the local protocol stack works, and replace the NIC if needed. Verify cable integrity visually and with a tester. Examine router/switch temperature and logs; sustained high CPU and memory usage (for example 94% and 96%) indicates overload.
4.2 Network Environment
Assess congestion using sar -n DEV 1 or ifstat. Inspect wireless signal quality and neighboring networks with iwconfig and iwlist wlan0 scan. Detect DDoS attacks through IDS logs and abnormal traffic spikes.
4.3 Software & Configuration
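Review firewall rules (firewall-cmd --list-all) and logs (journalctl -u firewalld). Verify IP configuration with ipconfig /all (Windows) or ip addr (Linux). Check the TCP retransmission limit (sysctl net.ipv4.tcp_retries2), which sets how many times an established connection retransmits unacknowledged data before giving up. The Linux default is 15; a value of 1 is too low, because the connection is aborted after very little loss instead of recovering from it.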
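To see why a very low net.ipv4.tcp_retries2 is risky, the sketch below estimates how long an established connection keeps retrying before the kernel gives up. It assumes the retransmission timeout starts at 200 ms and doubles up to a 120 s cap, which matches the hypothetical case used in the Linux kernel documentation. Real timeouts depend on the measured round-trip time, and the function name is illustrative.

```python
def give_up_after_seconds(retries, rto_min=0.2, rto_max=120.0):
    """Total wait before giving up, with exponential backoff between retries.

    Counts the timeout after the original transmission plus one timeout
    per retransmission, so retries + 1 intervals in total.
    """
    return sum(min(rto_min * 2 ** i, rto_max) for i in range(retries + 1))


if __name__ == "__main__":
    for retries in (1, 5, 15):
        total = give_up_after_seconds(retries)
        print(f"tcp_retries2={retries}: gives up after about {total:.1f} s")
```

Under these assumptions the default of 15 tolerates roughly 15 minutes of outage, while a value of 1 abandons the connection in under a second, so brief bursts of loss turn into dropped connections.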
5. Real‑World Case Study
A company experiences severe external‑network loss: ping to 8.8.8.8 shows 55‑65% loss, traceroute stops at the core router, and the router’s CPU/memory are near full capacity. Physical inspection reveals normal cables but an overheated fan (65 °C). Bandwidth utilization is 100% with video traffic dominating, and the 2.4 GHz Wi‑Fi band is congested. After replacing the fan, upgrading the router, and moving critical services to a 5 GHz band, packet loss drops to <1% and latency stabilizes.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
