Mastering Network Packet Loss: Diagnosis and Solutions for Linux Servers

This guide explains the fundamentals of network packet loss, illustrates how packets are sent and received, and provides step‑by‑step troubleshooting methods for hardware NIC, driver, kernel stack, TCP/UDP, and application‑level issues on Linux systems, complete with command examples and visual diagrams.

Efficient Ops

Introduction

This article covers a common network problem: packet loss. If a ping to a server returns every reply, the path is healthy; if replies time out or some are missing, packets are being dropped somewhere along the path. The following sections present typical methods for diagnosing where the loss occurs.

What Is Packet Loss?

Data on a network is transmitted in packets. Faulty network devices, poor link quality, and other factors can prevent some packets from arriving, so the receiver gets less data than was sent. That shortfall is packet loss.

Packet Transmission Principles

Sending Packets:

Application data is encapsulated with a TCP header at the TCP layer, forming a TCP segment.

An IP header is added at the IP layer, forming an IP packet.

The NIC driver adds a 14‑byte MAC header, creating a frame that contains source and destination MAC addresses.

The driver copies the frame into the NIC buffer for the NIC to process.

The NIC adds synchronization information and a CRC, encapsulating the frame into a packet that is transmitted on the wire.

Receiving Packets:

The NIC receives the packet, checks CRC, and discards frames with invalid CRC. It also verifies that the destination MAC matches the NIC’s MAC; otherwise the frame is dropped.

The NIC copies the frame into its ring buffer.

The NIC driver notifies the kernel, which processes the packet through the TCP/IP stack.

The application reads the data from the socket buffer.

Core Idea

Understanding the send/receive process reveals that packet loss mainly originates from three layers: NIC hardware, NIC driver, and the kernel protocol stack. The troubleshooting approach follows a bottom‑up layer analysis, checking key information at each layer to pinpoint the cause.

Packet‑Loss Scenarios Overview

Hardware NIC loss

NIC driver loss

Ethernet link‑layer loss

Network IP‑layer loss

Transport‑layer UDP/TCP loss

Application‑layer socket loss

Hardware NIC Loss

Ring Buffer Overflow

When incoming packets arrive faster than the kernel can consume them, the NIC’s ring buffer fills and new packets are dropped.

Check:

$ ethtool -S eth0 | grep rx_fifo
$ cat /proc/net/dev
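As a rough illustration of the /proc/net/dev check, the sketch below pulls the receive drop and FIFO counters for one interface. The function name rx_drop_fifo and the interface name eth0 are assumptions made for this example; the column positions follow the standard /proc/net/dev layout (name, then receive bytes, packets, errs, drop, fifo).

```shell
#!/bin/sh
# Print the receive-side drop and FIFO counters for one interface.
# Reads /proc/net/dev-formatted text on stdin; $1 is the interface name.
rx_drop_fifo() {
    awk -v ifname="$1" '
        NR <= 2 { next }                  # skip the two header lines
        {
            sub(/:/, " ")                 # "eth0:123" -> "eth0 123"
            if ($1 == ifname)
                # fields after the name: bytes packets errs drop fifo ...
                printf "%s rx_drop=%s rx_fifo=%s\n", $1, $5, $6
        }
    '
}

# Usage on a live system (eth0 is an assumed interface name).
if [ -r /proc/net/dev ]; then
    rx_drop_fifo eth0 < /proc/net/dev
fi
```

Running it twice a few seconds apart and comparing the two results shows whether the counters are still growing, which matters more than their absolute values.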

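To view the NIC’s current ring buffer size and the maximum the hardware supports:

$ ethtool -g eth0

Solution: Increase the NIC’s receive and transmit hardware buffers. The new values must not exceed the pre‑set maximums reported by ethtool -g.

$ ethtool -G eth0 rx 4096 tx 4096

Port Negotiation Loss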

Check NIC statistics and configuration:

$ ethtool -S eth1
$ ethtool eth1

Solution: Re‑negotiate the link or force a specific speed.

$ ethtool -r eth1
$ ethtool -s eth1 speed 1000 duplex full autoneg off

Flow‑Control Loss

Check flow‑control counters:

$ ethtool -S eth1 | grep control

Solution: Disable NIC flow control.

$ ethtool -A ethx autoneg off
$ ethtool -A ethx tx off
$ ethtool -A ethx rx off

MAC‑Address Mismatch Loss

If the NIC operates in non‑promiscuous mode, it only accepts frames addressed to its own MAC. A stale ARP entry or changed NIC can cause drops.

Check with tcpdump in promiscuous mode or examine ARP tables on both ends.

Solution: Refresh the ARP table or set a correct static ARP entry.

Other NIC Anomalies

Check the NIC firmware version for known bugs:

$ ethtool -i eth1

Check cable integrity if CRC errors increase:

$ ethtool -S eth0

Solution: Reseat or replace the cable.

Packet Length Loss

Ethernet frames must be 64‑1518 bytes. Oversized or undersized frames may be dropped.

$ ethtool -S eth1 | grep length_errors

Solution: Adjust MTU or enable jumbo frames, and ensure proper segmentation on the sender.
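To make the frame-size limit concrete, here is a small arithmetic sketch of how much TCP payload fits in one frame. It assumes IPv4 and TCP headers without options (20 bytes each), plus the 14‑byte MAC header and 4‑byte CRC; the function name max_tcp_payload is made up for this example.

```shell
#!/bin/sh
# Largest TCP payload that fits in one Ethernet frame of the given size.
# $1 = maximum frame size in bytes, including MAC header and CRC.
max_tcp_payload() {
    mac_header=14   # destination MAC, source MAC, EtherType
    crc=4           # frame check sequence appended by the NIC
    ip_header=20    # IPv4 header without options (assumption)
    tcp_header=20   # TCP header without options (assumption)
    echo $(( $1 - mac_header - crc - ip_header - tcp_header ))
}

max_tcp_payload 1518   # standard frame: prints 1460
max_tcp_payload 9018   # frame for a 9000-byte jumbo MTU: prints 8960
```

A sender that emits segments larger than this value for the path relies on fragmentation or segmentation offload; if neither happens, the oversized frames show up in the length error counters.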

NIC Driver Loss

Check driver statistics with ifconfig eth1 or ethtool -S eth1.

RX Errors : Total receive errors, including FIFO overruns and CRC failures.

RX Dropped : Packets entered the ring buffer but were dropped due to insufficient memory.

RX Overruns : Kernel couldn’t keep up with NIC interrupts, causing packet loss.

RX Frame Errors : Misaligned frames or other hardware issues.

Driver Queue Overflow

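The kernel keeps a per‑CPU backlog queue of received packets waiting for the protocol stack, and netdev_max_backlog sets its maximum length. When the queue exceeds the limit, packets are dropped. The second column of each row in /proc/net/softnet_stat counts these drops for one CPU.

$ cat /proc/net/softnet_stat

Solution: Increase net.core.netdev_max_backlog.

$ sysctl -w net.core.netdev_max_backlog=2000

Single‑CPU High Load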

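High soft‑interrupt usage on one CPU can starve packet processing.

$ mpstat -P ALL 1

Solution: Balance IRQs across CPUs, adjust RSS queues, or set IRQ affinity. The first command below shows the current RSS indirection table; the second spreads receive traffic evenly across the first 8 queues.

$ ethtool -x ethx
$ ethtool -X ethx equal 8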

Also consider tuning interrupt coalescing. For example, enabling adaptive RX coalescing lets the NIC adjust its interrupt rate to the traffic load:

$ ethtool -C ethx adaptive-rx on
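The counters in /proc/net/softnet_stat, used in the Driver Queue Overflow check, are printed in hexadecimal, which makes them awkward to read directly. The sketch below decodes the first three columns of each row: packets processed, packets dropped because the backlog was full, and time_squeeze. The function name softnet_summary is an assumption, and the row index is used as the CPU number, which holds on typical systems but is not guaranteed when CPUs are offline.

```shell
#!/bin/sh
# Decode /proc/net/softnet_stat-formatted text from stdin.
# Each row describes one CPU; all columns are hexadecimal.
softnet_summary() {
    cpu=0
    while read -r processed dropped squeezed _rest; do
        # $((0x...)) converts a hex string to decimal in POSIX arithmetic.
        echo "cpu$cpu processed=$((0x$processed)) dropped=$((0x$dropped)) squeezed=$((0x$squeezed))"
        cpu=$((cpu + 1))
    done
}

# Usage on a live system.
if [ -r /proc/net/softnet_stat ]; then
    softnet_summary < /proc/net/softnet_stat
fi
```

A dropped value that keeps rising on every CPU points at the backlog limit, while a single CPU with far higher processed or squeezed counts than the others suggests the single‑CPU load problem.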

Kernel Protocol‑Stack Loss

Ethernet Link‑Layer Loss

ARP Ignoring

Configure arp_ignore to control which ARP requests are answered.

$ sysctl -a | grep arp_ignore

Solution: Set the appropriate value for the environment.

ARP Filter

In multi‑NIC setups, arp_filter prevents the wrong NIC from answering ARP requests.

$ sysctl -a | grep arp_filter

Solution: Enable or disable based on topology.

ARP Table Overflow

$ cat /proc/net/stat/arp_cache
$ dmesg | grep neighbour

Solution: Increase ARP cache limits.

$ sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
$ sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
$ sysctl -w net.ipv4.neigh.default.gc_thresh3=4096

Network IP‑Layer Loss

Interface IP Misconfiguration

Verify that the loopback and other interfaces have the correct IP addresses, and add any address that is missing:

$ ip a
$ ip a add 1.1.1.1 dev eth0

Routing Loss

Check routing tables and policy routing.

$ ip r get 8.8.8.8
$ netstat -s | grep "dropped because of missing route"

Solution: Correct routing entries.

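Reverse‑Path Filtering

$ cat /proc/sys/net/ipv4/conf/eth0/rp_filter

Set it to 0 (no source validation) or 2 (loose mode) depending on the environment. Strict mode (1) drops packets whose return route would leave through a different interface.

$ sysctl -w net.ipv4.conf.all.rp_filter=2

Firewall Drop

$ iptables -nvL | grep DROP

Solution: Adjust firewall rules.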

Connection‑Tracking Table Overflow

$ cat /proc/sys/net/netfilter/nf_conntrack_max

Increase limits or reduce timeout values.

$ sysctl -w net.netfilter.nf_conntrack_max=3276800
$ sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=1200
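Before raising the limit, it helps to know how full the connection‑tracking table actually is. The sketch below compares the current entry count in /proc/sys/net/netfilter/nf_conntrack_count with nf_conntrack_max. The function name conntrack_usage_pct and the 80% warning threshold are illustrative assumptions, not recommended values.

```shell
#!/bin/sh
# Percentage of the connection-tracking table in use (integer, rounded down).
# $1 = current entry count, $2 = configured maximum.
conntrack_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max

# Both files exist only when the nf_conntrack module is loaded.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    pct=$(conntrack_usage_pct "$(cat "$count_file")" "$(cat "$max_file")")
    echo "conntrack table usage: ${pct}%"
    # The 80% threshold is an arbitrary choice for illustration.
    if [ "$pct" -ge 80 ]; then
        echo "warning: table is nearly full; new connections may be dropped"
    fi
fi
```

If usage stays low while drops continue, the table size is probably not the cause and the timeout change is unlikely to help.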

Transport‑Layer UDP/TCP Loss

TCP Connection‑Tracking Security Check

$ sysctl -a | grep nf_conntrack_tcp_be_liberal

Toggle the setting if it causes drops.

Fragment Reassembly Loss

$ netstat -s | grep "fragments dropped after timeout"

Adjust net.ipv4.ipfrag_time and related thresholds.

$ sysctl -w net.ipv4.ipfrag_time=60

TCP Congestion Control

BBR may cause latency spikes in deep‑queue scenarios; consider switching to Cubic or tuning BBR parameters.

$ sysctl -w net.ipv4.tcp_congestion_control=cubic

UDP‑Layer Loss

Check UDP buffer statistics and increase net.ipv4.udp_mem, udp_rmem_min, udp_wmem_min as needed.

$ sysctl -w net.ipv4.udp_mem="65536 131072 262144"

Remember that UDP is unreliable; design applications accordingly or add reliability at the application layer.
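One way to check the UDP buffer statistics is to read them from /proc/net/snmp, which stores each protocol as a header row followed by a value row. The sketch below looks up a single UDP counter by name; the function name udp_counter is an assumption, while RcvbufErrors and InErrors are counter names that appear in the file's Udp header row.

```shell
#!/bin/sh
# Look up one UDP counter by name in /proc/net/snmp-formatted text on stdin.
# $1 = counter name, e.g. RcvbufErrors or InErrors.
udp_counter() {
    awk -v want="$1" '
        # First "Udp:" row is the header: remember each column name.
        $1 == "Udp:" && !seen {
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        # Second "Udp:" row holds the values in the same column order.
        $1 == "Udp:" && (want in col) { print $(col[want]) }
    '
}

# Usage on a live system.
if [ -r /proc/net/snmp ]; then
    echo "UDP receive buffer errors: $(udp_counter RcvbufErrors < /proc/net/snmp)"
fi
```

A growing RcvbufErrors value means datagrams arrived but the socket receive buffer was full, which is the case the buffer settings in this section address.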

Application‑Layer Socket Loss

Inspect socket receive errors:

$ netstat -s | grep "packet receive errors"

Adjust socket buffer sizes:

$ sysctl -w net.core.rmem_default=31457280
$ sysctl -w net.core.rmem_max=67108864

For send‑buffer errors, increase net.core.wmem_default and net.core.wmem_max.

$ sysctl -w net.core.wmem_default=31457280
$ sysctl -w net.core.wmem_max=33554432

Calculate appropriate buffer sizes using the bandwidth‑delay product (BDP = bandwidth × RTT).
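The BDP formula can be worked through with shell integer arithmetic. This is a minimal sketch: the function name bdp_bytes and the example link (1 Gbit/s with a 40 ms round‑trip time) are assumptions chosen for illustration.

```shell
#!/bin/sh
# Bandwidth-delay product in bytes.
# $1 = bandwidth in Mbit/s, $2 = round-trip time in milliseconds.
bdp_bytes() {
    # Mbit/s * 1,000,000 / 8 bits per byte * ms / 1000 = Mbit/s * ms * 125
    echo $(( $1 * $2 * 125 ))
}

# Example: a 1 Gbit/s path with a 40 ms RTT.
bdp_bytes 1000 40    # prints 5000000 (about 5 MB)
```

A socket buffer smaller than the result cannot keep such a path full, so the value is a reasonable lower bound when choosing rmem_max and wmem_max.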

Related Tools

dropwatch : Monitors kernel drop events and prints the call stack where packets are discarded.

tcpdump : Captures network traffic for detailed analysis.

Use wireshark or tshark for GUI or command‑line packet inspection.

Conclusion

This article covers most common packet‑loss points and provides specific diagnosis steps and solutions for each layer. While modern cloud networks involve complex underlay and overlay topologies, mastering these fundamentals enables systematic, layer‑by‑layer troubleshooting to quickly locate and resolve packet‑loss incidents.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
