Why Your Linux Server Drops Packets and How to Debug It Step‑by‑Step
This article explains why Linux servers can experience mysterious packet loss, walks through a systematic debugging process from the link layer to the kernel TCP/IP stack, and provides practical solutions such as adjusting kernel parameters and increasing listen queue sizes to eliminate the issue.
1. Linux Network Basics
In Linux, NIC drivers handle the link layer, the kernel implements the network and transport layers, and applications reach the stack through the socket interface. The link layer frames network-layer data for transmission on the physical medium, the network layer routes packets using IP, and the transport layer uses TCP for reliable delivery or UDP for fast, connectionless delivery.
1.1 What is the network protocol stack?
The link layer is the lowest layer, responsible for converting network layer data into frames with source and destination MAC addresses. ARP resolves IP addresses to MAC addresses. The network layer handles routing and fragmentation, while the transport layer provides TCP (reliable) and UDP (fast) services.
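To make the frame layout concrete, here is a minimal sketch in C that reads the destination MAC, source MAC, and EtherType from the first 14 bytes of an Ethernet II frame. The struct and function names (eth_header, parse_eth_header) are hypothetical and chosen for illustration; this is not how the kernel represents frames internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN 6
#define ETH_HDR_LEN  14

/* Illustrative view of an Ethernet II header (hypothetical names). */
struct eth_header {
    uint8_t  dst[ETH_ADDR_LEN];  /* destination MAC address */
    uint8_t  src[ETH_ADDR_LEN];  /* source MAC address */
    uint16_t ethertype;          /* converted to host byte order */
};

/* Parse the first 14 bytes of a frame.
 * Returns 0 on success, -1 if the buffer is too short for a header. */
int parse_eth_header(const uint8_t *frame, size_t len, struct eth_header *out)
{
    assert(frame != NULL && out != NULL);
    if (len < ETH_HDR_LEN)
        return -1;
    memcpy(out->dst, frame, ETH_ADDR_LEN);
    memcpy(out->src, frame + ETH_ADDR_LEN, ETH_ADDR_LEN);
    /* EtherType is big-endian on the wire. */
    out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

An EtherType of 0x0800 marks an IPv4 payload and 0x0806 marks an ARP message, which is how the receiver knows which upper-layer handler should get the frame.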
1.2 Interrupt Mechanism
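Linux splits interrupt handling in two. A hardware (hard) interrupt fires as soon as the NIC signals an event, and its handler does only the minimum work needed before returning. The bulk of packet processing is deferred to a soft interrupt (softirq), which keeps hard interrupt handlers short so the CPU is not held up for long while interrupts are disabled.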
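The split can be sketched as a pending bitmask: the hard-interrupt half only sets a bit, and the soft-interrupt half later runs the handler for every bit that is set. This is a single-threaded simulation with hypothetical names and softirq numbers (fake_hard_irq, fake_do_softirq, SOFTIRQ_NET_RX); the real kernel mechanism is per-CPU and considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical softirq numbers for this sketch only. */
enum { SOFTIRQ_NET_TX = 0, SOFTIRQ_NET_RX = 1, SOFTIRQ_COUNT = 2 };

static uint32_t pending_mask;            /* one bit per softirq type */
static unsigned handled[SOFTIRQ_COUNT];  /* how often each handler ran */

/* "Hard interrupt" half: do the minimum (mark work pending) and return. */
void fake_hard_irq(int nr)
{
    assert(nr >= 0 && nr < SOFTIRQ_COUNT);
    pending_mask |= (uint32_t)1 << nr;
}

/* "Soft interrupt" half: runs later and does the deferred work.
 * Returns the number of handlers that ran in this pass. */
unsigned fake_do_softirq(void)
{
    unsigned ran = 0;
    uint32_t pending = pending_mask;

    pending_mask = 0;  /* new hard interrupts can mark work again */
    for (int nr = 0; nr < SOFTIRQ_COUNT; nr++) {
        if (pending & ((uint32_t)1 << nr)) {
            handled[nr]++;  /* stand-in for the real packet processing */
            ran++;
        }
    }
    return ran;
}

/* Accessor so callers can inspect how often a handler ran. */
unsigned fake_handled_count(int nr)
{
    assert(nr >= 0 && nr < SOFTIRQ_COUNT);
    return handled[nr];
}
```

Two calls to fake_hard_irq(SOFTIRQ_NET_RX) followed by one fake_do_softirq() run the receive handler once. That coalescing is the point of the design: one deferred pass can service many hardware events.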
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
// ... (simulation code for hard and soft interrupts)
2. Where Packet Loss Can Occur
Packet loss may happen at any layer of the protocol stack: the link layer (NIC ring buffers), the network layer (routing, IP fragmentation), the transport layer (socket receive buffers, TCP listen queues), or within kernel processing (softirq handling, queue overflows).
2.1 Receive Path
When a NIC receives a packet, DMA copies it to a ring buffer, a hard interrupt notifies the CPU, and a soft interrupt processes the packet through the network stack.
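The following sketch shows why the ring buffer is a drop point: when the consumer (softirq processing) falls behind, the producer (the NIC) finds the ring full and the packet is lost. The ring size, struct, and function names are assumptions made for illustration, with integers standing in for packet descriptors.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4  /* deliberately tiny; real rings are much larger */

struct rx_ring {
    int           slots[RING_SIZE]; /* stand-ins for packet descriptors */
    size_t        head;             /* next slot the "NIC" writes */
    size_t        tail;             /* next slot the "softirq" reads */
    size_t        count;            /* packets currently queued */
    unsigned long dropped;          /* packets lost because the ring was full */
};

/* NIC side: returns 1 if the packet was stored, 0 if it was dropped. */
int ring_push(struct rx_ring *r, int pkt)
{
    assert(r != NULL);
    if (r->count == RING_SIZE) {
        r->dropped++;
        return 0;
    }
    r->slots[r->head] = pkt;
    r->head = (r->head + 1) % RING_SIZE;
    r->count++;
    return 1;
}

/* Softirq side: returns 1 and stores the oldest packet in *out,
 * or returns 0 if the ring is empty. */
int ring_pop(struct rx_ring *r, int *out)
{
    assert(r != NULL && out != NULL);
    if (r->count == 0)
        return 0;
    *out = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    return 1;
}
```

Pushing six packets into an empty four-slot ring without popping leaves four queued and two counted in dropped. On a real system, drops of this kind typically surface in the NIC statistics rather than in any application log.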
#include <stdio.h>
// ... (code simulating NIC receive and softirq processing)
2.2 Kernel Network Stack Processing
The softirq handler calls net_rx_action, which invokes the driver’s poll function to convert raw data into skb structures, optionally merges packets with GRO, and enqueues them for further processing.
#include <stdio.h>
// ... (code for net_rx_action and packet handling)
2.3 Transport Layer
After the network layer, packets reach the transport layer where TCP or UDP headers are processed, and sockets are matched.
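Socket matching can be illustrated with UDP, where the lookup key is essentially the destination port. The sketch below reads the port from an 8-byte UDP header and searches a small table of bound ports. The table and the function names (udp_dest_port, match_socket) are hypothetical; the kernel uses hash tables and also matches on addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN 8

/* Read the destination port (bytes 2-3, big-endian) from a UDP header.
 * Returns -1 if the buffer cannot hold a full 8-byte header. */
int udp_dest_port(const uint8_t *seg, size_t len)
{
    assert(seg != NULL);
    if (len < UDP_HDR_LEN)
        return -1;
    return (seg[2] << 8) | seg[3];
}

/* Look the port up in a table of bound ports. Returns the index of the
 * matching "socket", or -1 when nothing is bound to that port, in which
 * case the datagram would be discarded. */
int match_socket(const uint16_t *bound_ports, size_t n, int port)
{
    assert(bound_ports != NULL);
    for (size_t i = 0; i < n; i++)
        if (bound_ports[i] == port)
            return (int)i;
    return -1;
}
```

A datagram addressed to a port with no bound socket is another silent loss point. On Linux it is counted under NoPorts in the Udp section of /proc/net/snmp.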
#include <stdio.h>
// ... (code for UDP receive handling)
3. Packet Sending Path
3.1 From Application to Kernel
Applications use socket APIs to send data, which the kernel copies into a sk_buff and places in the socket’s send buffer.
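As a small, hedged illustration of the application side, the function below sends a UDP datagram to a receiver on the loopback address and reads it back. It uses only standard POSIX socket calls; the function name udp_loopback_roundtrip is hypothetical, and it assumes loopback networking is available where it runs.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `len` bytes over UDP to a receiver bound on 127.0.0.1 and read
 * them back. Returns the number of bytes received, or -1 on error. */
long udp_loopback_roundtrip(const char *msg, size_t len,
                            char *out, size_t out_len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    long result = -1;
    int rx, tx;

    assert(msg != NULL && out != NULL);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the kernel pick a free port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto done;

    /* sendto() hands the payload to the kernel, which copies it into
     * its own buffers before the call returns. */
    if (sendto(tx, msg, len, 0, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;

    result = (long)recv(rx, out, out_len, 0);
done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return result;
}
```

A successful return from sendto() only means the kernel accepted the data. It says nothing about whether the packet reached the wire or the peer, which is why send-side loss has to be diagnosed with kernel and NIC counters rather than return codes.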
#include <iostream>
#include <sys/socket.h>
// ... (C++ client example)
3.2 Kernel Protocol Stack Processing
The kernel adds TCP/UDP, IP, and Ethernet headers to the packet as it moves down the stack.
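The layering can be sketched by building a UDP-over-IPv4 frame by hand: 14 bytes of Ethernet header, 20 bytes of IPv4 header, 8 bytes of UDP header, then the payload. This is an illustration under stated assumptions, not the kernel's code path: addresses, ports, and checksums are left as zero, the function name encapsulate_udp is hypothetical, and the 1472-byte payload limit assumes a 1500-byte MTU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ETH_LEN = 14, IPV4_LEN = 20, UDP_LEN = 8 };

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Wrap `payload` in UDP, IPv4, and Ethernet headers inside `frame`.
 * Returns the total frame length, or 0 if it does not fit. */
size_t encapsulate_udp(uint8_t *frame, size_t cap,
                       const uint8_t *payload, size_t payload_len)
{
    assert(frame != NULL && payload != NULL);
    size_t total = ETH_LEN + IPV4_LEN + UDP_LEN + payload_len;
    if (payload_len > 1472 || total > cap)
        return 0;

    uint8_t *ip  = frame + ETH_LEN;
    uint8_t *udp = ip + IPV4_LEN;
    memset(frame, 0, ETH_LEN + IPV4_LEN + UDP_LEN);

    /* Transport layer: the UDP length covers header plus payload. */
    put_be16(udp + 4, (uint16_t)(UDP_LEN + payload_len));
    memcpy(udp + UDP_LEN, payload, payload_len);

    /* Network layer: version 4, 5-word header, total length, protocol. */
    ip[0] = 0x45;
    put_be16(ip + 2, (uint16_t)(IPV4_LEN + UDP_LEN + payload_len));
    ip[9] = 17;  /* protocol number for UDP */

    /* Link layer: EtherType 0x0800 marks an IPv4 payload. */
    put_be16(frame + 12, 0x0800);
    return total;
}
```

A 4-byte payload yields a 46-byte frame, so the three headers add 42 bytes of overhead in this configuration.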
#include <stdio.h>
// ... (code adding TCP, IP, and Ethernet headers)
3.3 From Kernel to NIC
The NIC driver uses DMA to write the packet to the NIC’s buffer, triggers the hardware to transmit the packet, and notifies the kernel via an interrupt when transmission completes.
#include <stdio.h>
// ... (code simulating NIC transmission and interrupt handling)
4. Real‑World Debugging Case: Linux Kernel Packet Loss
4.1 Project Background
A CentOS 7.9 server on Alibaba Cloud with a 1 Gbps NIC experienced intermittent packet loss causing payment timeouts. Monitoring showed high drop rates despite low CPU, memory, and bandwidth usage.
4.2 Investigation Steps
1. Checked the link layer with ethtool: the driver was stable, but rx_dropped kept increasing, indicating kernel‑level drops.
2. Verified routing and firewall rules: no anomalies.
3. Captured traffic with tcpdump: observed TCP retransmissions and missing ACKs.
4. Used ss and bpftrace to trace kfree_skb and found many TCP_LISTEN_DROP events, meaning the listen backlog was full.
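A complementary check, not one the case study itself describes, is to read the kernel's listen-queue counters. Linux exposes ListenOverflows and ListenDrops in the TcpExt section of /proc/net/netstat as a header line of names followed by a line of values. The sketch below pairs the two lines up; the function name find_counter is hypothetical, and the sample lines in the usage note are abbreviated, made-up data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find `name` in a space-separated header line and return the value at
 * the same position in the values line. Both lines are expected to
 * start with the same "Prefix:" token, as in /proc/net/netstat.
 * Returns 0 on success, -1 if the name is not present. */
int find_counter(const char *header, const char *values,
                 const char *name, unsigned long long *out)
{
    assert(header != NULL && values != NULL && name != NULL && out != NULL);
    size_t name_len = strlen(name);
    const char *h = header;
    const char *v = values;

    for (;;) {
        /* Skip separators, then measure the next token on each line. */
        h += strspn(h, " \n");
        v += strspn(v, " \n");
        size_t hl = strcspn(h, " \n");
        size_t vl = strcspn(v, " \n");
        if (hl == 0 || vl == 0)
            return -1;  /* ran out of tokens on one of the lines */
        if (hl == name_len && strncmp(h, name, hl) == 0) {
            *out = strtoull(v, NULL, 10);
            return 0;
        }
        h += hl;
        v += vl;
    }
}
```

Given the sample header "TcpExt: SyncookiesSent ListenOverflows ListenDrops" and the sample values "TcpExt: 0 1523 1530", looking up ListenOverflows yields 1523. Sampling the counter twice, a few seconds apart, shows whether the listen queue is overflowing right now or only did so in the past.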
4.3 Solution
Increased listen queue and TCP backlog parameters:
sysctl -w net.core.somaxconn=1024
sysctl -w net.ipv4.tcp_max_syn_backlog=2048
sysctl -w net.ipv4.tcp_tw_reuse=1
The changes were then made permanent in /etc/sysctl.conf. After they were applied, packet loss dropped to near zero, payment timeouts disappeared, and monitoring confirmed stable performance.
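One detail worth keeping in mind when tuning these values: net.core.somaxconn is an upper bound, and the backlog an application passes to listen() is silently capped to it. The effective limit is the smaller of the two, as in this minimal sketch (the function name effective_backlog is hypothetical).

```c
#include <assert.h>

/* The backlog passed to listen() is capped at net.core.somaxconn, so
 * the limit actually in force is the smaller of the two values. */
int effective_backlog(int requested, int somaxconn)
{
    assert(requested >= 0 && somaxconn > 0);
    return requested < somaxconn ? requested : somaxconn;
}
```

If a service calls listen() with a backlog of 128, raising somaxconn to 1024 alone leaves the limit at 128. The application's own backlog setting usually has to be raised too, and the service restarted, before the larger queue takes effect.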
5. Key Takeaways
Diagnose systematically from link layer upward before changing kernel parameters.
Use tracing tools like bpftrace to pinpoint where packets are dropped in the kernel.
Adjust kernel listen and backlog settings to handle peak traffic and prevent queue overflows.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
