
Why High‑Throughput Redis Still Drops Packets: Deep Dive into Linux Network Stack and Interrupt Optimization

The article investigates massive packet loss in Meituan‑Dianping's Redis service despite 10 Gbps NIC upgrades, traces the issue to kernel receive‑buffer drops and single‑CPU interrupt handling, and presents a step‑by‑step optimization using backlog tuning, CPU and Redis affinity, and NUMA‑aware placement to eliminate drops and improve latency.

dbaplus Community

Background

Since early 2017 Meituan‑Dianping’s Redis traffic grew from hundreds of billions of requests per day to trillions, causing severe stability challenges. The main symptom was persistent packet loss even after upgrading NICs from 1 Gbps to 10 Gbps.

Locating the loss

Monitoring of the net.if.in.dropped metric showed large drop counts. The metric is read from /proc/net/dev, whose contents are generated by the kernel functions dev_seq_show() and dev_seq_printf_stats() (excerpted at the end of this article). The receive "drop" column in that file is the sum of two fields of struct rtnl_link_stats64: rx_dropped and rx_missed_errors. rx_dropped counts packets the kernel discarded because its receive buffers were full. The meaning of rx_missed_errors depends on the driver (for example, Intel's igb driver includes the RQDPC counter, while ixgbe does not).

Packet‑receive path

When a frame arrives, the NIC copies it by DMA into a buffer that the driver has already posted to the Rx ring; the data then travels up the stack in an sk_buff. The steps are:

NIC writes packet to a DMA‑mapped buffer.

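The NIC raises a hardware interrupt; the driver's handler runs on whichever CPU receives that interrupt.

The hard-IRQ handler schedules NAPI polling and raises a soft IRQ (NET_RX_SOFTIRQ) on the same CPU.

The soft IRQ runs net_rx_action(), which polls the Rx queue (NAPI) and processes up to net.core.netdev_budget packets per run.

Packets placed on the per-CPU backlog queue input_pkt_queue (the path taken by netif_rx() and by RPS steering) go through enqueue_to_backlog(). If that queue already holds more than net.core.netdev_max_backlog packets, the packet is dropped and both the device's rx_dropped counter and the per-CPU sd->dropped counter are incremented.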

Validation

SystemTap scripts recorded how hard and soft interrupts were distributed across CPUs: both were concentrated on CPU 0, confirming a single-CPU bottleneck. The output of cat /proc/net/softnet_stat showed a large value in the second column (backlog drops) on that same CPU's row.

Optimization strategies

Increasing netdev_max_backlog

Temporarily raising the backlog limit reduced occasional drops but did not solve the high‑throughput case.

CPU interrupt affinity

The NIC's interrupt vectors were bound to the first eight physical cores with a command such as echo [MASK] > /proc/irq/[IRQ_ID]/smp_affinity. This spread the interrupt load across cores, but the number of Redis slow queries rose.

Redis process affinity

Redis instances were then pinned to the remaining cores with taskset -cp, so that they no longer shared cores with interrupt handling. This separation reduced slow queries but did not eliminate them.

NUMA‑aware placement

Both interrupt handling and Redis were confined to the same NUMA node. This reduced cross‑node memory traffic, improved cache locality, and further decreased latency.

Result

After applying interrupt affinity, Redis affinity, and NUMA alignment, packet loss dropped to near‑zero under full 40 Gbps traffic, and Redis slow‑query counts fell dramatically, with only a small residual.

Key kernel code excerpts

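static int dev_seq_show(struct seq_file *seq, void *v) {
    if (v == SEQ_START_TOKEN)
        /* First record: the column header of /proc/net/dev. */
        seq_puts(seq, "Inter-|   Receive ... |  Transmit\n");
    else
        /* Every other record: one line of counters per device. */
        dev_seq_printf_stats(seq, v);
    return 0;
}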

static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev) {
    struct rtnl_link_stats64 temp;
    /* Driver counters merged with core counters such as dev->rx_dropped. */
    const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
    /* The receive "drop" column is rx_dropped + rx_missed_errors. */
    seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu ...\n",
               dev->name, stats->rx_bytes, stats->rx_packets,
               stats->rx_errors,
               stats->rx_dropped + stats->rx_missed_errors, ...);
}

struct rtnl_link_stats64 {
    __u64 rx_packets;   __u64 tx_packets;
    __u64 rx_bytes;     __u64 tx_bytes;
    __u64 rx_errors;    __u64 tx_errors;
    /* rx_dropped: discarded by the kernel, e.g. backlog queue full */
    __u64 rx_dropped;   __u64 tx_dropped;
    __u64 multicast;    __u64 collisions;
    __u64 rx_length_errors; __u64 rx_over_errors;
    __u64 rx_crc_errors;    __u64 rx_frame_errors;
    /* rx_missed_errors: meaning is driver-specific */
    __u64 rx_fifo_errors;   __u64 rx_missed_errors;
    __u64 tx_aborted_errors; __u64 tx_carrier_errors;
    __u64 tx_fifo_errors;   __u64 tx_heartbeat_errors;
    __u64 tx_window_errors;
    __u64 rx_compressed;    __u64 tx_compressed;
};
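
/* Simplified excerpt: the full function also checks netif_running() and
 * flow limits, and may hand NAPI scheduling to another CPU via an RPS IPI. */
static int enqueue_to_backlog(struct sk_buff *skb, int cpu, unsigned int *qtail) {
    struct softnet_data *sd = &per_cpu(softnet_data, cpu);
    unsigned long flags;

    local_irq_save(flags);
    rps_lock(sd);
    if (skb_queue_len(&sd->input_pkt_queue) <= netdev_max_backlog) {
        /* Empty queue: make sure the backlog NAPI poller is scheduled. */
        if (!skb_queue_len(&sd->input_pkt_queue) &&
            !__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
            ____napi_schedule(sd, &sd->backlog);
        /* In both cases the packet is queued. */
        __skb_queue_tail(&sd->input_pkt_queue, skb);
        input_queue_tail_incr_save(sd, qtail);
        rps_unlock(sd);
        local_irq_restore(flags);
        return NET_RX_SUCCESS;
    }
    /* Backlog full: count the drop per CPU (column 2 of
     * /proc/net/softnet_stat) and per device (rx_dropped). */
    sd->dropped++;
    rps_unlock(sd);
    local_irq_restore(flags);
    atomic_long_inc(&skb->dev->rx_dropped);
    kfree_skb(skb);
    return NET_RX_DROP;
}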


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
