How NUMA‑Aware MPTCP Flow Selection Boosts Throughput and Cuts Latency
At Netdev 0x19, ByteDance's STE team presented two talks on MPTCP for data-center networking. The first covered a NUMA-locality-aware MPTCP flow-selection strategy that raises throughput by up to 30% and lowers tail latency by 6%; the second covered a DPDK-based user-space MPTCP stack that cuts latency by nearly 10% and more than doubles throughput.
Netdev 0x19 Overview
Netdev 0x19, a premier Linux networking conference, was held on March 10 in Croatia, gathering experts, researchers, and industry representatives to discuss cutting‑edge developments and future trends in network technology.
Talk 1: NUMA‑Aware MPTCP Flow Selection Optimization
The ByteDance STE team presented a new MPTCP sub‑flow selection strategy that dynamically prefers network interfaces located on the same socket as the receiving application process. Traditional MPTCP path selection only considers TCP‑level metrics and ignores the additional latency introduced by application‑level system calls.
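The locality test itself is simple to sketch: on Linux, each physical NIC exposes the NUMA node of its PCIe device under `/sys/class/net/<ifname>/device/numa_node`, so a scheduler can rank candidate interfaces so that ones local to the application's node come first. A minimal illustration, not the team's implementation (`nic_numa_node` and `order_by_locality` are hypothetical helper names):

```python
import os

def nic_numa_node(ifname: str) -> int:
    """NUMA node of a NIC's PCIe device, or -1 if unknown (e.g. virtual NICs)."""
    try:
        with open(f"/sys/class/net/{ifname}/device/numa_node") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return -1

def order_by_locality(if_nodes: dict, app_node: int) -> list:
    """Rank candidate interfaces: NUMA-local ones first, remote ones after.

    if_nodes maps interface name -> NUMA node (as read via nic_numa_node);
    app_node is the node the receiving process runs on.
    """
    return sorted(if_nodes, key=lambda ifn: if_nodes[ifn] != app_node)
```

In a real scheduler this ranking would be one input among the TCP-level metrics MPTCP already tracks, not a replacement for them.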
By incorporating end-to-end metrics, the proposed method improves both application throughput and latency. Experimental results with a Redis benchmark show up to 30% higher throughput and 6% lower tail latency than the default MPTCP configuration. The approach also reduces cross-NUMA traffic, balances load across NICs, and eases contention for memory bandwidth and I/O.
Talk 2: User‑Space DPDK‑Based MPTCP Stack for Data Centers
The team demonstrated an innovative user‑space MPTCP implementation built on DPDK, targeting storage and high‑performance computing workloads in data centers. The stack follows RFC 8684, interoperates with kernel MPTCP, and automatically falls back to standard TCP when MPTCP negotiation fails, facilitating seamless migration of existing TCP applications.
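The same create-MPTCP-or-fall-back pattern exists for the kernel stack, which makes for a compact illustration (this uses Linux kernel MPTCP via `IPPROTO_MPTCP`, not the team's DPDK stack; the in-band fallback RFC 8684 specifies, when a peer declines MPTCP during the handshake, happens transparently inside the stack and is not shown here):

```python
import socket

# IPPROTO_MPTCP is exposed by Python 3.10+; the raw value 262 works on
# Linux >= 5.6 either way.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def mptcp_socket() -> socket.socket:
    """Open an MPTCP socket; fall back to plain TCP if the kernel lacks MPTCP."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM)
```

Either way the application gets an ordinary stream socket, which is what makes migrating existing TCP code straightforward.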
The stack consists of two main modules:
Sub-flow Management: Handles creation, destruction, and address notification of sub-flows, leveraging NIC flow bifurcation and DPDK poll-mode drivers to achieve lock-free forwarding while fully exploiting multi-core processing.
Sub-flow Scheduling: Implements various scheduling policies to meet diverse performance requirements.
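The talk does not enumerate the policies, but a common one in MPTCP schedulers is min-RTT: among sub-flows that still have congestion-window space, send on the one with the lowest smoothed RTT. A minimal sketch under assumed names (`Subflow` and `minrtt_pick` are illustrative, not the stack's API):

```python
from dataclasses import dataclass

@dataclass
class Subflow:
    name: str
    srtt_us: int      # smoothed RTT in microseconds
    cwnd_avail: int   # free congestion-window space in bytes

def minrtt_pick(subflows):
    """Min-RTT policy: pick the lowest-RTT sub-flow that can still accept data.

    Returns None when every sub-flow's congestion window is full, in which
    case the sender must wait for ACKs to free window space.
    """
    ready = [sf for sf in subflows if sf.cwnd_avail > 0]
    return min(ready, key=lambda sf: sf.srtt_us) if ready else None
```

Other policies plug in at the same decision point, e.g. round-robin for bandwidth aggregation or redundant transmission for tail-latency-sensitive traffic.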
A zero-copy interface further eliminates copy overhead between the application and the stack, raising throughput and reducing latency.
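The talk does not detail this interface, but the idea has a kernel-socket analogue: `recv_into()` reads into a caller-owned buffer instead of allocating a fresh bytes object per receive, so the application processes data in place. The DPDK stack goes further by handing the application references to packet buffers directly; the sketch below only illustrates the in-place pattern:

```python
import socket

def recv_in_place(sock: socket.socket, buf: bytearray) -> memoryview:
    """Receive into a caller-owned buffer, avoiding a per-receive allocation.

    Returns a zero-copy memoryview over the bytes just received, which the
    application can parse or slice without further copying.
    """
    n = sock.recv_into(buf)
    return memoryview(buf)[:n]
```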
Preliminary performance tests inside a data‑center environment show that, compared with a user‑space TCP stack, the user‑space MPTCP stack achieves nearly 10% lower latency and over 100% higher throughput for typical packet sizes (~1000 bytes), with significant tail‑latency improvements under loss conditions.
About the STE Team
The System Technologies & Engineering (STE) team at ByteDance focuses on operating‑system kernels, virtualization, foundational system libraries, large‑scale data‑center reliability, and co‑design of new hardware and software, actively contributing to open‑source communities.
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.