How User‑Space MPTCP with DPDK Doubles Throughput in Data Centers
This article details the design, implementation, and performance evaluation of a user‑space MPTCP stack built on DPDK. A layered, zero‑copy architecture and same‑core lock‑free forwarding boost data‑center throughput by up to 100% and reduce latency by about 10%, while remaining compatible with existing TCP applications.
Background
MPTCP (Multipath TCP) extends TCP to use multiple network paths within a single connection, first deployed at scale in Apple’s Siri service to improve mobile user experience. As data‑center workloads grow, traditional single‑path TCP struggles to meet high‑throughput, low‑latency, and high‑reliability requirements.
Why MPTCP?
Service continuity: Redundant paths enable rapid fault detection and switchover.
Bandwidth maximization: Aggregates multiple physical links to fully utilize available bandwidth.
Legacy application compatibility: Smooth fallback to TCP for existing services.
Kernel support : Native support in Linux 5.6 and later.
Why User‑Space with DPDK?
High‑performance packet processing: DPDK provides kernel‑bypass, zero‑copy, polling, and batch processing.
Flexibility: Custom scheduling, congestion control, and path selection without kernel changes.
Scalability: Multi‑core concurrency and easy integration with SR‑IOV in virtualized environments.
Design
Overall Architecture
The system uses a layered, decoupled design:
Separate MPTCP/TCP stack deployed as a dedicated service, supporting multiple applications.
Zero‑copy interaction with applications via shared memory.
API exposed as an SDK integrated into application processes.
MPTCP decoupled from TCP, enabling a plug‑in user‑space TCP stack.
Flow bifurcation or SR‑IOV directs traffic to appropriate NIC queues.
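The zero‑copy shared‑memory interaction above can be sketched as a single‑producer single‑consumer ring of buffer descriptors: the application and the stack map the same memory region and exchange descriptors (offsets into shared payload memory) instead of copying payload bytes. This is a minimal illustrative sketch, not the actual interface; all names and sizes are assumptions, and real shared‑memory code needs proper memory barriers.

```c
#include <stdint.h>

/* Illustrative sketch of the zero-copy idea: app and stack share this
 * ring; they pass buffer descriptors, never the payload itself. */
#define RING_SIZE 8  /* must be a power of two */

struct desc_ring {
    uint32_t head;            /* producer (e.g. the app) writes here   */
    uint32_t tail;            /* consumer (e.g. the stack) reads here  */
    uint64_t desc[RING_SIZE]; /* offsets into shared payload memory    */
};

static int ring_push(struct desc_ring *r, uint64_t d)
{
    if (r->head - r->tail == RING_SIZE)
        return -1;                         /* ring is full */
    r->desc[r->head & (RING_SIZE - 1)] = d;
    r->head++;                             /* real code needs a release barrier */
    return 0;
}

static int ring_pop(struct desc_ring *r, uint64_t *d)
{
    if (r->head == r->tail)
        return -1;                         /* ring is empty */
    *d = r->desc[r->tail & (RING_SIZE - 1)];
    r->tail++;                             /* real code needs an acquire barrier */
    return 0;
}
```

Because only descriptors cross the boundary, payload bytes are written once by the application and read directly by the stack (or the NIC, via DPDK), which is where the zero‑copy saving comes from.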
Key Components
Socket Manager: Manages MPTCP connection contexts and maps them to underlying TCP contexts.
Path Manager: Handles sub‑flow lifecycle, creation, deletion, and address advertisement.
Packet Scheduler: Selects the sub‑flow for data transmission (default round‑robin).
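The default round‑robin scheduler can be sketched as follows. This is a hypothetical illustration of the rotation logic only; the struct and function names are assumptions, not the stack's actual API.

```c
#include <stdint.h>

/* Hypothetical round-robin packet scheduler: each call returns the
 * index of the next established sub-flow in rotation. */
struct rr_scheduler {
    uint32_t next;       /* rotation cursor */
    uint32_t num_flows;  /* number of established sub-flows */
};

static int rr_pick_subflow(struct rr_scheduler *s)
{
    if (s->num_flows == 0)
        return -1;                          /* no path available */
    uint32_t idx = s->next % s->num_flows;  /* current pick */
    s->next = (s->next + 1) % s->num_flows; /* advance cursor */
    return (int)idx;
}
```

Round‑robin is simple and fair across paths, but, as the evaluation later notes, it leaves latency gains on the table when paths have very different delays; smarter schedulers are on the roadmap.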
Key Features
1. Plug‑in User‑Space TCP Stack
Implements a TCP adaptation layer that decouples the user‑space TCP stack, allowing it to be integrated as a plug‑in within MPTCP and switched per scenario.
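The adaptation layer can be pictured as a small ops table (a vtable): the MPTCP core calls through function pointers, never a concrete stack, so a different user‑space TCP implementation can be plugged in per scenario. This is a minimal sketch under assumed names; the real interface surface is certainly larger.

```c
#include <stddef.h>

/* Hypothetical adaptation layer: MPTCP talks to any user-space TCP
 * stack through this ops table, so stacks can be swapped per scenario. */
struct tcp_stack_ops {
    const char *name;
    int (*open)(void);                             /* create a TCP context */
    int (*send)(int ctx, const void *buf, size_t len);
};

/* A trivial stand-in stack used only for illustration. */
static int demo_open(void) { return 1; }
static int demo_send(int ctx, const void *buf, size_t len)
{
    (void)ctx; (void)buf;
    return (int)len;   /* pretend every byte was accepted */
}

static const struct tcp_stack_ops demo_stack = {
    .name = "demo", .open = demo_open, .send = demo_send,
};

/* MPTCP core path: dispatch through the registered ops table. */
static int mptcp_send_via(const struct tcp_stack_ops *ops,
                          const void *buf, size_t len)
{
    int ctx = ops->open();
    return ops->send(ctx, buf, len);
}
```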
2. Same‑Core Lock‑Free Forwarding
Configures NIC flow rules so that all sub‑flows of an MPTCP connection are processed by the same PMD (Poll Mode Driver), achieving lock‑free forwarding without context sharing.
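The mapping decision behind same‑core forwarding can be sketched as below. Default RSS would hash each sub‑flow's own 4‑tuple and spread a connection's sub‑flows across queues; here the queue is instead derived from a per‑connection token, and every sub‑flow inherits it. In a real deployment this decision would be installed as per‑sub‑flow NIC flow rules (e.g. via DPDK's rte_flow); only the mapping logic is shown, and the names are illustrative.

```c
#include <stdint.h>

/* Pick one NIC queue (hence one PMD core) per MPTCP connection, so the
 * connection context is touched by a single core: no locks needed. */
static uint16_t mptcp_conn_queue(uint32_t conn_token, uint16_t num_queues)
{
    return (uint16_t)(conn_token % num_queues);
}

/* Every sub-flow inherits the connection's queue regardless of its own
 * 4-tuple, deliberately bypassing per-flow RSS hashing. */
static uint16_t subflow_queue(uint32_t conn_token, uint16_t num_queues,
                              uint32_t subflow_id)
{
    (void)subflow_id;  /* intentionally ignored: all sub-flows co-locate */
    return mptcp_conn_queue(conn_token, num_queues);
}
```

Because all sub‑flows of a connection land on the same PMD, the MPTCP connection context never crosses cores, which is what makes lock‑free forwarding possible.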
3. Kernel Compatibility
Both user‑space and kernel MPTCP fall back to their respective TCP stacks when the peer does not support MPTCP, ensuring strict RFC compliance.
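The fallback rule itself is simple: per RFC 8684, if the peer does not echo MP_CAPABLE during the handshake, the connection continues as a plain TCP connection. A minimal sketch of that decision, with illustrative flag names:

```c
#include <stdbool.h>

/* Sketch of the RFC 8684 fallback rule: MPTCP is only used when both
 * ends negotiated MP_CAPABLE; otherwise continue as plain TCP. */
enum conn_mode { MODE_MPTCP, MODE_TCP };

static enum conn_mode negotiate_mode(bool we_sent_mp_capable,
                                     bool peer_echoed_mp_capable)
{
    if (we_sent_mp_capable && peer_echoed_mp_capable)
        return MODE_MPTCP;
    return MODE_TCP;   /* seamless fallback for legacy peers */
}
```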
Performance Evaluation
Test Environment
Two compute nodes in different data centers.
Base network latency ~10 ms.
Each MPTCP connection establishes 3+ sub‑flows.
Maximum inter‑node bandwidth limited to 1 Gbps.
Results
Only user‑space TCP vs. user‑space MPTCP comparisons are shown (averaged over >10 runs).
1. Throughput
With 3 sub‑flows, user‑space MPTCP achieves more than 100% higher total throughput than single‑path user‑space TCP for 1000‑byte packets; for 10000‑byte packets the gain is ~26% due to the 1 Gbps bandwidth cap.
2. Latency
In most scenarios, user‑space MPTCP reduces latency by ~10 % compared to user‑space TCP.
3. Packet Loss Scenarios
Under 1% and 3% packet loss, single‑path TCP suffers long‑tail latency, while multi‑path MPTCP maintains low latency.
4. High‑Latency Links
Even with large link delays, MPTCP still provides latency benefits, though limited by the current round‑robin path selection.
Outlook & Planning
Technical Evolution
More intelligent path‑scheduling algorithms, including ML‑based predictions.
Support for additional user‑space TCP stacks.
Fine‑grained resource control with QoS tiers for compute, storage, and AI workloads.
Continuous performance tuning to eliminate regressions in high‑performance scenarios.
Ecosystem Building
Open‑source release to gather community feedback.
Participation in IETF drafts to standardize user‑space MPTCP.
Conclusion
The user‑space MPTCP stack demonstrates significant bandwidth utilization improvements and latency reductions for data‑center networks, while preserving compatibility with existing TCP applications and supporting large‑scale, incremental deployments. Ongoing work will further optimize performance and expand applicability.
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.