Application of Programmable NICs in DiDi Cloud Network: Challenges, Solutions, and Open‑Source Contributions
DiDi Cloud’s network team replaced CPU‑bound OVS‑DPDK processing with programmable NICs, achieving sub‑150 µs VM‑to‑VM latency and up to 9.5 Mpps on 64‑byte packets, while overcoming offload, driver, and flow‑table challenges through extensive patches and open‑source contributions.
DiDi Cloud’s rapid expansion has raised strict latency and bandwidth requirements. In 2018 the network team introduced an OVS‑DPDK solution that bypasses the Linux kernel, uses hugepages and poll‑mode drivers, and achieves sub‑150 µs VM‑to‑VM latency with around 4 Mpps per core. However, CPU‑based DPDK processing can no longer keep up with the exploding traffic volume, prompting the exploration of programmable hardware.
The technical options evaluated include traditional ASIC cards, FPGA‑based programmable queues, P4‑based pipelines, and fully programmable NIC chips. Traditional ASICs are mature but inflexible; FPGAs require specialized expertise; P4 offers flexibility for gateway nodes but is unsuitable for compute nodes; programmable NICs provide match‑and‑action flow tables, packet modification, encapsulation, and offload capabilities that meet the need for rapid iteration.
The chosen solution integrates programmable NIC chips into both compute and gateway nodes. A unified programming framework separates business logic from the data plane using an OpenFlow‑style match‑action model. Flow‑table rules are offloaded via the OVS‑DPDK offload framework, with first‑packet‑triggered rule submission to reduce flow‑table pressure.
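The first‑packet‑triggered model can be sketched as follows: the first packet of a flow misses the hardware table, takes the slow path, and the resulting rule is pushed down so subsequent packets hit in the NIC. This is a minimal, self‑contained C sketch of that control flow; all names and table sizes (`hw_table`, `flow_key`, `slow_path`) are illustrative assumptions, not DiDi's actual code or the DPDK API.

```c
#include <stdint.h>

/* Illustrative sketch of first-packet-triggered rule offload.
 * The hardware flow table is modeled as a direct-mapped hash table. */

#define TABLE_SIZE 1024

struct flow_key  { uint32_t src_ip, dst_ip; uint16_t dst_port; };
struct flow_rule { struct flow_key key; int action; int valid; };

static struct flow_rule hw_table[TABLE_SIZE];

static unsigned hash_key(const struct flow_key *k) {
    return (k->src_ip ^ k->dst_ip ^ k->dst_port) % TABLE_SIZE;
}

static int key_eq(const struct flow_key *a, const struct flow_key *b) {
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->dst_port == b->dst_port;
}

/* Fast path: hardware match-action lookup. Returns the action, or -1 on miss. */
int hw_lookup(const struct flow_key *k) {
    const struct flow_rule *r = &hw_table[hash_key(k)];
    return (r->valid && key_eq(&r->key, k)) ? r->action : -1;
}

/* Slow path: classify the flow, then offload the resulting rule to the NIC. */
int slow_path(const struct flow_key *k) {
    int action = 1;                      /* e.g. forward to a tunnel port */
    struct flow_rule *r = &hw_table[hash_key(k)];
    r->key = *k; r->action = action; r->valid = 1;
    return action;
}

int process_packet(const struct flow_key *k) {
    int action = hw_lookup(k);
    if (action < 0)                      /* first packet of the flow */
        action = slow_path(k);           /* later packets hit in hardware */
    return action;
}
```

Submitting rules only on a first-packet miss keeps the hardware table populated with active flows instead of every possible rule, which is what reduces flow‑table pressure.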
Key challenges encountered:
- Limited offload support in OVS‑DPDK (rte_flow actions, meter offload, VXLAN handling).
- Port‑forwarding restrictions on Mellanox NICs, solved with SR‑IOV plus hairpin queues.
- OVS crashes when deleting meter‑action flow entries, fixed upstream in DPDK.
- Decap/encap flow‑rule bugs in DPDK, patched upstream.
- Hairpin performance degradation under high concurrency, resolved by a driver fix from Mellanox.
- Flow‑table size limits, mitigated by increasing the table capacity.
- MAC‑address changes breaking VXLAN packets, addressed with kernel patches.
- TC Flower pedit offload failures, traced to kernel driver bugs.
Performance results (vRouter on a 10 Gbps baseline):

| Metric | 64 B packets | 1500 B packets |
| --- | --- | --- |
| Throughput (pps) | 9.5 Mpps | 0.81 Mpps |
| Throughput (bit rate) | 8.66 Gbps | 10.07 Gbps |
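For perspective, the theoretical packet rate of a 10 Gbps link follows from the 20 bytes of per‑frame wire overhead (7 B preamble, 1 B start‑of‑frame delimiter, 12 B inter‑frame gap). A one‑line C helper makes the arithmetic explicit:

```c
/* Theoretical Ethernet line rate in Mpps: each frame occupies
 * frame_bytes + 20 bytes of overhead on the wire. At 10 Gbps this
 * gives about 14.88 Mpps for 64 B frames and about 0.82 Mpps for
 * 1500 B frames. */
double line_rate_mpps(double gbps, int frame_bytes) {
    return gbps * 1e9 / ((frame_bytes + 20) * 8) / 1e6;
}
```

Against these limits, the measured 0.81 Mpps at 1500 B is essentially line rate, while 9.5 Mpps at 64 B is roughly two thirds of the 14.88 Mpps theoretical maximum.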
Latency measurements using pktgen‑dpdk's latency mode show gateway delays of 3 µs under 1 Gbps load and 6 µs under 5 Gbps with 100 k flow entries installed.
Open‑source contributions include more than 80 patches to OVS, DPDK, and the Linux kernel (e.g., flow‑table offload, meter offload, bug fixes). These patches have been submitted upstream and accepted, enhancing the broader community’s support for programmable NICs.
The DiDi SDN network team continues to develop high‑performance cloud networking features such as SLB load balancing, VPC services, elastic IPs (EIP), SNAT, cloud interconnect, and VXLAN‑based networking, all powered by the programmable‑chip platform.
Didi Tech
Official Didi technology account