
How Flowtable Hardware Offload Boosted Linux Netfilter Performance by Up to 8×

This article details the development, testing, and patch contributions for Netfilter and Mellanox flowtable hardware offload, showing how offloading conntrack and NAT functions dramatically improves bandwidth and packet‑per‑second metrics while exposing remaining challenges such as neighbor handling and port mangling.

Background

Netfilter and Mellanox jointly introduced a flowtable hardware offload feature that implements a standard conntrack offload solution on top of the Linux Netfilter hardware offload interface. Because the feature was new at the time, it still contained several bugs and incomplete areas, which the rest of this article works through.
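
For context, a minimal flowtable configuration with hardware offload enabled looks roughly like the following; the device names are hypothetical, and the flags offload keyword is what asks the driver to program established flows into the NIC:

    # Minimal sketch; device names are assumptions.
    nft -f - <<'EOF'
    table inet filter {
        flowtable ft {
            hook ingress priority 0
            devices = { enp3s0f0, enp3s0f1 }
            flags offload
        }
        chain forward {
            type filter hook forward priority 0; policy accept;
            ip protocol { tcp, udp } flow add @ft
        }
    }
    EOF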

Performance Improvements

After extensive hardware offload development, we measured bandwidth (bps), packets per second (pps), and CPU usage for both non‑offload and hardware‑offload modes.

Non-hardware offload: single-flow bandwidth of 12 Gbps using 2 host CPUs; multi-flow small-packet rate of 1 Mpps using 8 host CPUs.

Hardware offload: single-flow bandwidth of 23 Gbps with zero host CPU usage; multi-flow small-packet rate of 8 Mpps with zero host CPU usage.

Host CPU usage dropped to zero, bandwidth roughly doubled, and packet rate improved by up to 8×. The new-connection rate (cps), however, saw little change, because connection establishment is not offloaded: the first packets of each flow still traverse the software conntrack path, and only established flows are pushed down to hardware.
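
The article does not reproduce the exact test commands; a plausible sketch of how such numbers are collected, using netperf (which also appears in the verification section below) and the sysstat tools:

    # Single-flow bandwidth: one netperf TCP stream against the VM.
    netperf -H 10.0.0.75 -t TCP_STREAM -l 60

    # Packet rate and host CPU usage, observed on the host while
    # multi-flow small-packet traffic is running:
    sar -n DEV 1      # rxpck/s and txpck/s per interface
    mpstat -P ALL 1   # per-CPU utilisation; near 0% in offload mode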

Conntrack Offload Testing and Fixes

We built a test environment with a VM (10.0.0.75) behind a host VRF, connected to a physical peer (10.0.1.241). Initial tests showed that connections were established successfully but bandwidth did not improve: the flows could not be programmed into hardware because the neighbor entries for the next hop were missing. Once pings had created the neighbor entries, hardware offload functioned correctly.
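
A minimal sketch of the workaround, with hypothetical device and MAC details:

    # The flow can only be programmed into hardware once the next hop's
    # neighbor (ARP) entry exists; a single ping is enough to create it.
    ping -c 1 10.0.1.241

    # Alternatively, install the neighbor entry by hand (MAC and device
    # names are assumptions):
    ip neigh replace 10.0.1.241 lladdr 52:54:00:12:34:56 dev enp3s0f0

    # Offloaded flows then show up flagged in the conntrack table:
    conntrack -L | grep OFFLOAD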

We submitted patches to fix block setup and block callback types:

netfilter: nf_flow_table_offload: Fix block setup as TC_SETUP_FT cmd
netfilter: nf_flow_table_offload: Fix block_cb tc_setup_type as TC_SETUP_CLSFLOWER

NAT Offload Issues and Fixes

Testing NAT offload revealed incorrect port mangling: the offload rules rewrote the wrong ports for both DNAT and SNAT. The correct mapping, illustrated in the sketch after this list, is:

DNAT: original dst port → reply src port; reply src port → original dst port.

SNAT: original src port → reply dst port; reply dst port → original src port.
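
The mapping is easiest to see on a concrete flow. Below is a hypothetical conntrack entry for a connection DNAT'ed from a service address (192.0.2.1:80, invented for illustration) to the VM at 10.0.0.75:8080; the first tuple is the original direction, the second the reply:

    # conntrack -L (hypothetical entry, abbreviated)
    # tcp  ESTABLISHED src=10.0.1.241 dst=192.0.2.1   sport=40000 dport=80
    #                  src=10.0.0.75  dst=10.0.1.241  sport=8080  dport=40000
    #
    # Original direction: hardware rewrites dport 80 -> 8080
    #   (original dst port takes the reply tuple's src port)
    # Reply direction:    hardware rewrites sport 8080 -> 80
    #   (the reply src port is restored from the original dst port)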

We contributed patches to fix Ethernet destination address and NAT port mangling:

netfilter: nf_flow_table_offload: fix incorrect ethernet dst address
netfilter: nf_flow_table_offload: fix the nat port mangle

Tunnel Offload Development

Flowtable hardware offload initially did not support tunnel devices, which are critical for SDN networks. By collaborating with Netfilter maintainer Pablo Neira Ayuso and Mellanox engineer Paul Blakey, we added tunnel offload support using indirect blocks.

Key steps included:

Registering callbacks for tunnel creation and linking tunnel devices to hardware via indirect blocks (see the configuration sketch after this list).

Adding tunnel match and encap/decap actions to the offload path.

Enabling indirect block support in the mlx5e driver.
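
The practical effect of this work is that a tunnel netdevice can join a flowtable's device list like any physical port, even though it has no offload-capable driver of its own; the indirect block machinery relays its rules to the underlying mlx5 NIC. A hypothetical flowtable definition including a tunnel device (names are assumptions):

    nft -f - <<'EOF'
    table inet filter {
        flowtable ft {
            hook ingress priority 0
            devices = { gretap1, enp3s0f0 }
            flags offload
        }
    }
    EOF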

Relevant patches:

netfilter: flowtable: add tunnel match offload support
netfilter: flowtable: add tunnel encap/decap action offload support
net/mlx5e: refactor indr setup block
net/mlx5e: add mlx5e_rep_indr_setup_ft_cb support

Verification

We validated the tunnel offload by deploying a VM (10.0.0.75) with an external address (2.2.2.11), creating a GRE tap tunnel (key 1000), and configuring firewall and NAT rules. Netperf tests confirmed that connectivity was intact and that throughput matched the offloaded path, completing the SDN tunnel offload capability.
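
A sketch of how such a setup can be reproduced; the remote tunnel endpoint and device names are assumptions, since the article does not list the exact commands:

    # GRE tap tunnel with key 1000, local endpoint on the VM's external
    # address; the remote endpoint 2.2.2.1 is hypothetical.
    ip link add gretap1 type gretap local 2.2.2.11 remote 2.2.2.1 key 1000
    ip link set gretap1 up

    # Drive traffic through the tunnel and check for offloaded flows:
    netperf -H 10.0.0.75 -t TCP_STREAM -l 60
    conntrack -L | grep OFFLOAD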

Conclusion

This series of patches and performance optimizations demonstrates how flowtable hardware offload can dramatically improve Linux networking performance and extend to advanced features such as NAT and tunnel offload, paving the way for more stable, higher-throughput cloud-native network infrastructure.

Tags: Linux kernel, NAT, network performance, netfilter, flowtable, tunnel offload
Written by UCloud Tech

UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
