
How to Accelerate OVS with Mellanox ConnectX‑6: Hardware Offload Design & Implementation

This article explains the challenges of using community OVS with the Mellanox ConnectX‑6 NIC for hardware offload. It details the design and implementation of the Cx6 offload configuration, flow‑table Match/Action adaptation, Conn‑based large‑flow offload, and the coordination mechanism between hardware‑offloaded and software flow tables, which together achieve significant CPU savings while preserving security‑group and connection‑tracking correctness.

360 Zhihui Cloud Developer

Background

When using community OVS with the Mellanox ConnectX‑6 (Cx6) NIC for hardware offload, we encountered two problems: the Cx6 does not support connection‑tracking (CT) offload, and some Match/Action fields are not fully supported. We therefore adapted OVS so that only Established‑state flow entries are offloaded, while security‑group and CT processing remain in software.

Solution Design

Overall Design

The design includes four parts: Cx6 hardware offload configuration, flow‑table Match/Action adaptation, Conn‑based Established flow offload, and a coordination mechanism between hardware‑offloaded flow tables, software flow tables and CT aging.

[Figure: overall design diagram]

Cx6 NIC hardware offload configuration

Enable SR‑IOV, create VFs, enable OVS offload, and attach the vDPA port to the bridge br-int. The main steps are:

Enable SR‑IOV and create VFs.

Enable OVS offload.

Attach the vDPA port to OVS.
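The steps above can be sketched as shell commands. This is a minimal outline, assuming a ConnectX‑6 PF at PCI address 0000:3b:00.0 with netdev name enp59s0f0; the addresses, interface names, and VF count are placeholders, not values from the original deployment.

```shell
# 1. Enable SR-IOV and create VFs (here: 4).
echo 4 > /sys/class/net/enp59s0f0/device/sriov_numvfs

# Switch the NIC e-switch to switchdev mode so flows can be offloaded.
devlink dev eswitch set pci/0000:3b:00.0 mode switchdev

# 2. Enable OVS hardware offload and restart OVS to apply it.
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch

# 3. Attach the port to br-int (representor index is a placeholder).
ovs-vsctl add-port br-int rep0 -- set Interface rep0 type=dpdk \
    options:dpdk-devargs="0000:3b:00.0,representor=[0]"
```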

Match/Action offload adaptation

Unsupported Match and Action

Cx6 does not support CT offload and the community OVS offload framework does not support all fields. Unsupported fields are stripped or replaced when generating offloadable flow tables.
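The stripping step can be modeled as a small filter over a flow's match and action sets. The Python below is a simplified illustration; the field names and the sets of unsupported fields are placeholders, not the actual Cx6 capability list.

```python
# Simplified model of the Match/Action adaptation step. The sets of
# unsupported fields below are illustrative placeholders, not the
# real Cx6 capability list.

UNSUPPORTED_MATCH = {"ct_state", "ct_zone", "ct_mark", "ct_label"}
UNSUPPORTED_ACTION = {"ct", "ct_clear"}

def make_offloadable(flow):
    """Return a copy of the flow with fields the NIC cannot offload
    stripped out; return None if no offloadable action remains."""
    match = {k: v for k, v in flow["match"].items()
             if k not in UNSUPPORTED_MATCH}
    actions = [a for a in flow["actions"]
               if a["type"] not in UNSUPPORTED_ACTION]
    if not actions:  # nothing left for hardware to do
        return None
    return {"match": match, "actions": actions}

flow = {
    "match": {"in_port": 1, "ct_state": "+est", "ipv4_dst": "10.0.0.2"},
    "actions": [{"type": "ct"}, {"type": "output", "port": 2}],
}
print(make_offloadable(flow))
```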

VXLAN Decap offload path

Using the rte_flow tunnel‑offload API requires the compile option -DALLOW_EXPERIMENTAL_API, and the NIC's tunnel‑offload feature requires the devarg dv_xmeta_en=3, which conflicts with RSS. The solution therefore disables dv_xmeta_en and applies RTE_FLOW_ACTION_TYPE_VXLAN_DECAP directly.
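In configuration terms this might look as follows. The build command, PCI address, and port name are illustrative assumptions, not taken from the original setup.

```shell
# Build with the experimental rte_flow API enabled, which
# RTE_FLOW_ACTION_TYPE_VXLAN_DECAP requires on older DPDK releases.
meson setup build -Dc_args="-DALLOW_EXPERIMENTAL_API"

# Attach the physical port with dv_xmeta_en disabled (0) so that
# VXLAN decap does not conflict with RSS. PCI address and port name
# are placeholders.
ovs-vsctl add-port br-phy p0 -- set Interface p0 type=dpdk \
    options:dpdk-devargs="0000:3b:00.0,dv_xmeta_en=0"
```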

Conn‑based large‑flow offload strategy

Security‑group rules stay in software; only after a connection reaches the Established state are its flow entries offloaded. The inbound rule directs traffic to the vhost‑user interface, the outbound rule to the DPDK physical port. The ct(offload) action extracts the exact 5‑tuple of the Conn, so each offloaded rule matches exactly one connection.
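The per‑connection decision can be sketched as follows. This is a simplified model under assumed names (the Conn structure, field names, and rule format are illustrative, not the actual OVS data structures): once a Conn is Established, emit one exact 5‑tuple rule per direction.

```python
# Simplified sketch of Conn-based offload: an established connection
# yields two exact-match rules, one per direction. The Conn fields
# and rule format are illustrative placeholders.

from collections import namedtuple

Conn = namedtuple("Conn", "proto src_ip src_port dst_ip dst_port state")

def offload_rules(conn):
    """Build exact 5-tuple hardware rules for an established Conn."""
    if conn.state != "ESTABLISHED":
        return []  # pre-established traffic stays in software (CT, SG)
    fwd = {"match": {"proto": conn.proto,
                     "src": (conn.src_ip, conn.src_port),
                     "dst": (conn.dst_ip, conn.dst_port)},
           "action": "output:vhost-user"}   # inbound -> VM interface
    rev = {"match": {"proto": conn.proto,
                     "src": (conn.dst_ip, conn.dst_port),
                     "dst": (conn.src_ip, conn.src_port)},
           "action": "output:dpdk-phy"}     # outbound -> physical port
    return [fwd, rev]

c = Conn("tcp", "10.0.0.2", 34567, "10.0.0.3", 80, "ESTABLISHED")
print(len(offload_rules(c)))  # -> 2
```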

[Figure: Conn offload diagram]

Hardware‑software coordination

After offloading, hardware‑offloaded flow tables are synchronized with software tables and CT aging. A periodic flow_offload_sync thread queries hardware statistics via rte_flow_query, updates software flow stats, and ensures Conn state is refreshed to prevent premature deletion.
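One iteration of that sync loop can be modeled in a few lines of Python. This is a sketch under stated assumptions: the stats format, flow record, and the `query_hw_stats` stub (standing in for rte_flow_query) are hypothetical, illustrating only the counter fold‑in and the Conn refresh that keeps CT aging from expiring an active offloaded connection.

```python
# Simplified model of one flow_offload_sync pass: poll hardware
# counters (query_hw_stats is a stub standing in for rte_flow_query),
# fold them into the software flow stats, and refresh the Conn
# timestamp so CT aging treats the connection as active.

import time

def sync_once(offloaded_flows, query_hw_stats, now=None):
    now = time.monotonic() if now is None else now
    for flow in offloaded_flows:
        stats = query_hw_stats(flow["hw_handle"])
        if stats["hits"] > flow["packets"]:
            # Traffic was seen in hardware: update software counters
            # and touch the Conn so it is not aged out prematurely.
            flow["packets"] = stats["hits"]
            flow["bytes"] = stats["bytes"]
            flow["conn"]["last_seen"] = now

flows = [{"hw_handle": 1, "packets": 0, "bytes": 0,
          "conn": {"last_seen": 0.0}}]
sync_once(flows, lambda h: {"hits": 10, "bytes": 6400}, now=100.0)
print(flows[0]["packets"], flows[0]["conn"]["last_seen"])  # -> 10 100.0
```

In the real system this runs as a periodic thread; the sketch isolates a single pass so the update logic is visible.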

Summary

Based on the ConnectX‑6 NIC we implemented a Conn‑granular large‑flow hardware acceleration mechanism that reduced OVS CPU usage from eight cores to two cores, saving about 75 % of CPU resources while preserving security‑group and CT correctness.

Tags: DPDK, Network Acceleration, OVS, Hardware Offload, Mellanox, ConnectX-6
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
