Merging VPC SNAT and Ops NAT Gateways for Faster Traffic Processing
This article analyzes why VPC SNAT and Ops NAT gateways overlap, proposes a unified gateway design using LPM and classified hash tables, discusses rule handling, performance trade‑offs, and optimization techniques, and concludes that the solution greatly improves NAT performance while preserving flexibility.
Background
The VPC SNAT gateway and the Ops NAT gateway overlap heavily in function. After the NetOps network redesign, bypass traffic no longer passes through a NAT gateway, so the bypass optimizations previously built for the NAT gateway no longer have any effect, and the Ops NAT gateway no longer needs to exist on its own. To simplify maintenance and unify the design, we decided to merge the two gateways into a single gateway that supports both the VPC SNAT requirements and the Ops address‑translation functions.
Solution
To merge the two gateways we first compare their similarities and differences. Both perform SNAT, so their configuration, basic functions, and deployment methods are largely the same. The key differences are examined in detail.
Deployment location is dictated by NetOps and the company's network topology and cannot be changed. Deployment mode, by contrast, is decided by routing: the next hop determines the egress interface, so either gateway can be deployed in single‑arm or dual‑arm mode.
Routing announcements can use either Zebra or GoBGP; the choice has little overall impact. Configuration differs: the SNAT gateway issues rules dynamically through the dpvstool command line, so configuration can be hot‑loaded, whereas the Ops NAT gateway relies on a custom loadconf file that usually requires a cold reload, which means a restart and a traffic switchover.
Traffic handling differs: SNAT must process both overlay (VXLAN‑encapsulated) and underlay traffic, while Ops NAT only processes underlay traffic. Rule volume also differs: SNAT typically handles a few dozen rules, whereas Ops NAT must manage hundreds to thousands of rules, making rule‑matching performance critical.
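To illustrate the overlay/underlay distinction, the following is a minimal Python sketch of how a data path might recognize VXLAN‑encapsulated traffic before applying SNAT to the inner packet. It assumes the standard VXLAN UDP port (4789) and the 8‑byte VXLAN header layout; the function name `parse_vxlan` is hypothetical and is not taken from either gateway's code.

```python
VXLAN_UDP_PORT = 4789   # IANA-assigned VXLAN port (assumed to be in use here)
VXLAN_HEADER_LEN = 8    # flags(1) + reserved(3) + VNI(3) + reserved(1)
VXLAN_FLAG_VNI = 0x08   # "I" flag: the VNI field is valid


def parse_vxlan(udp_dst_port, udp_payload):
    """Return (vni, inner_frame) for overlay traffic, or None for underlay."""
    if udp_dst_port != VXLAN_UDP_PORT or len(udp_payload) < VXLAN_HEADER_LEN:
        return None
    if not udp_payload[0] & VXLAN_FLAG_VNI:
        return None
    vni = int.from_bytes(udp_payload[4:7], "big")
    return vni, udp_payload[VXLAN_HEADER_LEN:]
```

A packet for which `parse_vxlan` returns `None` would be treated as underlay traffic and matched on its outer addresses; otherwise the rule match would run on the inner frame.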
Because a linear search over a large rule set becomes a performance bottleneck, we need a lookup algorithm whose cost is close to that of a hash table. The Linux kernel uses fib_trie (a prefix tree) for its routing tables, and DPDK provides an LPM (Longest Prefix Match) library that fits our scenario.
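To make the idea concrete, here is a minimal Python sketch of longest‑prefix matching. It is not DPDK's LPM implementation (which uses a DIR‑24‑8 table layout); `PrefixTable` and its methods are hypothetical names, and the one‑dictionary‑per‑prefix‑length layout is chosen only for readability.

```python
import ipaddress


class PrefixTable:
    """Toy IPv4 longest-prefix-match table: one dict per prefix length."""

    def __init__(self):
        self.by_len = {}  # prefix length -> {masked network address: value}

    def add(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        self.by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = value

    def lookup(self, addr):
        ip = int(ipaddress.ip_address(addr))
        # Try the longest prefixes first, so the first hit is the best match.
        for plen in sorted(self.by_len, reverse=True):
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            value = self.by_len[plen].get(ip & mask)
            if value is not None:
                return value
        return None
```

The cost of a lookup depends on the number of distinct prefix lengths (at most 33 for IPv4), not on the number of rules, which is what makes this family of algorithms attractive once the rule count reaches the thousands.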
Challenges with LPM:
Routing lookup ignores ports, but our SNAT rules include ports and protocols.
Routing lookup matches only a destination address, while SNAT may need to match both source and destination (Ops NAT only matches source).
DPDK LPM stores only a small integer next hop per prefix (a uint32_t in the API), which is too small to hold a 64‑bit pointer to richer rule data.
We address these issues as follows:
Port and protocol handling can be simplified, because both gateways currently match any port and handle only TCP, UDP, and ICMP.
We create separate LPM trees for source and destination address prefixes, allowing three rule types: source‑only, destination‑only, and source‑and‑destination.
We store rules in an array; the LPM match returns the array index, avoiding pointer size limits and improving cache locality.
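The array‑index approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's code: `SnatRule`, `add_rule`, and `match` are hypothetical names, only source matching is shown, and a plain dictionary scan stands in for the source LPM tree so the example stays self‑contained.

```python
import ipaddress
from dataclasses import dataclass

SUPPORTED_PROTOS = {"tcp", "udp", "icmp"}  # the only protocols handled


@dataclass
class SnatRule:
    src_cidr: str
    snat_ip: str


rules = []      # rule array; the "next hop" stored per prefix is an index here
src_index = {}  # stand-in for the source LPM tree: network -> rule index


def add_rule(rule):
    rules.append(rule)
    idx = len(rules) - 1
    src_index[ipaddress.ip_network(rule.src_cidr)] = idx
    return idx


def match(src_ip, proto):
    """Return the rule with the longest matching source prefix, or None."""
    if proto not in SUPPORTED_PROTOS:
        return None
    ip = ipaddress.ip_address(src_ip)
    best = None  # (prefix length, rule index)
    for net, idx in src_index.items():
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, idx)
    return rules[best[1]] if best is not None else None
```

Because the lookup structure only ever stores a small integer, the rule record itself can grow (translated address pool, counters, flags) without touching the LPM entries.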
Rule insertion populates three hash tables: source‑match, destination‑match, and combined source‑destination match. Lookup proceeds through four cases: no hit, source hit, destination hit, and both hit. An example packet 10.192.0.122 → 11.211.188.3 would hit only the source table.
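The four lookup cases can be sketched as follows. The inputs are the prefixes returned by the source and destination LPM lookups (or `None` on a miss). All names are hypothetical, the prefixes in the usage note are made up for illustration, and the fallback order in the both‑hit case (combined table first, then source‑only, then destination‑only) is our assumption rather than something the design above specifies.

```python
src_only = {}  # source prefix -> rule
dst_only = {}  # destination prefix -> rule
src_dst = {}   # (source prefix, destination prefix) -> rule


def add_rule(rule, src=None, dst=None):
    """File a rule into the table that matches its type."""
    if src is not None and dst is not None:
        src_dst[(src, dst)] = rule
    elif src is not None:
        src_only[src] = rule
    elif dst is not None:
        dst_only[dst] = rule
    else:
        raise ValueError("a rule needs a source or destination prefix")


def select_rule(src_hit, dst_hit):
    """Pick a rule from the prefixes matched by the two LPM lookups."""
    if src_hit is None and dst_hit is None:   # case 1: no hit
        return None
    if dst_hit is None:                       # case 2: source hit only
        return src_only.get(src_hit)
    if src_hit is None:                       # case 3: destination hit only
        return dst_only.get(dst_hit)
    # case 4: both hit; the fallback order below is an assumption
    rule = src_dst.get((src_hit, dst_hit))
    if rule is None:
        rule = src_only.get(src_hit)
    if rule is None:
        rule = dst_only.get(dst_hit)
    return rule
```

For example, if a source‑only rule were installed for a hypothetical prefix 10.192.0.0/16, the packet 10.192.0.122 → 11.211.188.3 would produce a source hit and a destination miss, so `select_rule("10.192.0.0/16", None)` would consult only the source table.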
To traverse all rules (e.g., for control‑plane queries), we maintain a linked list alongside the hash tables.
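A minimal sketch of that pairing, with hypothetical names (`RuleStore`, `RuleNode`): each rule lives in one node that is reachable both through the hash table and through the list. Python dictionaries already preserve insertion order, so the explicit list is redundant in Python itself; it is kept here to mirror what a C hash table, which has no cheap ordered iteration, would need.

```python
class RuleNode:
    def __init__(self, key, rule):
        self.key = key
        self.rule = rule
        self.next = None


class RuleStore:
    """Hash table for fast lookup plus a linked list for full traversal."""

    def __init__(self):
        self.table = {}   # key -> RuleNode (data-plane lookup path)
        self.head = None  # first node of the traversal list
        self.tail = None  # last node, so appends are O(1)

    def add(self, key, rule):
        node = self.table.get(key)
        if node is not None:      # existing key: update in place
            node.rule = rule
            return
        node = RuleNode(key, rule)
        self.table[key] = node
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def get(self, key):
        node = self.table.get(key)
        return node.rule if node is not None else None

    def __iter__(self):
        # Control-plane path: walk every rule without scanning empty buckets.
        node = self.head
        while node is not None:
            yield node.key, node.rule
            node = node.next
```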
Optimization Points
Hash‑table implementation can be improved:
Use a segmented array with sequential storage to eliminate hash calculations and keep entries in a single memory page, reducing cache misses.
Apply cuckoo hashing (available in DPDK) to achieve near‑constant‑time lookups even at high load factors.
Leverage DPDK memory pools for contiguous allocation, though they have fixed size limits.
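As an illustration of the cuckoo‑hashing option, here is a toy two‑table cuckoo hash in Python. It is a sketch of the general technique only and does not reflect the DPDK hash library's API or internals; `CuckooTable`, its parameters, and the grow‑and‑rehash policy on insertion failure are all our assumptions.

```python
class CuckooTable:
    """Toy cuckoo hash: every key has one candidate slot in each of two tables."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.slots = [[None] * capacity, [None] * capacity]

    def _pos(self, which, key):
        # Two different hash functions, derived by salting with the table id.
        return hash((which, key)) % self.capacity

    def get(self, key):
        # At most two probes, regardless of how full the table is.
        for which in (0, 1):
            entry = self.slots[which][self._pos(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value):
        for which in (0, 1):  # update in place if the key already exists
            pos = self._pos(which, key)
            entry = self.slots[which][pos]
            if entry is not None and entry[0] == key:
                self.slots[which][pos] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            pos = self._pos(which, entry[0])
            # Place the entry and pick up whatever occupied the slot.
            entry, self.slots[which][pos] = self.slots[which][pos], entry
            if entry is None:
                return
            which ^= 1  # the evicted entry moves to its slot in the other table
        # Too many evictions: grow and rehash, including the entry still in hand.
        pending = [e for table in self.slots for e in table if e is not None]
        pending.append(entry)
        self.capacity *= 2
        self.slots = [[None] * self.capacity, [None] * self.capacity]
        for k, v in pending:
            self.put(k, v)
```

The property that matters for the data path is in `get`: a lookup touches at most two slots, while the cost of resolving collisions is pushed onto insertion, which happens on the control plane.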
The address‑overlap limitation could be mitigated by giving each source prefix its own destination LPM tree, but this greatly increases memory consumption and algorithmic complexity, so we do not recommend it.
Conclusion
The proposed design combines LPM lookup with classified hash tables, avoiding the drawbacks of sequential matching while keeping algorithmic complexity low. It significantly improves rule‑matching performance, removes the previous Ops NAT gateway's mask‑length restrictions, and is the best fit we found for the merged SNAT gateway.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
