How 360’s Host‑Overlay VPC Architecture Boosts Private Cloud Performance
To meet growing business demands, 360’s virtualization team replaced the legacy overlay network with a host‑overlay VPC solution that decouples switches, supports private cloud isolation, leverages DPVS‑based gateways, and integrates monitoring, delivering high‑availability, scalable traffic handling across its 25 G data centers.
Background
As the company’s business expands, users demand more from the network. The rollout of 360’s 25 G data centers and upgraded switch architecture made the existing virtualized network unable to support VM migration across switches, and some special business lines required network isolation. The previous overlay network was tightly coupled with switches, leading to stability, operation, and troubleshooting issues.
The virtualization team therefore adopted a host‑overlay solution, decoupling from switches, supporting Virtual Private Cloud (VPC) isolation, ensuring resource security, and deploying it on the internal private‑cloud platform HULK, enriching HULK’s service catalog.
Architecture Approach
Within the private cloud, each IDC is treated as an independent VPC with Layer 2 VLAN isolation and Layer 3 connectivity. Implementing VPCs allows logical partitioning of IP ranges, custom switches, routing tables, and inter‑VPC connectivity, essentially mapping physical IDC resources onto virtual network constructs.
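The IP-range partitioning can be illustrated with Python's standard `ipaddress` module. A minimal sketch, assuming illustrative CIDR values (these are examples, not 360's actual allocation plan):

```python
import ipaddress

# Illustrative only: carve a per-IDC VPC supernet into per-tenant subnets.
# The CIDR ranges here are examples, not 360's actual allocation.
vpc_supernet = ipaddress.ip_network("10.128.0.0/16")   # one VPC per IDC

# Hand out /24 ranges; each would map to a custom virtual switch.
tenant_subnets = list(vpc_supernet.subnets(new_prefix=24))

print(len(tenant_subnets))      # 256 /24 subnets available
print(tenant_subnets[0])        # 10.128.0.0/24
print(tenant_subnets[1][10])    # 10.128.1.10
```

Because subnets are carved from a per-VPC supernet, two tenants in different VPCs can reuse the same private ranges without conflict; the VNI, not the IP, disambiguates them on the wire.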
The team chose a host‑overlay approach, moving VXLAN decapsulation to the host side, similar to major cloud providers. OpenStack Neutron supports host‑overlay via OVS, combined with a Distributed Virtual Router (DVR) to eliminate the Neutron network node bottleneck. The solution uses DVR for east‑west traffic and a self‑developed virtualized gateway (based on DPVS) for north‑south traffic, achieving high single‑node performance and reducing host routing load.
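The east‑west/north‑south split described above boils down to a membership test on the destination address: traffic to another VM in the VPC stays on the DVR path, everything else is tunneled to the gateway. A minimal sketch, assuming a hypothetical VPC CIDR:

```python
import ipaddress

# Hypothetical VPC ranges; real deployments would load these from the
# control plane rather than hard-coding them.
VPC_CIDRS = [ipaddress.ip_network("10.128.0.0/16")]

def traffic_path(dst_ip: str) -> str:
    """Classify a packet the way the architecture splits traffic."""
    dst = ipaddress.ip_address(dst_ip)
    if any(dst in cidr for cidr in VPC_CIDRS):
        # East-west: routed by the Distributed Virtual Router on the host.
        return "east-west: DVR on the local host"
    # North-south: VXLAN-encapsulated and sent to the DPVS-based gateway.
    return "north-south: VXLAN tunnel to DPVS gateway"

print(traffic_path("10.128.3.7"))   # stays inside the VPC
print(traffic_path("8.8.8.8"))      # leaves via the cloud gateway
```

The payoff of this split is that east‑west packets never traverse a central router, while the stateless gateway handles only the (smaller) north‑south share.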
Benefits
Business users can create custom VPCs and Layer 2 isolated networks, and define IP ranges directly on the private‑cloud platform.
Network operators only need to configure switches for pure forwarding, reducing complexity.
Security teams can enforce ACL policies on the gateway or virtual network side, offering greater flexibility.
Developers no longer need to coordinate VLAN configurations with operators, speeding deployment from days to hours.
Drawbacks
The host‑side OVS decapsulation incurs a 20‑30% performance loss and roughly double the latency compared to VLAN or SR‑IOV passthrough.
Future Work
Plans include reducing overhead with OVS‑DPDK acceleration, smart‑NIC offload, and exploring DPU solutions that offload both network and storage I/O.
VPC Architecture Overview
The VPC architecture consists of two main parts: the virtualized network on compute nodes and the cloud gateway (SNAT, EIP, LB, CCN gateways). SNAT and EIP handle north‑south traffic, while LB provides load balancing and CCN enables inter‑VPC connectivity. Phase 1 has deployed SNAT and EIP gateways; other gateways are slated for Phase 2.
Control Plane
Neutron Server and OVS Agent manage the virtual network control plane. When a user binds an EIP in HULK, Neutron validates the request, records the mapping, and notifies the Fipctl Server and OVS Agent. The gateway control plane, implemented by Fipctl Server and Agent, stores mappings in an ETCD cluster and pushes updates to the forwarding plane.
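The record that Fipctl stores in etcd can be sketched as a small JSON document keyed by VPC and EIP. The key layout and field names below are assumptions for illustration, not 360's actual schema; a plain dict stands in for the etcd cluster:

```python
import json

# Hypothetical key layout and schema for an EIP-binding record; the real
# Fipctl schema is not published, so treat these names as assumptions.
def eip_binding_key(vni: int, eip: str) -> str:
    return f"/fipctl/vpc/{vni}/eip/{eip}"

def make_binding(eip, vm_ip, host_ip, vni, dvr_mac):
    return json.dumps({
        "eip": eip, "vm_ip": vm_ip, "host_ip": host_ip,
        "vni": vni, "dvr_mac": dvr_mac,
    })

# A dict stands in for etcd; a real deployment would write through an
# etcd client, and gateway agents would watch the /fipctl/ prefix to
# receive pushed updates for the forwarding plane.
etcd = {}
etcd[eip_binding_key(4097, "203.0.113.10")] = make_binding(
    "203.0.113.10", "10.128.1.10", "192.168.5.2", 4097,
    "fa:16:3e:00:00:01")

record = json.loads(etcd["/fipctl/vpc/4097/eip/203.0.113.10"])
print(record["vm_ip"])   # 10.128.1.10
```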
Forwarding Plane
The forwarding plane uses OVS on compute nodes and a DPVS‑based high‑performance gateway. The gateway forwards packets based on a five‑field mapping (EIP, VM IP, host IP, VNI, DVR MAC) synchronized across CPU cores, ensuring stateless, highly available forwarding.
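The five‑field mapping can be pictured as a lookup table where each entry fully determines the forwarding action, so any gateway instance (or CPU core) holding the same synchronized table gives the same answer. A minimal sketch with illustrative values:

```python
from typing import NamedTuple, Optional

class Mapping(NamedTuple):
    """The five synchronized fields described in the article."""
    eip: str
    vm_ip: str
    host_ip: str   # VTEP address of the compute node hosting the VM
    vni: int       # identifies the tenant's VPC
    dvr_mac: str

# Illustrative table contents; the real table is pushed by Fipctl.
TABLE = {
    "203.0.113.10": Mapping("203.0.113.10", "10.128.1.10",
                            "192.168.5.2", 4097, "fa:16:3e:00:00:01"),
}

def forward_inbound(eip: str) -> Optional[dict]:
    """DNAT an inbound packet: EIP -> VM IP, then VXLAN-encap to the host.
    No per-flow state is consulted, which is what makes scaling out easy."""
    m = TABLE.get(eip)
    if m is None:
        return None  # no binding: drop
    return {"inner_dst": m.vm_ip, "outer_dst": m.host_ip,
            "vni": m.vni, "inner_dmac": m.dvr_mac}

print(forward_inbound("203.0.113.10"))
```

Because the decision depends only on the table entry, a failed gateway node can be replaced by any peer with the same table, with no connection state to migrate.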
Traffic Paths
East‑west traffic follows the DVR model within compute nodes. North‑south traffic is tunneled via VXLAN to the cloud gateway. The gateway performs stateless forwarding based on EIP‑VM mappings, enabling easy scaling and high availability.
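The VXLAN encapsulation used on the north‑south path prepends an 8‑byte header (per RFC 7348) carrying the 24‑bit VNI that identifies the tenant's VPC. A minimal sketch of encoding and decoding that header:

```python
import struct

# The 8-byte VXLAN header (RFC 7348): a flags word with the I bit set
# (0x08 in the first byte), reserved fields, and the 24-bit VNI.
def vxlan_header(vni: int) -> bytes:
    assert 0 <= vni < 2 ** 24, "VNI is a 24-bit field"
    return struct.pack(">II", 0x08000000, vni << 8)

def vxlan_vni(header: bytes) -> int:
    _, word = struct.unpack(">II", header)
    return word >> 8

hdr = vxlan_header(4097)
print(hdr.hex())        # 0800000000100100
print(vxlan_vni(hdr))   # 4097
```

On the wire this header sits inside a UDP datagram addressed to the remote VTEP (the gateway or peer host), followed by the original Ethernet frame.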
Flow Analysis
Stateless forwarding simplifies scaling and fault tolerance.
Decoupling from switches reduces operational burden and improves stability.
Support for cross‑switch VM migration is achieved through VXLAN tunnels, avoiding broadcast storms.
Monitoring and Product Presentation
Monitoring is integrated via Exporter, Prometheus, and Grafana, providing alerts for hardware failures and CPU and disk utilization, as well as VPC‑specific metrics such as EIP traffic, routing‑table size, and cross‑data‑center latency.
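For the Exporter side, the data Prometheus scrapes is plain text in the exposition format. A hypothetical sketch of what a VPC metric such as per‑EIP traffic might look like; the metric name and label values are assumptions, not 360's actual metric schema:

```python
# Hypothetical exporter output in the Prometheus text exposition format.
# Metric and label names are illustrative assumptions.
def render_metrics(eip_bytes: dict) -> str:
    lines = [
        "# HELP vpc_eip_traffic_bytes Bytes forwarded per elastic IP.",
        "# TYPE vpc_eip_traffic_bytes counter",
    ]
    for eip, count in sorted(eip_bytes.items()):
        lines.append(f'vpc_eip_traffic_bytes{{eip="{eip}"}} {count}')
    return "\n".join(lines) + "\n"

# In a real exporter these counters would come from the gateway's stats,
# and Grafana dashboards and alert rules would query the resulting series.
print(render_metrics({"203.0.113.10": 123456, "203.0.113.11": 42}))
```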
The HULK console now displays VPC features, including custom networks, switches, and elastic IPs, with upcoming support for security groups and custom routing tables.
Online Issues and Q&A
For K8S workloads, an elastic NIC attached directly to the pod is used to avoid double overlay overhead.
Large VPCs (e.g., 2000 VMs) experience slow EIP bind/unbind due to multiple database queries; consolidating queries reduces latency from ~3 minutes to ~10 seconds.
VPC forwarding performance is being improved with DPU, smart‑NIC offload, and OVS‑DPDK.
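The slow EIP bind/unbind in large VPCs is the classic N+1 query pattern: one database round trip per VM instead of a single batched query. A minimal sketch with a hypothetical schema (table and column names are assumptions) using SQLite in memory:

```python
import sqlite3

# Hypothetical schema standing in for Neutron's database; the point is
# the query shape, not the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vm (id INTEGER PRIMARY KEY, host_ip TEXT)")
conn.executemany(
    "INSERT INTO vm VALUES (?, ?)",
    [(i, f"192.168.{i // 256}.{i % 256}") for i in range(2000)])

def hosts_slow(vm_ids):
    # N+1 pattern: one round trip per VM -- what made bind/unbind slow.
    return [conn.execute("SELECT host_ip FROM vm WHERE id = ?",
                         (i,)).fetchone()[0] for i in vm_ids]

def hosts_batched(vm_ids):
    # Consolidated: a single IN (...) query fetches every mapping at once.
    marks = ",".join("?" * len(vm_ids))
    rows = conn.execute(
        f"SELECT id, host_ip FROM vm WHERE id IN ({marks})", list(vm_ids))
    found = dict(rows.fetchall())
    return [found[i] for i in vm_ids]

assert hosts_slow([0, 1, 500]) == hosts_batched([0, 1, 500])
```

With network latency on every round trip, collapsing thousands of point queries into one batch is where the reduction from minutes to seconds comes from.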
Conclusion
The VPC solution has been deployed across all major data centers for seven months with stable operation. Phase 1 provides a default elastic IP per VM; Phase 2 will introduce security groups and custom routing, while further enhancing gateway high‑availability.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.