How 360 Qiyun Evolved Its VPC Architecture for Elastic Operations
This article details the evolution of 360 Qiyun's VPC solution: the two‑stage migration from a customized OpenStack Neutron Liberty deployment to a hardware‑assisted EVPN + VXLAN architecture, the specific network enhancements, the performance problems encountered, and the resulting operational benefits.
Introduction
At the IT Fun Learning Club technical salon, the 360 Qiyun team presented the evolution of its VPC architecture, outlining the motivations and overall design goals for a more flexible, high‑performance, and easier‑to‑operate cloud network.
Stage One
Based on the OpenStack Liberty release, the team customized Neutron to add QoS rate limiting, OSPF dynamic routing, and a private‑address scheme for external gateways that reduced public IP waste. They also converted the external network from a flat Layer 2 network into a routed Layer 3 design and introduced a dedicated serviceip resource that connects VMs to the internal management network.
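Stage One's QoS rate limiting is enforced with Linux tc. The sketch below is a minimal illustration, assuming a token bucket filter (tbf) qdisc attached to a VM's tap interface; the article does not say which qdisc or parameters 360 Qiyun used, and the interface name and rates are hypothetical. Note that a root qdisc on a tap device shapes the traffic the host sends out of that device, which for a VM tap is traffic toward the VM.

```python
import shlex
import subprocess


def tbf_shaping_cmd(dev: str, rate_mbit: int, burst_kb: int, latency_ms: int = 50) -> list[str]:
    """Build a `tc` command that shapes egress traffic on `dev` with a
    token bucket filter (tbf) qdisc. Parameters are illustrative assumptions."""
    if rate_mbit <= 0 or burst_kb <= 0 or latency_ms <= 0:
        raise ValueError("rate, burst and latency must be positive")
    return [
        "tc", "qdisc", "replace", "dev", dev, "root", "tbf",
        "rate", f"{rate_mbit}mbit",    # sustained rate in megabits per second
        "burst", f"{burst_kb}kb",      # bucket size; tc reads "kb" as kilobytes
        "latency", f"{latency_ms}ms",  # max time a packet may wait in the queue
    ]


def apply(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command, or run it (requires root and iproute2)."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical tap device and limits
    apply(tbf_shaping_cmd("tap-example", rate_mbit=100, burst_kb=64))
```

Using `tc qdisc replace` rather than `add` keeps the operation idempotent, which is convenient for an agent that re-applies QoS policies whenever a port is updated.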
Stage Two
To overcome scalability and performance limits, the second stage was a major redesign around a hardware‑assisted solution: the VXLAN tunnel endpoint (VTEP) function moves from Open vSwitch on the compute nodes to the access switches, EVPN serves as the control plane and VXLAN as the data plane, and VRFs isolate tenant routing.
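To make concrete what the VTEP does for every packet, and therefore what Stage Two offloads from Open vSwitch to the switches, here is a sketch of the VXLAN header format defined in RFC 7348. It is an illustration of the standard encoding, not 360 Qiyun's code; the outer Ethernet, IP, and UDP headers that a real VTEP also builds are omitted and only counted in the overhead constant.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port (RFC 7348)
VXLAN_FLAG_I = 0x08    # "I" flag: the VNI field is valid

# Outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN (8), no VLAN tag
VXLAN_OVERHEAD_IPV4 = 14 + 20 + 8 + 8


def vxlan_header(vni: int) -> bytes:
    """Encode the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits + 24 reserved bits; word 2: 24-bit VNI + 8 reserved bits
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Return the UDP payload a VTEP sends: VXLAN header + original L2 frame."""
    return vxlan_header(vni) + inner_frame


if __name__ == "__main__":
    hdr = vxlan_header(5001)
    print(hdr.hex())        # 0800000000138900
    print(parse_vni(hdr))   # 5001
```

The 50 bytes of added headers are also why VXLAN deployments usually raise the underlay MTU or lower the tenant MTU, so that encapsulated frames are not fragmented.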
Key Modifications
Extended the Neutron API with QoS resources, enforced through Linux tc traffic shaping.
Allocated private 172.16.0.0/16 addresses to external gateways to conserve public IPs.
Replaced the Layer 2 external network with a routed Layer 3 design and added an OSPF driver to the neutron‑l3‑agent.
Created a serviceip resource to enable VM access to the internal management network.
Shifted VXLAN encapsulation to access switches, reducing CPU load on compute nodes.
Implemented VRF on switches for tenant‑level routing isolation.
Adopted EVPN (MP‑BGP) as the VXLAN control plane, eliminating the ML2 l2population (L2 POP) driver and reducing load on neutron‑server.
Added a BGP driver on neutron‑l3‑agent for dynamic public IP announcements, replacing OSPF.
Implemented BGP‑based high availability for SNAT egress, which fails over faster than keepalived.
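A VRF gives each tenant its own routing table, so two tenants can both use 10.0.0.0/24 without conflict. The toy model below, assuming one VRF per tenant, shows longest‑prefix matching confined to a single VRF. The tenant names and next‑hop labels (`vtep-1`, `snat-gw-a`, and so on) are hypothetical; on the actual access switches these tables live in hardware and, in an EVPN + VXLAN fabric, are typically populated from EVPN routes rather than by application code.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Vrf:
    """One tenant's isolated routing table (toy model)."""
    name: str
    routes: Dict[ipaddress.IPv4Network, str] = field(default_factory=dict)

    def add_route(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.IPv4Network(prefix)] = next_hop

    def lookup(self, dst: str) -> Optional[str]:
        addr = ipaddress.IPv4Address(dst)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route in this VRF: the packet would be dropped
        # Longest-prefix match, scoped to this VRF only
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


class Switch:
    """A switch holding one VRF per tenant (hypothetical model)."""

    def __init__(self) -> None:
        self.vrfs: Dict[str, Vrf] = {}

    def vrf(self, tenant: str) -> Vrf:
        return self.vrfs.setdefault(tenant, Vrf(tenant))

    def route(self, tenant: str, dst: str) -> Optional[str]:
        # Only the tenant's own table is consulted, so overlapping CIDRs never collide
        return self.vrfs[tenant].lookup(dst)


if __name__ == "__main__":
    sw = Switch()
    sw.vrf("tenant-a").add_route("10.0.0.0/24", "vtep-1")
    sw.vrf("tenant-a").add_route("0.0.0.0/0", "snat-gw-a")
    sw.vrf("tenant-b").add_route("10.0.0.0/24", "vtep-7")  # same CIDR, other tenant
    print(sw.route("tenant-a", "10.0.0.5"))  # vtep-1
    print(sw.route("tenant-b", "10.0.0.5"))  # vtep-7
    print(sw.route("tenant-b", "8.8.8.8"))   # None
```

Because tenant-b has no default route in its own VRF, it cannot reach tenant-a's SNAT gateway even though that route exists on the same switch; isolation falls out of the lookup scope rather than from filtering rules.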
Problems Encountered
The original design suffered from three problems: the ML2 l2population (L2 POP) driver performed poorly, VXLAN encapsulation in Open vSwitch consumed substantial compute‑node CPU, and bare‑metal servers or hardware appliances were difficult to integrate into tenant VPCs.
Conclusion
By integrating hardware EVPN + VXLAN and redesigning the network stack, 360 Qiyun achieved higher throughput, lower CPU usage on compute nodes, and simpler tenant isolation. It also improved the user experience by making NETCONF operations asynchronous and adding resource state tracking.
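The asynchronous NETCONF operations and resource state tracking can be pictured with a small sketch. This is one plausible shape for the pattern, assuming asyncio and status names modeled on Neutron's BUILD/ACTIVE/ERROR convention: the API hands back the resource immediately in BUILD, and a background task pushes the switch configuration and records the outcome. `push_config` is a hypothetical stand‑in for a real NETCONF edit‑config round trip; no NETCONF library is used here.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Resource:
    id: str
    status: str = "BUILD"  # visible to the user as soon as the API returns
    error: str = ""


async def push_config(resource_id: str, fail: bool = False) -> None:
    """Hypothetical stand-in for a NETCONF edit-config call to a switch."""
    await asyncio.sleep(0.01)  # simulate network latency
    if fail:
        raise ConnectionError(f"switch rejected config for {resource_id}")


async def apply_async(resource: Resource, fail: bool = False) -> None:
    """Push config in the background and record the final state."""
    try:
        await push_config(resource.id, fail)
        resource.status = "ACTIVE"
    except Exception as exc:
        resource.status = "ERROR"
        resource.error = str(exc)


async def create(resource_id: str, fail: bool = False):
    """Return the resource at once; the switch update runs as a task."""
    res = Resource(resource_id)
    task = asyncio.create_task(apply_async(res, fail))
    return res, task


async def main() -> None:
    ok, t1 = await create("vrf-tenant-a")
    bad, t2 = await create("vrf-tenant-b", fail=True)
    print(ok.status, bad.status)  # BUILD BUILD
    await asyncio.gather(t1, t2)
    print(ok.status, bad.status)  # ACTIVE ERROR


if __name__ == "__main__":
    asyncio.run(main())
```

The user no longer waits on a slow switch round trip inside the API call, and a failed push is visible as an ERROR status with a reason instead of a timed‑out request.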
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.