Design and Implementation of Cisco Nexus VPC for Qunar K8S Network
This article details the background, design rationale, network topology changes, and step‑by‑step procedures—including VPC configuration, BGP setup, and port‑channel adjustments—used to upgrade Qunar's data‑center network for Kubernetes deployments, with practical code examples and operational tips.
The author, a senior network operations engineer at Qunar, introduces the need to modernize the IDC and backbone transport network to support Kubernetes (K8S) workloads.
1. Background and redesign plan
1.1 Qunar K8S network overview – K8S nodes run iBGP with their access switches; the access switches announce Pod subnets to the core switches via eBGP, and the core switches advertise only a default route back to the access switches.
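The "default route only" behavior described above could be implemented on the core with `default-originate` plus an outbound filter. A minimal sketch follows; the AS numbers (65000/65001), neighbor address, and the DEFAULT-ONLY/TO-ACCESS names are illustrative placeholders, not Qunar's actual values:

```
**core (sketch, illustrative values)
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
route-map TO-ACCESS permit 10
  match ip address prefix-list DEFAULT-ONLY
router bgp 65000
  neighbor 10.1.1.2 remote-as 65001      # eBGP to an access switch
    address-family ipv4 unicast
      default-originate                  # originate 0.0.0.0/0 toward the access switch
      route-map TO-ACCESS out            # and advertise nothing else
```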
1.2 Cisco Nexus VPC overview – Virtual Port‑Channel (VPC) aggregates two switches into a single logical device, eliminating L2 loops, improving bandwidth, and simplifying L3 routing.
1.3 Why VPC redesign – The existing L2 access switches cannot run BGP with K8S nodes; VPC + HSRP enables Layer-3 forwarding at the access layer, increases uplink bandwidth, migrates the access layer from L2 to the more stable L3, and reduces rack count for cost savings.
1.4 Network topology before and after – Before: two independent L2 access switches connect to a VPC core via separate port‑channels; servers bond to both access switches. After: access switches form a VPC, L3 interconnect with core switches via eBGP, and K8S servers use the access switch as gateway.
1.5 Redesign approach – Redirect traffic, configure VPC on a non‑traffic switch, restart to trigger auto‑recovery, migrate server traffic, establish L3 links, and configure VLAN interfaces and BGP on access switches.
2. Detailed migration steps
2.1 Plan VPC primary/backup roles – Use priority settings to control role assignment.
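On Nexus VPC, the switch with the lower role priority becomes the vPC primary. A minimal sketch of the role planning; the domain ID XX is the article's placeholder, and the 8192 value for Rack2 is illustrative (the article's own config sets 4096 only on Rack1 and leaves Rack2 at the default of 32667):

```
**Rack1 (intended primary)
vpc domain XX
  role priority 4096     # lower value wins the vPC primary role

**Rack2 (intended backup)
vpc domain XX
  role priority 8192     # higher value -> vPC secondary (the default 32667 also works)
```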
2.2 Maintain the primary VPC switch – Divert traffic, shut down uplink on the core.
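Diverting traffic before maintenance is typically done from the core side by shutting the uplink member port; a sketch, using the Ethernet1/9 interface number that appears later in the article:

```
**core[1-2]
conf t
interface Ethernet1/9
  shutdown     # take the Rack1 uplink down so north-south traffic shifts away
```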
2.3 Shut down server-facing uplink ports on the primary switch – Force server bonds to fail over to the backup switch in a controlled way before the reboot, rather than failing over unpredictably mid-maintenance.
2.4 Configure VPC on the primary switch
**Rack1
feature vpc
vpc domain XX
peer-switch
role priority 4096
peer-keepalive destination x.x.x.x source y.y.y.y
delay restore 150
auto-recovery
ip arp synchronize
interface port-channel4002
switchport
switchport mode trunk
spanning-tree port type network
vpc peer-link
interface Ethernet1/51
switchport
switchport mode trunk
channel-group 4002 mode active
shutdown
interface Ethernet1/52
switchport
switchport mode trunk
channel-group 4002 mode active
shutdown
2.5 Save configuration and reboot the primary switch – After the reboot, VPC becomes active; the timers (delay-restore 150 s plus a 10 s interface-VLAN delay, and auto-recovery 240 s) total ~400 s before the VPC reaches the Primary state.
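While waiting out the timers, recovery can be tracked with standard NX-OS show commands, for example:

```
show vpc role                            # expect "vPC role : primary" once timers expire
show vpc                                 # peer status, keepalive status, per-vPC state
show vpc consistency-parameters global   # verify there are no Type-1 mismatches
```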
2.6 Configure VPC ID on uplink port‑channel and enable core uplink
**Rack1
conf t
interface port-channel1
vpc 1
switchport trunk allowed vlan except xx
**core[1-2]
conf t
interface port-channel9
switchport trunk allowed vlan except xx
# Verify the trunk allowed vlan configuration on the interface
# Bring the interface back up:
**core[1-2]
conf t
interface Ethernet1/9
no shutdown
# Check Rack1's port-channel and vpc 1 status, and verify STP
2.7 Switch server traffic back to the primary VPC switch – Verify interface status before and after the reboot.
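Checking Rack1's port-channel, vPC, and STP state can be done with, for example:

```
show port-channel summary    # Po1 should show (SU) with member ports flagged P
show vpc 1                   # vPC 1 status and consistency
show spanning-tree summary   # confirm STP has reconverged as expected
```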
2.8 Shut down backup switch uplink on the core
2.9 Configure VPC on the backup switch
**Rack2
feature vpc
vpc domain XX
peer-switch
peer-keepalive destination y.y.y.y source x.x.x.x
delay restore 150
auto-recovery
ip arp synchronize
interface port-channel4002
switchport
switchport mode trunk
spanning-tree port type network
vpc peer-link
interface Ethernet1/51
switchport
switchport mode trunk
channel-group 4002 mode active
no shutdown
interface Ethernet1/52
switchport
switchport mode trunk
channel-group 4002 mode active
no shutdown
interface port-channel1
vpc 1
switchport trunk allowed vlan except xx
2.10 Connect the peer-link and bring up interfaces (causing one STP bounce) – After the peer-link comes up, the primary switch becomes "primary" and the backup "secondary"; after the ~400 s of timers, both reach normal VPC status.
2.11 Add Rack2 uplink to the same port‑channel as Rack1 on the core
**core[1-2]
conf t
interface Ethernet1/35
switchport trunk allowed vlan except xx
channel-group 9 mode active   # channel-group 9 is the uplink port-channel ID; after this change the primary and backup switches connect through the same vPC
2.12 Enable the backup switch uplink – VPC migration complete.
2.13‑2.14 Add L3 interconnect and EBGP/IBGP configurations between access and core switches and between access switches and K8S servers
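As a rough sketch of what the 2.13–2.14 steps could look like on an access switch. All interface numbers, VLAN IDs, AS numbers, and addresses below are illustrative placeholders, not the production values; the HSRP configuration on the SVI is omitted for brevity, and prefix-based dynamic peering is only one way to establish iBGP with many K8S nodes:

```
**Rack1/Rack2 (sketch, illustrative values)
feature interface-vlan
feature bgp
# L3 point-to-point link to a core switch
interface Ethernet1/53
  no switchport
  ip address 10.1.1.2/30
# SVI acting as the K8S servers' gateway
interface Vlan100
  ip address 192.168.100.1/24
router bgp 65001
  neighbor 10.1.1.1 remote-as 65000            # eBGP with the core
    address-family ipv4 unicast
  neighbor 192.168.100.0/24 remote-as 65001    # dynamic iBGP peering with K8S nodes
    address-family ipv4 unicast
```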
3. Summary and precautions
Avoid VPC Domain ID conflicts across multiple access switch groups.
The L2 network will experience two STP disruptions during migration.
After migration, both access switches share a single port‑channel to the core.
VPC does not become active immediately after configuration; a reboot is required to trigger auto‑recovery.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.