Design and Implementation of Cisco Nexus VPC for Qunar K8S Network
This article details the background, design rationale, network topology changes, and step‑by‑step procedures—including VPC configuration, BGP setup, and port‑channel adjustments—used to upgrade Qunar's data‑center network for Kubernetes deployments, with practical code examples and operational tips.
The author, a senior network operations engineer at Qunar, introduces the need to modernize the IDC and backbone transport network to support Kubernetes (K8S) workloads.
1. Background and redesign plan
1.1 Qunar K8S network overview – K8S nodes run iBGP with their access switches; the access switches announce Pod subnets to the core switches via eBGP, and the core switches advertise only a default route back to the access switches.
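The "default route only" behavior described above could be implemented on the core with `default-originate` plus an outbound filter. A minimal sketch follows; the AS numbers (65000/65001), neighbor address, and the DEFAULT-ONLY/TO-ACCESS names are illustrative placeholders, not Qunar's actual values:

```
**core (sketch, illustrative values)
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
route-map TO-ACCESS permit 10
  match ip address prefix-list DEFAULT-ONLY
router bgp 65000
  neighbor 10.1.1.2 remote-as 65001      # eBGP to an access switch
    address-family ipv4 unicast
      default-originate                  # originate 0.0.0.0/0 toward the access switch
      route-map TO-ACCESS out            # and advertise nothing else
```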
1.2 Cisco Nexus VPC overview – Virtual Port‑Channel (VPC) aggregates two switches into a single logical device, eliminating L2 loops, improving bandwidth, and simplifying L3 routing.
1.3 Why VPC redesign – The existing L2 access switches cannot run BGP with K8S nodes; VPC + HSRP enables Layer-3 forwarding at the access layer, increases uplink bandwidth, migrates the access layer from L2 to the more stable L3, and reduces rack count for cost savings.
1.4 Network topology before and after – Before: two independent L2 access switches connect to a VPC core via separate port‑channels; servers bond to both access switches. After: access switches form a VPC, L3 interconnect with core switches via eBGP, and K8S servers use the access switch as gateway.
1.5 Redesign approach – Redirect traffic, configure VPC on a non‑traffic switch, restart to trigger auto‑recovery, migrate server traffic, establish L3 links, and configure VLAN interfaces and BGP on access switches.
2. Detailed migration steps
2.1 Plan VPC primary/backup roles – Use priority settings to control role assignment.
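On Nexus VPC, the switch with the lower role priority becomes the vPC primary. A minimal sketch of the role planning; the domain ID XX is the article's placeholder, and the 8192 value for Rack2 is illustrative (the article's own config sets 4096 only on Rack1 and leaves Rack2 at the default of 32667):

```
**Rack1 (intended primary)
vpc domain XX
  role priority 4096     # lower value wins the vPC primary role

**Rack2 (intended backup)
vpc domain XX
  role priority 8192     # higher value -> vPC secondary (the default 32667 also works)
```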
2.2 Maintain the primary VPC switch – Divert traffic, shut down uplink on the core.
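Diverting traffic before maintenance is typically done from the core side by shutting the uplink member port; a sketch, using the Ethernet1/9 interface number that appears later in the article:

```
**core[1-2]
conf t
interface Ethernet1/9
  shutdown     # take the Rack1 uplink down so north-south traffic shifts away
```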
2.3 Shut down server-facing uplink ports on the primary switch – Force server bonds to fail over to the backup switch in a controlled way before the reboot, rather than failing over unpredictably mid-maintenance.
2.4 Configure VPC on the primary switch
**Rack1
feature vpc
vpc domain XX
peer-switch
role priority 4096
peer-keepalive destination x.x.x.x source y.y.y.y
delay restore 150
auto-recovery
ip arp synchronize
interface port-channel4002
switchport
switchport mode trunk
spanning-tree port type network
vpc peer-link
interface Ethernet1/51
switchport
switchport mode trunk
channel-group 4002 mode active
shutdown
interface Ethernet1/52
switchport
switchport mode trunk
channel-group 4002 mode active
shutdown
2.5 Save configuration and reboot the primary switch – After the reboot, VPC becomes active; the timers (delay-restore 150 s plus a 10 s interface-VLAN delay, and auto-recovery 240 s) total ~400 s before the VPC reaches the Primary state.
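While waiting out the timers, recovery can be tracked with standard NX-OS show commands, for example:

```
show vpc role                            # expect "vPC role : primary" once timers expire
show vpc                                 # peer status, keepalive status, per-vPC state
show vpc consistency-parameters global   # verify there are no Type-1 mismatches
```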
2.6 Configure VPC ID on uplink port‑channel and enable core uplink
**Rack1
conf t
interface port-channel1
vpc 1
switchport trunk allowed vlan except xx
**core[1-2]
conf t
interface port-channel9
switchport trunk allowed vlan except xx
# Verify the trunk allowed vlan configuration on the interface
# Bring the interface back up:
**core[1-2]
conf t
interface Ethernet1/9
no shutdown
# Check Rack1's port-channel and vpc 1 status, and verify STP
2.7 Switch server traffic back to the primary VPC switch – Verify interface status before and after the reboot.
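Checking Rack1's port-channel, vPC, and STP state can be done with, for example:

```
show port-channel summary    # Po1 should show (SU) with member ports flagged P
show vpc 1                   # vPC 1 status and consistency
show spanning-tree summary   # confirm STP has reconverged as expected
```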
2.8 Shut down backup switch uplink on the core
2.9 Configure VPC on the backup switch
**Rack2
feature vpc
vpc domain XX
peer-switch
peer-keepalive destination y.y.y.y source x.x.x.x
delay restore 150
auto-recovery
ip arp synchronize
interface port-channel4002
switchport
switchport mode trunk
spanning-tree port type network
vpc peer-link
interface Ethernet1/51
switchport
switchport mode trunk
channel-group 4002 mode active
no shutdown
interface Ethernet1/52
switchport
switchport mode trunk
channel-group 4002 mode active
no shutdown
interface port-channel1
vpc 1
switchport trunk allowed vlan except xx
2.10 Connect the peer-link and bring up interfaces (causing one STP bounce) – After the peer-link comes up, the primary switch becomes "primary" and the backup "secondary"; after the ~400 s of timers, both reach normal VPC status.
2.11 Add Rack2 uplink to the same port‑channel as Rack1 on the core
**core[1-2]
conf t
interface Ethernet1/35
switchport trunk allowed vlan except xx
channel-group 9 mode active   # channel-group 9 is the uplink port-channel ID; after this change the primary and backup switches connect through the same vPC
2.12 Enable the backup switch uplink – VPC migration complete.
2.13‑2.14 Add L3 interconnect and EBGP/IBGP configurations between access and core switches and between access switches and K8S servers
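As a rough sketch of what the 2.13–2.14 steps could look like on an access switch. All interface numbers, VLAN IDs, AS numbers, and addresses below are illustrative placeholders, not the production values; the HSRP configuration on the SVI is omitted for brevity, and prefix-based dynamic peering is only one way to establish iBGP with many K8S nodes:

```
**Rack1/Rack2 (sketch, illustrative values)
feature interface-vlan
feature bgp
# L3 point-to-point link to a core switch
interface Ethernet1/53
  no switchport
  ip address 10.1.1.2/30
# SVI acting as the K8S servers' gateway
interface Vlan100
  ip address 192.168.100.1/24
router bgp 65001
  neighbor 10.1.1.1 remote-as 65000            # eBGP with the core
    address-family ipv4 unicast
  neighbor 192.168.100.0/24 remote-as 65001    # dynamic iBGP peering with K8S nodes
    address-family ipv4 unicast
```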
3. Summary and precautions
Avoid VPC Domain ID conflicts across multiple access switch groups.
The L2 network will experience two STP disruptions during migration.
After migration, both access switches share a single port‑channel to the core.
VPC does not become active immediately after configuration; a reboot is required to trigger auto‑recovery.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.