Building an Intelligent, Reliable, Schedulable Backbone with Segment Routing
This article explains how UCloud leverages Segment Routing to redesign its backbone network, detailing the architecture, control‑plane and forwarding‑plane designs, traffic‑engineering features, multi‑scenario access, and future evolution to achieve intelligent, reliable, and highly schedulable connectivity.
Next‑Generation Backbone Architecture
Design goals:
Intelligent path computation: The controller collects real‑time device status and responds to path‑calculation requests.
Flexible all‑scenario access: Supports hybrid local‑line and Internet connections for flexible networking.
Multi‑dimensional SLA routing: Meets diverse business path requirements and ensures priority and QoS.
Reduced dedicated‑line cost: Removes VXLAN from the 2.0 backbone, cuts header overhead, and lowers line cost through intelligent traffic scheduling.
Traffic visualization and on‑demand scheduling: Uses telemetry and NetFlow to visualize traffic and schedule hot or noisy flows as needed.
The overall architecture consists of three major components: an intelligent controller, backbone‑edge forwarding PEs, and access‑side forwarding CPEs/VPEs.
Overall Architecture
Component Details
Controller: Unified resource management, information collection, configuration distribution, monitoring, alarm handling, and path‑calculation for all network devices.
Backbone‑edge: PE and RR form the SR‑TE core layer for multipath forwarding and traffic scheduling; they connect outward to CPE, M‑Core, VPE, and VCPE devices.
Access‑side: Includes CPE (local dedicated‑line customers), M‑Core (public‑cloud MAN core), and VPE (Internet/4G/5G‑based branch connectivity).
Control‑Plane Design
The control plane comprises two parts: the intelligent controller and the SR‑TE backbone routing control.
Intelligent Controller
The controller’s architecture works as follows:
It achieves consistent configuration distribution and scalable scheduling. Data collection is performed globally across regions, and controller nodes are clustered across data‑centers for reliability. The upper‑layer system gathers link‑state via BGP‑LS and telemetry, stores it in a database, and then pushes configurations and computed paths via NETCONF and PCEP.
Data collection includes:
Basic IGP topology (nodes, links, metrics)
BGP EPE information
Segment Routing data (SRGB, Prefix‑SID, Adj‑SID, Anycast‑SID, etc.)
TE link attributes (TE metric, delay, color affinity)
SR Policy details (head‑end, endpoint, color, segment list, BSID)
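The collected data above can be thought of as feeding a link-state database inside the controller. The following is a minimal, hypothetical sketch of such a database in Python; the field names, SID values, and the SRGB base of 16000 are illustrative assumptions, not UCloud's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class TELink:
    """One TE link as a BGP-LS/telemetry feed might describe it."""
    src: str
    dst: str
    igp_metric: int
    te_metric: int
    delay_us: int
    adj_sid: int          # adjacency SID label for this link

@dataclass
class Node:
    name: str
    prefix_sid: int       # index into the SRGB
    srgb_base: int = 16000

    def node_label(self) -> int:
        # SR-MPLS node label = SRGB base + Prefix-SID index
        return self.srgb_base + self.prefix_sid

@dataclass
class LinkStateDB:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_node(self, node: Node):
        self.nodes[node.name] = node

    def add_link(self, link: TELink):
        self.links.append(link)

db = LinkStateDB()
db.add_node(Node("PE-GZ", prefix_sid=11))
db.add_node(Node("PE-HK", prefix_sid=12))
db.add_link(TELink("PE-GZ", "PE-HK", igp_metric=10, te_metric=10,
                   delay_us=8000, adj_sid=24001))
print(db.nodes["PE-GZ"].node_label())  # 16011
```

The controller would populate this store from BGP-LS and telemetry feeds, then run path computation over it.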
NETCONF and PCEP usage:
Head‑end requests path computation; controller computes and returns the path.
Head‑end learns the path from the controller via PCEP.
Head‑end reports its local SR Policy to the controller.
Key capabilities:
Fast fault response: Recomputes affected paths across the whole topology when a link or node fails.
Manual fault‑domain isolation: Isolates traffic at link and node levels.
Customizable path tuning: Directs customer traffic to any desired path.
Second‑level traffic monitoring: Detects customer‑level faults and provides path protection.
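The first two capabilities above both reduce to the same mechanism: rerun a constrained shortest-path computation with the failed or isolated elements excluded. Here is a sketch of that idea as a Dijkstra run with a link-exclusion set; the topology and metrics are invented for illustration, and real CSPF would also honor bandwidth and affinity constraints.

```python
import heapq

def cspf(graph, src, dst, excluded_links=frozenset()):
    """graph: {node: [(neighbor, metric), ...]}; excluded_links: {(u, v), ...}."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, metric in graph.get(u, []):
            if (u, v) in excluded_links:
                continue  # fault/isolation: skip excluded links
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # destination unreachable under the constraints
    path, n = [dst], dst
    while n != src:
        n = prev[n]
        path.append(n)
    return list(reversed(path))

graph = {
    "A": [("B", 10), ("C", 20)],
    "B": [("D", 10)],
    "C": [("E", 40)],
    "D": [("E", 10)],
}
print(cspf(graph, "A", "E"))                               # ['A', 'B', 'D', 'E']
print(cspf(graph, "A", "E", excluded_links={("B", "D")}))  # ['A', 'C', 'E']
```

Manual fault-domain isolation is then just an operator-supplied exclusion set instead of one derived from failures.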
SR‑TE Backbone Control Plane
To satisfy L2 and L3 access scenarios, the backbone uses MP‑BGP‑based L3VPN and BGP‑EVPN L2VPN.
MP‑BGP: Deploys an IGP (IS‑IS) inside the MPLS‑VPN backbone, creates VRF instances on PEs, runs routing between PE‑CPE/VPE and M‑Core, establishes MP‑IBGP between PEs, and activates MPLS and SR on all core PEs.
BGP‑EVPN: Replaces VPLS to provide all‑active dual‑homed L2 connectivity. PE devices form EVPN instances, learn MAC addresses from CE, and distribute them via EVPN NLRI. Key EVPN concepts include EVPN Instance (EVI), Ethernet Segment Identifier (ESI), and Ethernet Tag (ET).
BGP‑LS: All PEs and RRs establish BGP‑LS sessions; PEs send link‑state and label information to RRs, which forward it to the controller.
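To make the EVPN bullet concrete: the value of replacing VPLS with BGP-EVPN is that MAC reachability is distributed as BGP routes rather than learned by flooding. The sketch below models that with a toy table of Type-2-style (EVI, MAC → next-hop PE) entries; all identifiers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MacRoute:
    """Models an EVPN Type-2 (MAC advertisement) route, simplified."""
    evi: int           # EVPN instance
    esi: str           # Ethernet Segment Identifier (dual-homing)
    mac: str
    next_hop_pe: str

class EvpnTable:
    def __init__(self):
        self.routes = {}

    def advertise(self, route: MacRoute):
        # A PE learns the MAC from its CE, then advertises it via BGP.
        self.routes[(route.evi, route.mac)] = route

    def lookup(self, evi: int, mac: str):
        r = self.routes.get((evi, mac))
        return r.next_hop_pe if r else None  # None -> unknown unicast

table = EvpnTable()
table.advertise(MacRoute(evi=100, esi="00:11:22:33:44:55:66:77:88:99",
                         mac="aa:bb:cc:dd:ee:01", next_hop_pe="PE-2"))
print(table.lookup(100, "aa:bb:cc:dd:ee:01"))  # PE-2
```

Because two PEs attached to the same Ethernet Segment share one ESI, remote PEs can load-balance toward both, which is what enables the all-active dual-homing mentioned above.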
Forwarding‑Plane Design
Backbone Core Layer
PE devices run IS‑IS Level 2 with SR enabled, allocating Node‑SIDs, Adj‑SIDs, and Anycast‑SIDs. They form MP‑IBGP sessions with the RRs to exchange VPNv4 routes for L3VPN traffic and use BGP‑EVPN for L2VPN, encapsulating user traffic in SR‑TE tunnels.
Two forwarding modes are defined:
SR‑BE: Two‑layer label stack (user VPN label inside, public SR label outside).
SR‑TE: Multi‑layer stack with an outer SR‑TE label assigned by the head‑end or controller.
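The difference between the two modes is simply the depth of the public label stack. The sketch below builds both stacks; the label values and the SRGB base of 16000 are assumptions for illustration (16000 is a common SR-MPLS default, not necessarily UCloud's).

```python
SRGB_BASE = 16000  # assumed SRGB base for the illustration

def sr_be_stack(vpn_label: int, egress_prefix_sid: int) -> list:
    # SR-BE: a single public node label (IGP shortest path to the
    # egress PE) stacked over the private VPN label.
    return [SRGB_BASE + egress_prefix_sid, vpn_label]

def sr_te_stack(vpn_label: int, sid_indices: list) -> list:
    # SR-TE: a full segment list (explicit path chosen by the
    # head-end or controller) stacked over the VPN label.
    return [SRGB_BASE + s for s in sid_indices] + [vpn_label]

print(sr_be_stack(vpn_label=30001, egress_prefix_sid=14))
# [16014, 30001]
print(sr_te_stack(vpn_label=30001, sid_indices=[11, 13, 14]))
# [16011, 16013, 16014, 30001]
```

The innermost VPN label is identical in both modes; only the outer public labels change, which is why fallback between the modes (discussed below under reliability) is cheap.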
Backbone Edge Layer
PEs connect to CPE/VPE and the public‑cloud M‑Core via eBGP, exchanging user routes. VPE‑VCPE connections over the Internet use IPsec for encryption.
Three Core Characteristics
Intelligent
Unified business orchestration: Automatic path calculation and traffic scheduling for branch devices.
Head‑end auto‑routing and diversion: The distributed control plane keeps forwarding alive even if the controller fails, so no single failure causes a full‑network outage; head‑ends auto‑divert traffic using CSPF constraints.
Network slicing by business scenario: SLA‑driven slicing (e.g., latency‑sensitive traffic).
Reliable
Global core‑node dedicated‑line network with 99.99% SLA.
Dual‑PE Anycast‑SID for ECMP and rapid disaster recovery.
Ti‑LFA loop‑free protection with sub‑50 ms switchover.
SR‑TE primary/backup segment‑list for high‑availability.
Fast fallback from SR‑TE to SR‑BE on failure.
Internet‑level backup network with Flex‑Algo for management traffic.
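Several of the bullets above (primary/backup segment lists, fast fallback to SR-BE) form a failover ladder at the head-end. The following is a hedged sketch of that selection logic; the liveness flags stand in for real BFD/validation state, and the label values are invented.

```python
def select_path(primary_up: bool, backup_up: bool,
                primary: list, backup: list,
                sr_be_label: int) -> list:
    """Pick the label stack to impose, in order of preference:
    SR-TE primary list -> SR-TE backup list -> SR-BE shortest path."""
    if primary_up:
        return primary
    if backup_up:
        return backup
    return [sr_be_label]  # last resort: single node label, IGP path

primary = [16011, 16013, 16014]   # explicit segment list (illustrative)
backup = [16012, 16014]           # disjoint backup list (illustrative)
print(select_path(True, True, primary, backup, 16014))    # primary
print(select_path(False, True, primary, backup, 16014))   # backup
print(select_path(False, False, primary, backup, 16014))  # SR-BE fallback
```

In a real device this decision is made in the forwarding plane with pre-installed backup entries, which is what makes the sub-50 ms switchover achievable.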
Schedulable
Per‑destination and per‑flow scheduling based on five‑tuple identification.
Multi‑type SR‑TE tunnels defined per application across regions.
Primary/backup paths for each tunnel with one‑click escape.
Algorithms consider delay, bandwidth, TCO, and public‑tunnel metrics.
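Per-flow scheduling based on five-tuple identification can be pictured as an ordered rule table that maps matching flows to an SR-TE policy color. The rules, prefixes, and colors below are invented for illustration only.

```python
from ipaddress import ip_network, ip_address

# Ordered steering rules: (dst prefix, protocol, dst port, policy color).
# None means "any". First match wins.
RULES = [
    ("203.0.113.0/24", "tcp", 443, 100),   # latency-sensitive -> low-delay policy
    ("203.0.113.0/24", None, None, 200),   # remaining traffic to that site
]

def classify(src_ip, dst_ip, proto, sport, dport):
    """Return the SR-TE policy color for a five-tuple, or None for SR-BE."""
    for prefix, r_proto, r_dport, color in RULES:
        if ip_address(dst_ip) not in ip_network(prefix):
            continue
        if r_proto is not None and r_proto != proto:
            continue
        if r_dport is not None and r_dport != dport:
            continue
        return color
    return None  # no policy matched: follow default SR-BE routing

print(classify("10.0.0.1", "203.0.113.5", "tcp", 51000, 443))  # 100
print(classify("10.0.0.1", "203.0.113.5", "udp", 51000, 53))   # 200
```

Per-destination scheduling is the degenerate case where only the prefix field is populated.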
Best‑Practice Flow Scheduling
Example: a traffic surge from Guangzhou to Jakarta originally routed via Hong Kong → Singapore caused congestion. Using SR‑TE, the controller defined a segment list that rerouted traffic through Beijing and Frankfurt, avoiding the congested link.
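The reroute in this example amounts to translating a list of waypoint PEs into an explicit label stack. Here is a small reconstruction of that step; the Prefix-SID indices and SRGB base are invented, since the article does not publish real values.

```python
SRGB_BASE = 16000  # assumed SRGB base
PREFIX_SID = {     # illustrative SID indices for the waypoint PEs
    "PE-Beijing": 21,
    "PE-Frankfurt": 22,
    "PE-Jakarta": 23,
}

def segment_list(waypoints):
    """Translate waypoint PEs into an SR-MPLS label stack."""
    return [SRGB_BASE + PREFIX_SID[w] for w in waypoints]

# The default IGP path ran via the congested Hong Kong -> Singapore leg;
# the controller instead pushes this explicit detour onto the head-end:
detour = segment_list(["PE-Beijing", "PE-Frankfurt", "PE-Jakarta"])
print(detour)  # [16021, 16022, 16023]
```

Each label is consumed hop by hop, so the packet is forced through Beijing and Frankfurt regardless of what the IGP shortest path would choose.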
Advantages:
End‑to‑end path definition via labels.
Business‑aware traffic steering (destination + service class).
Future Evolution
Upcoming work includes binding‑SID‑based end‑to‑end traffic engineering for public‑cloud workloads, integrating data‑center VXLAN traffic into the backbone, and using MPLS labels to map tenant traffic to city‑level Binding SIDs.
Summary
UCloud’s next‑generation backbone is designed to be intelligent, reliable, and highly schedulable. By leveraging global dedicated lines, hybrid access, Segment Routing, and advanced traffic‑engineering mechanisms, it provides a stable, high‑performance foundation for public‑cloud services and user connectivity.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.