
iQIYI Full‑Network Automatic Traffic Scheduling System: Architecture, Implementation, and Performance Evaluation

iQIYI’s SDN‑based full‑network automatic traffic‑scheduling system dynamically balances inter‑ and intra‑province traffic using BGP and policy routing. It integrates monitoring, flow collection, DFS backup‑path calculation, and real‑time Kafka/Flink processing, cutting fault‑handling time from hours to minutes and raising link availability to 99.9999 %. Planned extensions cover programmable switches and SR‑based traffic engineering.

iQIYI Technical Product Team

Facing high dedicated‑line costs and complex business requirements, iQIYI’s network operations team built an advanced traffic‑scheduling solution based on an SDN platform. The system automatically balances traffic across inter‑province and intra‑province links, improving bandwidth utilization while maintaining SLA guarantees.

The solution is divided into two parts: B1 (external‑network scheduling) and B2 (internal‑network scheduling). The B2 subsystem, which is the focus of this presentation, uses a hierarchical network consisting of a provincial backbone (core and access layers) and a city‑level metro network. BGP is deployed on all backbone and metro nodes, each operating in an independent autonomous system (AS) to provide a solid routing foundation for traffic control.

Traffic‑scheduling logic is straightforward: when traffic on a link exceeds a dynamic threshold, the excess is redirected to under‑utilized links, preserving service availability. Different routing strategies are applied based on network topology: BGP routing policies for the mesh‑type backbone and policy‑based routing for the simpler metro network.
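The threshold rule above can be sketched in a few lines. This is a minimal illustration, not iQIYI's implementation: the `Link` class, the `plan_offload` function, and the use of a single utilization fraction as the threshold are all assumptions (the article describes the real threshold as dynamic but does not say how it is computed).

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    capacity_mbps: float
    load_mbps: float


def plan_offload(hot: Link, candidates: list[Link], threshold: float) -> dict[str, float]:
    """Return {link name: Mbps to move onto it} for the traffic above the threshold on `hot`.

    `threshold` is a utilization fraction (hypothetical stand-in for the dynamic threshold).
    """
    excess = hot.load_mbps - threshold * hot.capacity_mbps
    plan: dict[str, float] = {}
    if excess <= 0:
        return plan  # link is within its threshold, nothing to move
    # Fill the least-utilized candidate links first.
    for link in sorted(candidates, key=lambda l: l.load_mbps / l.capacity_mbps):
        headroom = threshold * link.capacity_mbps - link.load_mbps
        if headroom <= 0:
            continue  # candidate is itself at or above the threshold
        moved = min(excess, headroom)
        plan[link.name] = moved
        excess -= moved
        if excess <= 0:
            break
    return plan


hot = Link("A-B", 10000, 9000)
others = [Link("A-C", 10000, 7000), Link("A-D", 10000, 6500)]
plan = plan_offload(hot, others, threshold=0.75)
```

Each candidate only absorbs traffic up to its own threshold, so the offload does not simply move the congestion to another link.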

The system is composed of five functional modules:

Monitoring & alarm (SNMP collection, alarm system)

Information collection (LLDP, ARP, BGP peer data, flow sampling)

Information processing (DFS algorithm for backup‑path calculation, IP‑priority mapping, BMP route ingestion)

Scheduling analysis (calculates target paths and traffic ranges based on business priority)

Scheduling dispatch (generates BGP policy configurations, pushes them to devices, validates deployment, and sends email notifications)
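The DFS backup‑path calculation mentioned in the information‑processing module can be illustrated as follows. This is a sketch under stated assumptions: the topology is a plain adjacency dictionary, the `backup_paths` function and its `max_hops` limit are hypothetical, and candidate paths are ranked only by hop count, whereas the article indicates the real system also considers utilization and business priority.

```python
def backup_paths(graph, src, dst, failed_link, max_hops=4):
    """Enumerate loop-free paths from src to dst that avoid failed_link, using DFS."""
    # Ban the congested or failed link in both directions.
    banned = {failed_link, failed_link[::-1]}
    paths = []

    def dfs(node, path):
        if node == dst:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_hops:
            return  # hop budget exhausted on this branch
        for nxt in graph.get(node, ()):
            if nxt in path or (node, nxt) in banned:
                continue  # skip loops and the banned link
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    dfs(src, [src])
    return sorted(paths, key=len)  # shortest candidates first


topology = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C"],
}
candidates = backup_paths(topology, "A", "B", failed_link=("A", "B"))
```

In a meshed backbone the number of loop‑free paths grows quickly, which is why the sketch bounds the search depth.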

The dispatch workflow has six steps: traffic‑anomaly detection, backup‑path selection, flow‑range calculation, configuration generation, device‑configuration verification, and result notification. The cycle repeats automatically while traffic remains abnormal, and the changes are rolled back once the link recovers.
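The repeat‑and‑roll‑back behaviour of this workflow can be modelled as a small state machine. The sketch below is illustrative only: the `Dispatcher` class, the `max_rounds` cap, and the action names are assumptions, and recovery is simplified to utilization falling back below the threshold. The individual steps (path selection, configuration push, verification, notification) are reduced to a single "dispatch" action.

```python
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    threshold: float          # utilization above which a link counts as abnormal
    max_rounds: int = 3       # hypothetical cap on repeated dispatches
    rounds: int = 0
    active: bool = False      # True while a scheduling policy is deployed
    log: list = field(default_factory=list)

    def tick(self, utilization: float) -> str:
        """Process one monitoring sample and return the action taken."""
        if utilization > self.threshold:
            if self.rounds >= self.max_rounds:
                action = "hold"      # cap reached: stop re-dispatching
            else:
                self.rounds += 1
                self.active = True
                action = "dispatch"  # select path, compute range, push and verify config
        elif self.active:
            self.rounds = 0
            self.active = False
            action = "rollback"      # link recovered: restore the original policy
        else:
            action = "idle"
        self.log.append(action)
        return action


d = Dispatcher(threshold=0.8, max_rounds=2)
actions = [d.tick(u) for u in (0.5, 0.9, 0.95, 0.92, 0.6, 0.5)]
```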

Key performance results after one year of operation:

Minute‑level automatic dispatch reduces fault‑handling time from hours to minutes.

Link availability improved from 99.524 % to 99.9999 % without additional cost.
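To put the two availability figures in perspective, they can be converted into expected downtime per year. This is plain arithmetic on the numbers quoted above, assuming a 365‑day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected downtime per year, in hours, for a given availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


before_hours = annual_downtime_hours(99.524)           # about 41.7 hours per year
after_seconds = annual_downtime_hours(99.9999) * 3600  # about 31.5 seconds per year
```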

Challenges addressed include:

Accuracy: Combining SNMP‑collected traffic with sFlow‑derived estimates, applying a scaling factor, and using DFS for precise backup‑path selection.

Real‑time processing: Migrating from Logstash to vFlow, using multi‑agent flow collection, Kafka pipelines, Flink stream processing, and Elasticsearch for sub‑minute query latency.

Security & reliability: Defining immutable routing policies, limiting repeated dispatches, and deploying distributed dispatch agents with RPC health checks to avoid single‑point failures.
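The accuracy point above, combining SNMP totals with sFlow‑derived estimates through a scaling factor, can be sketched as follows. The article does not give the exact formula, so this is one plausible reading: sampled byte counts are multiplied by the sampling rate, then rescaled so that they sum to the SNMP interface counter. The function name, the per‑prefix keys, and the parameters are assumptions.

```python
def scale_flow_estimates(sflow_bytes, sampling_rate, snmp_total_bytes):
    """Rescale sFlow-based per-prefix estimates so they sum to the SNMP total.

    Returns (scaling factor, {prefix: calibrated bytes}).
    """
    # Naive estimate: each sampled byte stands for `sampling_rate` bytes on the wire.
    estimated = {prefix: b * sampling_rate for prefix, b in sflow_bytes.items()}
    est_total = sum(estimated.values())
    if est_total == 0:
        return 1.0, {prefix: 0.0 for prefix in estimated}
    # A factor near 1.0 means the sampled estimate already matches the SNMP counter.
    factor = snmp_total_bytes / est_total
    return factor, {prefix: v * factor for prefix, v in estimated.items()}


samples = {"10.0.0.0/24": 300, "10.0.1.0/24": 100}
factor, calibrated = scale_flow_estimates(samples, sampling_rate=1000,
                                          snmp_total_bytes=500_000)
```

The SNMP counter anchors the total, while sFlow supplies the per‑flow breakdown needed to decide which traffic ranges to move.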

Future work focuses on programmable switches and smart NICs to achieve business‑aware scheduling, extending the solution with SR‑based traffic engineering, and scaling the architecture with regional deployment of data‑processing units.
