How to Detect and Auto‑Block Cloud Data‑Center Network Anomalies with SDN
This article describes how a mobile‑cloud operations team built a low‑cost, high‑efficiency system that detects, alerts on, and blocks internal and external network anomalies in a large cloud data center. The system combines SDN flow‑table analysis, Zabbix monitoring, and automated OpenStack API integration, and it sharply reduced both the number of security tickets and the response time.
Abstract: Cloud data‑center network anomalies impose heavy load on equipment and degrade user experience. Shared resources and diverse tenant workloads make detection difficult.
The Southern Base mobile‑cloud network operations team extracted the characteristic features of these anomalies and devised a “combined coarse‑and‑fine flow‑table analysis” strategy that discovers anomalies at low cost and high efficiency. Following DevOps practices, the team then built an automated detection and blocking system.
Introduction
In the group’s large‑scale connectivity strategy, cloud‑management services are key. Mobile Cloud (ecloud.10086.cn) is a headquarters‑level public cloud with 2,400 physical hosts and 450 network devices. It runs on OpenStack and SDN, with overlay networks built on top of the underlay network.
Because many tenants share the same network infrastructure, analyzing traffic and behavior at the network layer is increasingly challenging.
Customer complaints
Network‑related complaints constitute a large portion of tickets, including password changes, DDoS attacks, and suspected malicious activity. Traditional analysis requires expensive tools and extensive manpower.
Device load and degraded user experience
Undetected anomalies can cause virus spread and overload network devices, leading to outages and degraded service quality.
Nature of the Problem
Network anomalies are classified as external attacks and internal attacks. Internal attacks pose greater risk and are harder to detect.
Customers often lack security awareness and misconfigure firewalls or security groups. The resulting compromised VMs then become attack sources inside the data center.
Existing security designs focus on the data‑center edge, leaving internal protection weak.
Traditional flow‑capture systems are costly and cannot handle the massive internal traffic of modern data centers.
Measures
We built a comprehensive detection system covering both virtual‑switch flow tables and core/egress traffic. The “combined coarse‑and‑fine flow‑table” strategy enables low‑cost, high‑efficiency anomaly detection.
SDN environment
Mobile Cloud adopts SDN with OpenFlow flow tables and VXLAN overlay, enabling automated deployment and rapid configuration.
Figure 1: SDN cloud data‑center architecture.
Dual‑source detection
We analyze flow tables from virtual switches on hosts and traffic from core/egress points, achieving full coverage.
Virtual‑switch flow‑table analysis
Virtual switches (OVS) are the first point of network access for VMs. By examining flow‑table entries, we can reconstruct VM sessions and identify anomalies such as an excessive flow count or an asymmetric flow direction.
Figure 2: OVS diagram.
Sample flow‑table data (Table 1) shows entries per VM session.
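As a rough illustration of how flow‑table entries can be turned into per‑VM session counts, the sketch below parses text in the style of `ovs-ofctl dump-flows` output. The sample entries, the IP addresses, and the helper names `parse_flow` and `flows_per_vm` are hypothetical and are not taken from the team's system. The original Table 1 is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical sample in the style of `ovs-ofctl dump-flows` output.
SAMPLE = """\
 cookie=0x0, duration=12.5s, table=0, n_packets=40, n_bytes=3920, priority=100,tcp,nw_src=10.0.0.5,nw_dst=192.0.2.10,tp_dst=22 actions=NORMAL
 cookie=0x0, duration=3.1s, table=0, n_packets=1, n_bytes=74, priority=100,tcp,nw_src=10.0.0.5,nw_dst=192.0.2.11,tp_dst=22 actions=NORMAL
 cookie=0x0, duration=8.0s, table=0, n_packets=12, n_bytes=1100, priority=100,tcp,nw_src=192.0.2.10,nw_dst=10.0.0.5,tp_src=22 actions=NORMAL
"""

# Matches key=value pairs; bare keywords such as "tcp" are skipped.
FIELD = re.compile(r"(\w+)=([^,\s]+)")


def parse_flow(line):
    """Return the key=value fields of one flow entry as a dict of strings."""
    return dict(FIELD.findall(line))


def flows_per_vm(text, vm_ips):
    """Count flow entries per VM, split into outbound and inbound."""
    outbound, inbound = Counter(), Counter()
    for line in text.splitlines():
        fields = parse_flow(line)
        if fields.get("nw_src") in vm_ips:
            outbound[fields["nw_src"]] += 1
        if fields.get("nw_dst") in vm_ips:
            inbound[fields["nw_dst"]] += 1
    return outbound, inbound


if __name__ == "__main__":
    out_counts, in_counts = flows_per_vm(SAMPLE, {"10.0.0.5"})
    print(dict(out_counts), dict(in_counts))
```

Sampling these counts at a fixed interval would give the flows‑per‑second figures that the thresholds are applied to.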
Excessive flow count: >10,000 flows/s per VM.
Asymmetric flow direction: >2,000 outbound flows/s but <50 inbound flows/s.
We use Zabbix to set alerts on flow‑count thresholds, rate changes, and send/receive ratios.
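The two thresholds above can be expressed as a small classification function. This is only a sketch of the logic: in the deployed system the checks are Zabbix alerts, and the names `FlowRate` and `classify` are hypothetical. It also assumes that the 10,000 flows/s limit applies to the sum of outbound and inbound flows, which the article does not state explicitly.

```python
from dataclasses import dataclass

# Thresholds quoted in the article.
MAX_FLOWS_PER_S = 10_000     # excessive flow count per VM
MIN_OUTBOUND_PER_S = 2_000   # asymmetric: outbound above this ...
MAX_INBOUND_PER_S = 50       # ... while inbound stays below this


@dataclass
class FlowRate:
    vm: str
    outbound_per_s: float
    inbound_per_s: float


def classify(rate):
    """Return the list of anomaly labels that apply to one VM."""
    alerts = []
    # Assumption: the per-VM limit covers outbound plus inbound flows.
    if rate.outbound_per_s + rate.inbound_per_s > MAX_FLOWS_PER_S:
        alerts.append("excessive_flow_count")
    if (rate.outbound_per_s > MIN_OUTBOUND_PER_S
            and rate.inbound_per_s < MAX_INBOUND_PER_S):
        alerts.append("asymmetric_direction")
    return alerts


if __name__ == "__main__":
    for sample in (FlowRate("vm-a", 12_000, 9_000),
                   FlowRate("vm-b", 2_500, 10),
                   FlowRate("vm-c", 300, 280)):
        print(sample.vm, classify(sample))
```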
Core and egress traffic analysis
Port mirroring captures traffic at core and egress layers. Key features include packet send/receive ratio and TCP SYN scan detection.
Figure 3: Abnormal send/receive ratio.
Figure 4: SYN‑scan packets.
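The two features named above can be sketched on mirrored packet records as follows. This is an illustrative approach, not necessarily the team's method: a source is treated as a scan suspect when it sends bare SYN packets (SYN set, ACK clear) to many distinct address and port pairs. The record layout, the function names, and the `min_targets` threshold of 100 are assumptions, since the article gives no concrete values for this layer.

```python
from collections import defaultdict

# Standard TCP flag bits.
SYN, ACK = 0x02, 0x10


def syn_scan_suspects(packets, min_targets=100):
    """Find sources that send bare SYNs to many distinct targets.

    packets: iterable of (src_ip, dst_ip, dst_port, tcp_flags) tuples.
    min_targets: hypothetical threshold, not taken from the article.
    """
    targets = defaultdict(set)
    for src, dst, port, flags in packets:
        # A bare SYN opens a connection; SYN+ACK is a reply and is ignored.
        if flags & SYN and not flags & ACK:
            targets[src].add((dst, port))
    return {src: len(seen) for src, seen in targets.items()
            if len(seen) >= min_targets}


def send_receive_ratio(sent, received):
    """Packets sent divided by packets received, guarding against zero."""
    return sent / max(received, 1)


if __name__ == "__main__":
    scan = [("10.0.0.5", "192.0.2.1", port, SYN) for port in range(1, 201)]
    normal = [("10.0.0.9", "192.0.2.1", 443, SYN),
              ("192.0.2.1", "10.0.0.9", 50000, SYN | ACK)]
    print(syn_scan_suspects(scan + normal))
    print(send_receive_ratio(5000, 0))
```

A host that scans receives few replies, so a high send/receive ratio and a large set of bare‑SYN targets tend to appear together.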
Intelligent Automated Handling
We integrated Zabbix and a proprietary monitoring system to automatically collect flow‑table and traffic data, generate alerts, and invoke OpenStack APIs to block malicious traffic.
Figure 5: Automated handling workflow.
Blocking is performed at the virtual‑switch level via OpenFlow rules, ensuring precise isolation without affecting other tenants.
Figure 6: Automated flow‑blocking.
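A blocking rule at the virtual switch might look like the sketch below, which builds an `ovs-ofctl add-flow` command that drops all IPv4 traffic sourced from one VM. The article does not publish the actual rule or the OpenStack API calls involved, so the match fields, the bridge name `br-int`, and the function name `build_block_command` are assumptions. The function only builds the argument list and does not execute anything.

```python
import ipaddress


def build_block_command(bridge, vm_ip, priority=65535):
    """Build an ovs-ofctl command that drops IPv4 traffic from vm_ip.

    Returns an argument list suitable for subprocess.run; nothing is run here.
    """
    ip = ipaddress.ip_address(vm_ip)  # raises ValueError on malformed input
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    # Highest priority so the drop rule wins over the normal forwarding rules.
    flow = f"priority={priority},ip,nw_src={ip},actions=drop"
    return ["ovs-ofctl", "add-flow", bridge, flow]


if __name__ == "__main__":
    # "br-int" is assumed here as the integration bridge name.
    print(" ".join(build_block_command("br-int", "10.0.0.5")))
```

Because the rule matches only the offending VM's source address, traffic from other tenants on the same host is left untouched.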
Achievements
After deployment, weekly security tickets dropped from 13 to about 2 (a reduction of more than 80%), and average handling time fell from 8 hours to under 30 minutes. The flow‑table matching approach also respects tenant privacy while providing actionable insights for capacity planning and product optimization.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
