Zhejiang Mobile’s AI‑Driven Self‑Healing: Pioneering Intelligent Network Operations
This article examines the challenges of intelligent telecom network operation, presents Zhejiang Mobile’s AI‑powered self‑healing practice (process re‑design, system reconstruction, talent transformation, and measurable results), and outlines the future outlook and the AIOps maturity model for digital network management.
Challenges of Intelligent Network Operations
Telecommunication networks consist of wireless, transport, IP, core and service systems. As they evolve, several generations (2/3/5G) and equipment from multiple vendors coexist, yet the network must still deliver 99.999% reliability and strong security. The main challenges are:
Low data standardization: alarm, log, performance and resource data arrive in inconsistent formats, and data quality varies from device to device.
Insufficient fault samples: because the network is highly reliable, any given fault recurs rarely, which limits the data available for AI training.
High trial‑and‑error cost: strict reliability requirements leave little room for experimentation, so automation has to be introduced gradually rather than all at once.
Poor end‑to‑end capability: traditional automation relies on vendor‑specific functions, which leaves operations fragmented by vendor and by domain.
Complex network coordination: the coexistence of multiple radio standards and cloud‑based core equipment creates intricate, multi‑layer dependencies.
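To make the data standardization problem concrete, here is a minimal sketch of mapping alarms from two vendors onto one common record. The vendor names, field names and severity codes are all invented for illustration; the article does not describe the actual formats Zhejiang Mobile handles.

```python
from datetime import datetime, timezone

# Hypothetical severity vocabularies used by two made-up vendors.
SEVERITY_MAP = {
    "1": "critical", "2": "major", "3": "minor",            # numeric codes
    "crit": "critical", "major": "major", "minor": "minor",  # text levels
}


def normalize_alarm(raw: dict, vendor: str) -> dict:
    """Convert a vendor-specific alarm dict into one common schema."""
    if vendor == "vendor_a":
        # Assumed format: epoch seconds, numeric severity, "ne" for the device.
        ts = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
        device, severity, text = raw["ne"], str(raw["sev"]), raw["msg"]
    elif vendor == "vendor_b":
        # Assumed format: UTC timestamp string, text level, "host" for the device.
        ts = datetime.strptime(raw["time"], "%Y-%m-%d %H:%M:%S").replace(
            tzinfo=timezone.utc
        )
        device, severity, text = raw["host"], raw["level"].lower(), raw["desc"]
    else:
        raise ValueError(f"unknown vendor: {vendor}")
    return {
        "device": device,
        "severity": SEVERITY_MAP[severity],
        "time": ts.isoformat(),
        "text": text,
    }


alarm_a = normalize_alarm(
    {"ne": "bts-01", "sev": 1, "ts": 0, "msg": "Link down"}, "vendor_a"
)
alarm_b = normalize_alarm(
    {"host": "bts-02", "level": "CRIT", "time": "1970-01-01 00:00:00",
     "desc": "Link down"},
    "vendor_b",
)
```

Once both alarms share the same keys and vocabularies, downstream correlation and training code no longer needs vendor-specific branches.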
Zhejiang Mobile Practice Solution
To overcome these issues, Zhejiang Mobile re‑engineered its operations around a “force matrix” for fault self‑healing, addressing process, system and talent dimensions.
Process Re‑design
Traditional manual workflows that depend on human operators and expert knowledge were transformed into machine‑driven automatic perception, analysis, decision and execution loops.
System Reconstruction
A platform was built to aggregate existing automation capabilities and codify expert knowledge into a reusable “operation capability set”. These capabilities are orchestrated into “force chains” and combined into a “force matrix” that can automatically match fault scenarios and execute the appropriate actions.
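The relationship between capabilities, force chains and a force matrix can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the class names, the `scenario` key and the four toy capabilities are hypothetical, and the real platform's matching logic is not described in the article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]
# A capability is any function that takes the fault context and returns an updated copy.
Capability = Callable[[Context], Context]


@dataclass
class ForceChain:
    """An ordered chain of capabilities for one fault scenario."""
    name: str
    scenario: str
    steps: List[Capability]

    def run(self, context: Context) -> Context:
        for step in self.steps:
            context = step(context)
        return context


@dataclass
class ForceMatrix:
    """A collection of chains that is searched for a matching scenario."""
    chains: List[ForceChain] = field(default_factory=list)

    def match(self, fault: Context) -> Optional[ForceChain]:
        for chain in self.chains:
            if chain.scenario == fault.get("scenario"):
                return chain
        return None

    def handle(self, fault: Context) -> Context:
        chain = self.match(fault)
        if chain is None:
            # No chain covers this scenario: hand the fault to a human.
            return {**fault, "status": "manual_handover"}
        return chain.run(dict(fault))


# Hypothetical capabilities covering perception, analysis, decision and execution.
def perceive(ctx: Context) -> Context:
    return {**ctx, "alarm_confirmed": True}


def analyze(ctx: Context) -> Context:
    return {**ctx, "root_cause": "primary link failure"}


def decide(ctx: Context) -> Context:
    return {**ctx, "action": "switch_to_backup_link"}


def execute(ctx: Context) -> Context:
    return {**ctx, "status": "resolved"}


matrix = ForceMatrix(
    [ForceChain("link-recovery", "link_down", [perceive, analyze, decide, execute])]
)
result = matrix.handle({"scenario": "link_down", "device": "router-1"})
unmatched = matrix.handle({"scenario": "unknown_fault"})
```

Keeping each capability as a plain function is what makes it reusable: the same `analyze` step could appear in several chains without modification.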
Platform features
Modular capability construction: extract and standardize automation functions from applications, platforms or human expertise, forming a library of network operation assets.
AI‑enhanced capabilities: use dynamic thresholds, time‑series prediction and multi‑dimensional data correlation to improve perception and analysis.
End‑to‑end orchestrable automation: assemble capabilities across perception, analysis, decision and execution to cover diverse fault scenarios.
Human‑in‑the‑loop support: combine ChatOps or manual hand‑over with automation to gradually achieve full‑process automation.
Agile fault tolerance: automated processes can notify operators and allow manual takeover when exceptions occur.
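As an illustration of the dynamic thresholds mentioned among the AI‑enhanced capabilities, the sketch below flags a KPI sample when it falls outside a band computed from the preceding window of samples. The rolling mean and standard deviation rule, the window size and the factor `k` are assumptions chosen for simplicity, not the method the platform actually uses.

```python
from statistics import mean, stdev
from typing import List, Sequence


def dynamic_threshold_anomalies(
    values: Sequence[float], window: int = 5, k: float = 3.0
) -> List[int]:
    """Return indices whose value lies more than k standard deviations
    from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # A perfectly flat window has sigma == 0; use a tiny band instead
        # so that any change from the flat level is still flagged.
        band = k * sigma if sigma > 0 else 1e-9
        if abs(values[i] - mu) > band:
            anomalies.append(i)
    return anomalies


kpi = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
flagged = dynamic_threshold_anomalies(kpi)  # flags index 7, the spike to 50
```

Unlike a fixed threshold, the band follows the recent level of the KPI, so the same rule can be applied to metrics with very different baselines. Note that a spike inflates the band for the next few samples, which a production detector would need to compensate for.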
Talent Transformation
Operations staff were retrained into “digital‑intelligent” roles, becoming operation designers and orchestration engineers who develop and compose automated capabilities. Expertise that used to be passed from person to person is now preserved in digital, reusable form.
Results
Zhejiang Mobile has accumulated 241 force matrices covering 98% of network and service fault scenarios, with 1,236 automated capabilities (including KPI anomaly detection, RCA recommendation, one‑click service keep‑alive). The system achieves 100% automatic fault dispatch and 75% automatic fault resolution.
Future Outlook
Intelligent operation remains a complex, ongoing effort. Zhejiang Mobile will continue to standardize data and capabilities, scale AI applications, and strengthen organizational support to accelerate the transition toward a high‑level autonomous network.
AIOps System and Tool Evaluation
In 2021, China Mobile Zhejiang passed the “Cloud Computing Intelligent Operation (AIOps) Capability Maturity Model – Part 2: System and Tool Technical Requirements” assessment, receiving top‑level scores for fault prediction, anomaly detection and alarm convergence.
The AIOps maturity model, jointly developed by the China Academy of Information and Communications Technology and industry partners, defines eight evaluation modules: anomaly detection, fault prediction, alarm convergence, root‑cause analysis, fault self‑healing, fault prevention, capacity prediction, and knowledge‑base construction. Enterprises can select any modules for assessment.
Benefits of the evaluation include self‑inspection, evidence‑based improvement, and positioning as a leading intelligent operation platform.