How China Telecom’s “Smart Cloud Network Brain” Drives Autonomous Operations
This article analyzes China Telecom’s cloud‑network self‑intelligence blueprint, outlines the Smart Cloud Network Brain solution and its four‑layer architecture, details AI‑enabled automation and OSS atomic capabilities, and presents a three‑step roadmap for raising the autonomous operation level from L3 to L4 by 2026, with L5 as the longer‑term goal.
Background
Since the 5G rollout in 2019, telecom networks have become a heterogeneous mix of 2G/3G/4G/5G, core, radio, transport, IP and cloud domains. The convergence of SDN/NFV accelerates network cloudification, making fault detection, root‑cause analysis and maintenance increasingly complex. Statistics from China Telecom’s ICNOC digital‑transformation meeting show that 75% of network problems are first discovered by users, 37% are caused by network changes, and operators spend about 90% of their time on fault handling.
Business Objectives and Target Metrics
The “maintenance digitalization” project aims to automate three core scenarios across the 5GC, wireless, IP, OTN, cloud and IMS domains: event automatic handling, hazard automatic discovery and fault automatic isolation. The 2023‑2026 roadmap targets:
Event automatic handling rate ≥ 65% (2023), 80% (2024‑2025), 90%+ (2025‑2026).
Hazard discovery & closure rate ≥ 60% (2023), 75% (2024‑2025), 85%+ (2025‑2026).
Fault isolation rate ≥ 50% (2023), 65% (2024‑2025), 80%+ (2025‑2026).
Reduce both the number of fault tickets and the fault‑handling time by at least 10%.
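To make the targets concrete, the sketch below checks one reporting period against them. The target values are copied from the list above; the counter names, the `PeriodCounts` structure and the way each rate's denominator is defined are assumptions for illustration, not China Telecom's actual KPI definitions.

```python
from dataclasses import dataclass
from typing import Dict

# Target rates per roadmap phase, transcribed from the list above. The article's
# phases overlap (2024-2025, 2025-2026), so they are keyed by label, not by year.
TARGETS: Dict[str, Dict[str, float]] = {
    "2023": {"event_handling": 0.65, "hazard_closure": 0.60, "fault_isolation": 0.50},
    "2024-2025": {"event_handling": 0.80, "hazard_closure": 0.75, "fault_isolation": 0.65},
    "2025-2026": {"event_handling": 0.90, "hazard_closure": 0.85, "fault_isolation": 0.80},
}

@dataclass
class PeriodCounts:
    """Hypothetical per-period counters; the choice of denominators is an assumption."""
    events_total: int
    events_auto_handled: int
    hazards_found: int
    hazards_closed: int
    faults_total: int
    faults_auto_isolated: int

def _ratio(num: int, den: int) -> float:
    # Avoid division by zero for periods with no events of a given type.
    return num / den if den else 0.0

def rates(c: PeriodCounts) -> Dict[str, float]:
    """Measured automation rates for one period."""
    return {
        "event_handling": _ratio(c.events_auto_handled, c.events_total),
        "hazard_closure": _ratio(c.hazards_closed, c.hazards_found),
        "fault_isolation": _ratio(c.faults_auto_isolated, c.faults_total),
    }

def shortfall(c: PeriodCounts, phase: str) -> Dict[str, float]:
    """Percentage points below target for each metric (0.0 when the target is met)."""
    measured = rates(c)
    return {k: round(max(0.0, t - measured[k]) * 100, 2) for k, t in TARGETS[phase].items()}

def reduction_met(before: float, after: float, min_drop: float = 0.10) -> bool:
    """Check the 'at least 10% reduction' goal for ticket counts or handling time."""
    return before > 0 and (before - after) / before >= min_drop
```

For example, a period with 700 of 1,000 events handled automatically, 110 of 200 hazards closed and 220 of 400 faults isolated meets the 2023 event and isolation targets but falls 5 percentage points short on hazard closure.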
Solution Architecture
The Smart Cloud Network Brain is organized into a four‑layer architecture.
Collection Layer: Continuously gathers configuration, resource, performance, alarm, ticket and log data. Real‑time alarms are streamed via Kafka; non‑real‑time data (configuration, performance, logs) are fetched periodically from file servers; ticket data are synchronized through WebService APIs.
General Capability Layer: Provides reusable services such as a workflow engine, GIS components, a topology engine and AI model training/inference services. This layer exposes generic APIs for alarm correlation, KPI‑based performance thresholds and traffic forecasting.
Core Capability Layer: Implements alarm processing, performance monitoring, AI‑driven fault/hazard detection, intelligent analysis, decision‑making and automated remediation. Java and Python code can be embedded to extend processing logic.
Cloud‑Network Application Layer: Delivers visual dashboards (alarm streams, performance matrices, topology views, event‑centric monitoring) and a unified network overview for operators.
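The article does not describe how the General Capability Layer's alarm‑correlation API works. As one plausible illustration, the sketch below groups alarms that arrive within a time window on the same or topologically adjacent network elements. The `Alarm` fields, the `correlate` function, the 60‑second window and the greedy grouping rule are all assumptions; a production system would consume alarms from the Kafka stream rather than from a list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

@dataclass(frozen=True)
class Alarm:
    ne: str      # network element ID (hypothetical field)
    code: str    # vendor alarm code (hypothetical field)
    ts: float    # raise time, epoch seconds

def correlate(alarms: Iterable[Alarm], links: Iterable[Tuple[str, str]],
              window: float = 60.0) -> List[List[Alarm]]:
    """Greedy grouping: an alarm joins the first group whose first alarm is at most
    `window` seconds older and which already contains the same or an adjacent NE."""
    # Build an undirected adjacency map from topology links.
    adjacency: Dict[str, Set[str]] = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    groups: List[List[Alarm]] = []
    for alarm in sorted(alarms, key=lambda x: x.ts):
        # NEs that count as "related" to this alarm: itself plus direct neighbors.
        candidates = {alarm.ne} | adjacency.get(alarm.ne, set())
        for group in groups:
            in_window = alarm.ts - group[0].ts <= window
            if in_window and candidates & {x.ne for x in group}:
                group.append(alarm)
                break
        else:
            # No existing group matched: this alarm starts a new one.
            groups.append([alarm])
    return groups
```

Each resulting group can then be treated as a single event, with its earliest alarm as the first root‑cause candidate.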
AI‑Enabled Functions
Historical performance, alarm, log and traffic data are used to train several AI models:
Fault‑prediction model: predicts imminent network failures based on KPI trends.
Traffic‑forecasting model: estimates future traffic load to guide capacity planning.
Hazard‑mining model: discovers hidden risk patterns from multi‑domain logs.
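The article does not say which algorithms the models use. The simplest form of "predicting failures from KPI trends" is to fit a trend line and check when it crosses an alarm threshold; the sketch below does exactly that with ordinary least squares. It is a deliberately simple stand‑in, not the model China Telecom trained, and the function names, threshold and horizon are hypothetical.

```python
from typing import List, Optional, Tuple

def fit_trend(values: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept over sample index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def steps_until_breach(values: List[float], threshold: float,
                       horizon: int = 24) -> Optional[int]:
    """Extrapolate the linear trend; return the first future step (1-based) at which
    the KPI is predicted to exceed `threshold`, or None within `horizon` steps."""
    slope, intercept = fit_trend(values)
    last = len(values) - 1
    for step in range(1, horizon + 1):
        if intercept + slope * (last + step) > threshold:
            return step
    return None
```

A KPI sampled as 10, 12, 14, 16, 18 with a threshold of 25 is predicted to breach four samples ahead, which gives operators a window to act before users notice. Real fault‑prediction models would account for seasonality and noise, which a straight line ignores.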
A knowledge graph built from past tickets and expert experience encodes fault‑resolution rules, similarity metrics and root‑cause patterns. During an event, the system queries the graph to generate recommended remediation steps.
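A minimal sketch of the lookup step, assuming the graph is reduced to two edge maps (root cause to the symptoms it produces, and root cause to remediation steps distilled from past tickets) and that "similarity" is the Jaccard overlap between observed alarms and each cause's symptom set. All cause names, alarm codes, remediation texts and the 0.3 cut‑off are hypothetical; a real deployment would query a graph database rather than Python dictionaries.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical edges: root cause -> symptoms (alarm codes) it typically produces.
SYMPTOMS: Dict[str, Set[str]] = {
    "fiber_cut": {"LOS", "R_LOF", "BGP_DOWN"},
    "laser_degradation": {"OPTICAL_POWER_LOW", "BER_HIGH"},
    "config_change_error": {"BGP_DOWN", "ROUTE_FLAP"},
}
# Hypothetical edges: root cause -> remediation steps learned from past tickets.
REMEDIATION: Dict[str, List[str]] = {
    "fiber_cut": ["switch to protection path", "dispatch field crew"],
    "laser_degradation": ["schedule optical module replacement"],
    "config_change_error": ["roll back most recent change", "verify routing table"],
}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(observed: Iterable[str],
              min_score: float = 0.3) -> List[Tuple[str, float, List[str]]]:
    """Rank root causes by symptom similarity and attach their remediation steps."""
    obs = set(observed)
    ranked = sorted(((jaccard(obs, s), cause) for cause, s in SYMPTOMS.items()),
                    reverse=True)
    return [(cause, round(score, 3), REMEDIATION[cause])
            for score, cause in ranked if score >= min_score]
```

With observed alarms `BGP_DOWN` and `ROUTE_FLAP`, the change‑induced cause scores 1.0 while `fiber_cut` scores only 0.25 and is filtered out, so the recommended steps are a rollback and a routing‑table check.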
OSS Atomic Capability Open Platform
More than 45 atomic capabilities (grouped as P1‑P9) are exposed as low‑code services on a Spring Cloud micro‑service platform. Each capability corresponds to a concrete OSS function such as alarm query, topology lookup, resource‑relation extraction, pre‑processing rule execution or remote command dispatch. By composing these capabilities in a workflow, complex maintenance scenarios can be automated without writing extensive code.
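The key idea is that a workflow is data (an ordered list of capability names) rather than code. The sketch below illustrates that pattern with a decorator‑based registry. It is not the Spring Cloud platform's API: the `capability` decorator, `run_workflow`, the capability names and the stubbed topology and alarm data are all hypothetical.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Capability = Callable[[Context], Context]

# Hypothetical in-process registry standing in for the micro-service catalog.
REGISTRY: Dict[str, Capability] = {}

def capability(name: str) -> Callable[[Capability], Capability]:
    """Decorator that registers a function as a named atomic capability."""
    def register(fn: Capability) -> Capability:
        REGISTRY[name] = fn
        return fn
    return register

@capability("topology_lookup")
def topology_lookup(ctx: Context) -> Context:
    # Stub: a real capability would call the topology engine's API.
    ctx["neighbors"] = {"OTN-1": ["RTR-1", "RTR-2"]}.get(str(ctx["ne"]), [])
    return ctx

@capability("alarm_query")
def alarm_query(ctx: Context) -> Context:
    # Stub: a real capability would query the alarm store.
    active = {"RTR-1": ["BGP_DOWN"]}
    ctx["neighbor_alarms"] = {n: active.get(n, []) for n in ctx["neighbors"]}
    return ctx

def run_workflow(spec: List[str], ctx: Context) -> Context:
    """Validate the whole spec first, then run capabilities in order on a shared context."""
    missing = [name for name in spec if name not in REGISTRY]
    if missing:
        raise KeyError(f"unknown capabilities: {missing}")
    for name in spec:
        ctx = REGISTRY[name](ctx)
    return ctx
```

Checking every name before running anything means a misconfigured workflow fails immediately instead of stopping halfway through a maintenance action. For example, `run_workflow(["topology_lookup", "alarm_query"], {"ne": "OTN-1"})` collects active alarms on each neighbor of `OTN-1`.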
Example – transmission line fault automatic handling:
1. Query hardware fault status (P1)
2. Retrieve related resource information (P2)
3. Pull alarm from the monitoring system (P3)
4. Obtain recent cut‑over records (P8)
5. Execute remote OMC/EMS command to reset the line
6. Generate a fault ticket and notify operators

Similar compositions using P4‑P7 automate hazard discovery, while P5‑P9 support fault isolation across core, wireless and IP domains.
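The six steps above can be sketched as one function whose data‑gathering capabilities are passed in as callables, so each can be stubbed or swapped. The decision rules are hypothetical guard rails that the article does not specify: the line is reset only when hardware looks healthy and no recent cut‑over could explain the fault (the background statistics attribute 37% of problems to network changes). All function and parameter names are illustrative.

```python
from typing import Callable, Dict, List

def handle_line_fault(
    line_id: str,
    query_hw_status: Callable[[str], bool],            # step 1 (P1): True if hardware is healthy
    get_resources: Callable[[str], List[str]],         # step 2 (P2): related resources
    get_alarms: Callable[[str], List[str]],            # step 3 (P3): alarms from monitoring
    get_recent_cutovers: Callable[[str], List[str]],   # step 4 (P8): recent cut-over records
    reset_line: Callable[[str], bool],                 # step 5: remote OMC/EMS reset command
) -> Dict[str, object]:
    hw_ok = query_hw_status(line_id)
    resources = get_resources(line_id)
    alarms = get_alarms(line_id)
    cutovers = get_recent_cutovers(line_id)

    # Hypothetical guard rails: never auto-reset over a hardware fault or a recent change.
    if not hw_ok:
        action = "no reset: hardware fault, dispatch field repair"
    elif cutovers:
        action = "no reset: review recent cut-over " + cutovers[0]
    else:
        action = "reset ok" if reset_line(line_id) else "reset failed"

    # Step 6: assemble the ticket; delivering the notification is left to the caller.
    return {
        "line": line_id,
        "resources": resources,
        "alarms": alarms,
        "action": action,
        "notify_operators": True,
    }
```

Injecting the capabilities keeps the decision logic testable without a live OMC/EMS, and the same function can run against real capability clients in production.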
Solution Evolution Roadmap
Step 1 (2023‑2024): Deploy automation for 45+ maintenance scenarios, achieving the 2023 target rates (65% event handling, 60% hazard discovery, 50% fault isolation).
Step 2 (2024‑2025): Introduce AI‑driven decision support to raise automation to 80%/75%/65% respectively.
Step 3 (2025‑2026): Reach autonomous level L4, with AI applied across all professional domains (5GC, wireless, IP, OTN, IMS, cloud) and complete digital transformation of cloud‑network operations.
Expected Impact
By integrating OSS atomic capability orchestration, AI‑enabled fault prediction and end‑to‑end automation, the platform is expected to reduce OPEX, shorten fault‑resolution cycles, and lift China Telecom’s self‑intelligence level from L3 to L4 (and eventually L5) across the entire network.
AsiaInfo Technology: New Tech Exploration
AsiaInfo's cutting‑edge ICT viewpoints and industry insights, featuring its latest technology and product case studies.
