How Inner Mongolia Mobile Achieved Leading SRE Maturity – Lessons from the DevOps Assessment
The article explores the growing importance of system reliability in China, the national regulations driving SRE adoption, Inner Mongolia Mobile’s successful Level‑3 SRE assessment at the 2024 GOPS conference, and insights from Deputy GM Zhang Yongtao on practices, challenges, and future plans.
As digital technology evolves rapidly, the stability of information systems has become a national priority. The "Regulations on the Security Protection of Critical Information Infrastructure" (effective September 1, 2021) require operators to ensure the safe and stable operation of critical infrastructure, and the industry is increasingly adopting Site Reliability Engineering (SRE) practices.
On October 18, 2024, the 24th GOPS Global Operations Conference and Research‑Operation Intelligence Summit took place in Shanghai, where the China Academy of Information and Communications Technology (CAICT) announced the latest DevOps standard evaluation results. Inner Mongolia Mobile’s "Order Center" passed the domestic DevOps standard and achieved a Level‑3 assessment in System Reliability and Continuity Engineering (SRE), demonstrating leading domestic capability.
Q&A
Q: Please introduce yourself, your company, and the project that was assessed.
A: Inner Mongolia Mobile, a wholly‑owned subsidiary of China Mobile, provides communication and IT services to millions of users and enterprises. The Order Center, a core sub‑system for business support, has been containerized and deployed independently to offer standardized order services for personal, family, enterprise, and IoT businesses.
Q: How do you feel about passing the SRE Level‑3 assessment?
A: We are delighted, and grateful for CAICT’s guidance and our leadership’s support. The assessment drove progress in automation and AI‑assisted operations, along with improvements in SLO management, fault prevention, fault observation, fault handling, and multi‑system coordination, raising both the quality and the efficiency of our IT operations.
Q: Why is building a stable and reliable IT system important for enterprise development?
A: A stable IT system underpins business growth by ensuring service quality, supporting brand image, fulfilling social responsibility, promoting business expansion, and reducing costs through automation and intelligent operations.
Q: Why did you choose to participate in this SRE assessment?
A: We wanted to align with the group’s goal of becoming a world‑class information service technology company and to advance independently controllable IT systems. Our large‑scale, cloud‑native microservice architecture also requires SRE practices to maintain reliability and efficiency.
Q: What changes did the assessment bring to your organization?
A: It optimized the SRE team structure, fostered a collaborative culture, streamlined processes, and enhanced tool‑based operations. We improved observability, rapid incident response, automated inspections, and chaos engineering, achieving a 60% reduction in average fault recovery time and a 70% drop in fault occurrences compared with 2023.
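For readers who want to track the same kind of metric, the sketch below shows how an average fault recovery time and its year-over-year reduction could be computed from incident records. It is an illustration only: the function names and the incident timestamps are hypothetical, chosen to make the arithmetic easy to follow, and are not data or tooling from the Order Center project.

```python
from datetime import datetime
from statistics import mean


def mean_recovery_minutes(incidents):
    """Average recovery time in minutes for (detected_at, recovered_at) pairs."""
    durations = [
        (recovered - detected).total_seconds() / 60
        for detected, recovered in incidents
    ]
    return mean(durations)


def percent_reduction(baseline, current):
    """Relative improvement of `current` over `baseline`, in percent."""
    return (baseline - current) * 100 / baseline


# Hypothetical incident records, NOT data from the article.
incidents_baseline_year = [
    (datetime(2023, 3, 1, 10, 0), datetime(2023, 3, 1, 11, 0)),    # 60 min
    (datetime(2023, 7, 12, 2, 30), datetime(2023, 7, 12, 3, 10)),  # 40 min
]
incidents_current_year = [
    (datetime(2024, 5, 20, 14, 0), datetime(2024, 5, 20, 14, 20)),  # 20 min
]

baseline_mttr = mean_recovery_minutes(incidents_baseline_year)
current_mttr = mean_recovery_minutes(incidents_current_year)
reduction = percent_reduction(baseline_mttr, current_mttr)
print(f"Baseline: {baseline_mttr:.0f} min, current: {current_mttr:.0f} min, "
      f"reduction: {reduction:.0f}%")
```

The same `percent_reduction` helper applies to fault counts: pass the number of faults in each period instead of the recovery times.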
Q: What challenges did you encounter during the assessment and how were they solved?
A: Challenges included breaking down silos across multiple IT lines and integrating diverse tools. We clarified the SRE organizational architecture, promoted a culture of cooperation, allocated resources for cross‑team collaboration, and conducted multiple rounds of system and tool refactoring.
Q: What successful experiences can you share for implementing SRE internally?
A: Establish a clear SRE organization, focus on automation over manual work, and prioritize SLO operation, observability, and chaos engineering as key transformation levers.
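As one illustration of what "SLO operation" can mean in practice, the sketch below computes how much of an error budget a service has consumed against an availability SLO. This is a generic SRE technique, not a description of Inner Mongolia Mobile's tooling; the names `BudgetStatus` and `error_budget_status`, the 99.9% target, and the request counts are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    allowed_failures: float    # failures the SLO tolerates in the window
    consumed_fraction: float   # share of the error budget already used
    exhausted: bool            # True once the budget is fully spent


def error_budget_status(slo_target, total_requests, failed_requests):
    """Compare observed failures with the budget implied by an SLO target.

    slo_target is a success ratio strictly between 0 and 1 (e.g. 0.999).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed = (1 - slo_target) * total_requests
    consumed = failed_requests / allowed
    return BudgetStatus(allowed, consumed, consumed >= 1.0)


# Hypothetical figures: 1,000,000 order requests in the window, 250 failed.
status = error_budget_status(0.999, 1_000_000, 250)
print(f"Allowed failures: {status.allowed_failures:.0f}, "
      f"budget used: {status.consumed_fraction:.0%}, "
      f"exhausted: {status.exhausted}")
```

A team might use the consumed fraction as a release gate, for example pausing risky changes once most of the budget is spent; where to set that threshold is a policy decision and is not something the interview specifies.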
Q: What are your future plans for SRE?
A: We will enhance inter‑system tool integration, standardize the SRE toolchain, and further develop cross‑departmental management mechanisms and culture.
Q: How do you view the future direction and trends of SRE?
A: SRE is the best practice for large‑scale IT production systems across industries such as telecom, finance, energy, and government, driving cost reduction and efficiency gains in the era of digital transformation.
Additional project screenshots illustrate the SRE tools and dashboards used in the assessment.
Efficient Ops