Building Scalable Operations: From SRE to AIOps and DevOps
This article explores how to construct a scalable operations framework by integrating concepts such as SRE, DevOps, AIOps, and continuous improvement, addressing organizational challenges, process standardization, tool automation, and the shift from reactive firefighting to proactive, value‑driven management.
Preface
After a long hiatus, the author reflects on the original purpose of the public account: to consolidate scattered work knowledge into a coherent operations knowledge system.
The goal is to build a sustainable, extensible operations framework that integrates organization, processes, and tools.
1.1 Operations Is Not Simple
Operations is often misunderstood as merely deploying programs or handling simple tasks, but it actually requires comprehensive technical and management capabilities, encompassing methods, tools, processes, and documentation.
Key capability areas include:
Implementation of operational standards: applying ITIL, ISO20000, ITSS.1, etc.
Regulatory compliance: understanding and responding to regulator requirements.
Basic assurance: configuration, monitoring, release, scaling, incident and problem management.
Fundamental skills: networking, servers, OS, databases, middleware, JVM, application tuning.
Business service capability: SLA, service desk, knowledge base, support.
Availability management: inspections, high‑availability design, redundancy.
Risk and security management: operations auditing, vulnerability and attack control.
Incident management: event and problem handling.
Continuous delivery: application changes and infrastructure delivery.
Proactive optimization: architecture, performance, user experience.
Emergency drills: high‑availability architecture, failure scenarios, documentation, personnel readiness.
Business support: data maintenance, extraction, parameter management.
Operational analysis: capacity, performance, availability analysis.
Operations capability: identifying and solving business pain points, improving experience.
Cost control: evaluating labor, hardware, bandwidth, and software costs to reduce expenses.
Operations development: building automation tools and cultivating development skills.
Different enterprises require varying depths of these capabilities, often leading to complex technology stacks.
Claims of automation are widespread, yet many financial institutions still have limited automation coverage and a heavy manual workload.
1.2 Operations Pain Points
Rapid development of operations technologies has raised expectations, but organizations still encounter challenges such as limited automation, high pressure, and reliance on expert knowledge.
Organizational Pain
External factors include expanding business scale, intense competition, stricter regulation, and the shift to multi‑center, cloud‑native data centers.
Internal factors involve difficulty quantifying skill levels, lack of standardized processes, and the burden of repetitive tasks.
Self‑Help Strategies
SRE
Site Reliability Engineering (SRE) combines software development and operations to ensure site availability, requiring deep system knowledge, process expertise, and development of automation tools.
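One way SRE teams commonly turn "ensuring availability" into something measurable is an error budget: the amount of failure a service may accumulate before it breaches its availability target. The article does not describe this technique, so the sketch below is only an illustration; the class name `ErrorBudget`, its fields, and the 99.9% target are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: tracks failures against an availability target."""

    slo_target: float     # assumed target, e.g. 0.999 = 99.9% of requests succeed
    total_requests: int   # requests served in the measurement window
    failed_requests: int  # requests that failed in the same window

    def availability(self) -> float:
        """Measured availability: the share of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1.0 - self.failed_requests / self.total_requests

    def allowed_failures(self) -> float:
        """Number of failures the target tolerates in this window."""
        return (1.0 - self.slo_target) * self.total_requests

    def remaining_fraction(self) -> float:
        """Share of the budget still unspent; negative means the target is breached."""
        allowed = self.allowed_failures()
        if allowed == 0:
            return 1.0 if self.failed_requests == 0 else 0.0
        return 1.0 - self.failed_requests / allowed


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"availability:     {budget.availability():.5f}")
    print(f"budget remaining: {budget.remaining_fraction():.0%}")
```

A team could use the remaining fraction as a release gate: ship freely while budget remains, and prioritize reliability work once it approaches zero.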
DevOps
DevOps bridges development and operations through automation, standardization, and continuous delivery, aiming to improve speed, quality, and agility.
AIOps
AIOps applies algorithms and machine learning to IT operations data, enhancing automation by analyzing logs, metrics, and events to generate insights, automate responses, and support continuous improvement.
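As a rough illustration of "analyzing metrics to generate insights", the sketch below flags metric samples that deviate sharply from a rolling baseline. It is a deliberately simple statistical check, not a machine-learning model and not a method taken from the article; the function name `detect_anomalies`, the window size, and the threshold are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def detect_anomalies(
    samples: Iterable[float],
    window: int = 5,
    threshold: float = 3.0,
) -> List[Tuple[int, float]]:
    """Return (index, value) pairs whose z-score against the previous
    `window` samples exceeds `threshold`."""
    history: deque = deque(maxlen=window)
    anomalies: List[Tuple[int, float]] = []
    for index, value in enumerate(samples):
        # Only score a sample once a full window of history is available.
        if len(history) == window:
            baseline = mean(history)
            spread = pstdev(history)
            # A perfectly flat baseline has zero spread; skip it rather
            # than divide by zero.
            if spread > 0 and abs(value - baseline) / spread > threshold:
                anomalies.append((index, value))
        # Flagged samples still enter the history, so a spike widens the
        # baseline for the next `window` samples.
        history.append(value)
    return anomalies


if __name__ == "__main__":
    # Hypothetical response-time samples in milliseconds.
    latencies = [100, 102, 98, 101, 99, 100, 250, 101, 99]
    for index, value in detect_anomalies(latencies):
        print(f"sample {index} looks anomalous: {value} ms")
```

In practice the flagged samples would feed an alerting or automated-response step, and the fixed threshold would be replaced by something tuned to each metric's seasonality and noise.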
1.4 Sustainable Operations System
Continuous Improvement (PDCA)
The PDCA cycle—Plan, Do, Check, Act—guides the evolution of operations by aligning with business goals, standardizing processes, and iteratively refining tools.
Transformation Path
Shift from reactive firefighting to proactive, value‑driven operations.
Move from manual, expert‑centric work to operations development platforms.
Adopt intelligent, data‑driven practices using AIOps.
Building the System
Three pillars—organization, process, tool—must be integrated:
Organization: professional, refined, and operationally focused teams.
Process: standardized, visualized, and measurable workflows based on ITIL, ISO20000, ITSS.1, and DevOps.
Tool: automation, digitization, intelligence, and service‑orientation to support monitoring, control, and analytics.
Source: Article originally published on the “Operations Path” public account, author Peng Huasheng.