
Building Scalable Operations: From SRE to AIOps and DevOps

This article explores how to construct a scalable operations framework by integrating concepts such as SRE, DevOps, AIOps, and continuous improvement, addressing organizational challenges, process standardization, tool automation, and the shift from reactive firefighting to proactive, value‑driven management.

Efficient Ops

Preface

After a long hiatus, the author reflects on the original purpose of the public account: to consolidate scattered work knowledge into a coherent operations knowledge system.

The goal is to build a sustainable, extensible operations framework that integrates organization, processes, and tools.

1.1 Operations Is Not Simple

Operations is often misunderstood as merely deploying programs or handling simple tasks, but it actually requires comprehensive technical and management capabilities, encompassing methods, tools, processes, and documentation.

Key capability areas include:

Implementation of operational standards: applying ITIL, ISO/IEC 20000, ITSS.1, etc.

Regulatory compliance: understanding and responding to regulator requirements.

Basic assurance: configuration, monitoring, release, scaling, incident and problem management.

Fundamental skills: networking, servers, OS, databases, middleware, JVM, application tuning.

Business service capability: SLA, service desk, knowledge base, support.

Availability management: inspections, high‑availability design, redundancy.

Risk and security management: operation audit, vulnerability and attack control.

Incident management: event and problem handling.

Continuous delivery: application changes and infrastructure delivery.

Proactive optimization: architecture, performance, user experience.

Emergency drills: high‑availability architecture, failure scenarios, documentation, personnel readiness.

Business support: data maintenance, extraction, parameter management.

Operational analysis: capacity, performance, availability analysis.

Operations capability: identifying and solving business pain points, improving experience.

Cost control: evaluating labor, hardware, bandwidth, and software costs to reduce expenses.

Operations development: building automation tools and cultivating development skills.

Different enterprises require varying depths of these capabilities, often leading to complex technology stacks.

Claims of automation are widespread, yet many financial institutions still have limited automation coverage and rely heavily on manual work.

1.2 Operations Pain Points

Rapid development of operations technologies has raised expectations, but organizations still encounter challenges such as limited automation, high pressure, and reliance on expert knowledge.

Organizational Pain

External factors include expanding business scale, intense competition, stricter regulation, and the shift to multi‑center, cloud‑native data centers.

Internal factors involve difficulty quantifying skill levels, lack of standardized processes, and the burden of repetitive tasks.

1.3 Self‑Help Strategies

SRE

Site Reliability Engineering (SRE) combines software development and operations to ensure site availability, requiring deep system knowledge, process expertise, and development of automation tools.
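To make "ensuring site availability" concrete, the sketch below turns an availability target into an allowed-downtime figure for a reporting window and compares it with observed downtime. It is a minimal illustration, not something taken from the original article: the function names, the 30-day window, and the 99.9% target are all assumptions chosen for the example.

```python
# Hypothetical helpers: names, window length, and targets are illustrative assumptions.

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that a window can absorb while still meeting the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1, e.g. 0.999")
    return window_days * MINUTES_PER_DAY * (1.0 - target)


def measured_availability(downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the window during which the service was up."""
    total = window_days * MINUTES_PER_DAY
    return 1.0 - downtime_minutes / total


def remaining_budget_minutes(
    target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Positive: downtime still tolerable. Negative: the target has been missed."""
    return allowed_downtime_minutes(target, window_days) - downtime_minutes


if __name__ == "__main__":
    target = 0.999    # assumed availability target
    downtime = 30.0   # assumed observed downtime in minutes
    print(f"allowed:   {allowed_downtime_minutes(target):.1f} min")
    print(f"measured:  {measured_availability(downtime):.5f}")
    print(f"remaining: {remaining_budget_minutes(target, downtime):.1f} min")
```

A team could use the remaining figure as a simple signal: when it approaches zero, reliability work takes priority over risky changes.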

DevOps

DevOps bridges development and operations through automation, standardization, and continuous delivery, aiming to improve speed, quality, and agility.

AIOps

AIOps applies algorithms and machine learning to IT operations data, enhancing automation by analyzing logs, metrics, and events to generate insights, automate responses, and support continuous improvement.
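As a rough illustration of "applying algorithms to operations data", the sketch below flags metric samples that deviate sharply from their recent history using a rolling z-score. This is one simple statistical technique chosen for the example, not the method of any particular AIOps product; the function name, window size, threshold, and sample values are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def detect_anomalies(
    values: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices of samples whose z-score against the preceding
    `window` samples exceeds `threshold`.

    Hypothetical helper for illustration only.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")

    anomalies: List[int] = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to score against; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Assumed CPU utilisation samples (percent) with one obvious spike.
    cpu = [50, 52, 51, 49, 50, 51, 95, 50, 52]
    print(detect_anomalies(cpu))
```

In practice the flagged indices would feed an alerting or automated-response step; production systems typically also need to handle seasonality and noisy baselines, which this sketch does not attempt.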

1.4 Sustainable Operations System

Continuous Improvement (PDCA)

The PDCA cycle—Plan, Do, Check, Act—guides the evolution of operations by aligning with business goals, standardizing processes, and iteratively refining tools.

Transformation Path

Shift from reactive firefighting to proactive, value‑driven operations.

Move from manual, expert‑centric work to operations development platforms.

Adopt intelligent, data‑driven practices using AIOps.

Building the System

Three pillars—organization, process, tool—must be integrated:

Organization: professional, refined, and operationally focused teams.

Process: standardized, visualized, and measurable workflows based on ITIL, ISO/IEC 20000, ITSS.1, and DevOps.

Tool: automation, digitization, intelligence, and service‑orientation to support monitoring, control, and analytics.

Source: Article originally published on the “Operations Path” public account, author Peng Huasheng.
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
