Tagged articles
9 articles
Efficient Ops
Jun 9, 2025 · Operations

How OnCall Platforms Transform Incident Management and Reduce Manual Overhead

This article explains the purpose and key features of OnCall platforms, compares popular solutions like PagerDuty, Opsgenie, Grafana OnCall and Alibaba Cloud ARMS, clarifies webhooks with a simple analogy, and summarizes how centralized on‑call management boosts operational efficiency while minimizing manual intervention.

Oncall · incident response · webhook
0 likes · 5 min read
Efficient Ops
Apr 8, 2024 · Operations

What Exactly Is SRE? A Deep Dive into Roles, Responsibilities, and Best Practices

This article explains what Site Reliability Engineering (SRE) is and outlines the three main layers of SRE work: Infrastructure, Platform, and Business. It covers hiring challenges and daily duties such as deployment, on‑call, SLI/SLO management, capacity planning, and user support, and offers practical interview and career advice.

Oncall · Operations · SRE
0 likes · 22 min read
DevOps
Jul 25, 2022 · Operations

Understanding the Role and Responsibilities of Site Reliability Engineering (SRE)

This article provides a comprehensive overview of Site Reliability Engineering, explaining its origins, core responsibilities across infrastructure, platform, and business layers, daily tasks such as deployment, on‑call duties, SLI/SLO management, incident post‑mortems, capacity planning, and user support, as well as career advice for aspiring SREs.

Infrastructure · Oncall · Reliability
0 likes · 21 min read
Bilibili Tech
Apr 26, 2022 · Operations

Bilibili's SRE Practice for Business Stability: Theory, Metrics, and Operational Implementation

Bilibili’s SRE team combines stability theory, detailed fault‑stage and operational metrics, and a unified emergency‑response platform—including on‑call scheduling, fault‑command incident commanders, automated fault portraits, and rapid post‑mortems—to transform frequent incidents into data‑driven, collaborative recoveries and lay groundwork for AI‑assisted self‑healing.

Business Stability · Oncall · Operations
0 likes · 23 min read
IT Architects Alliance
Apr 17, 2022 · Operations

Understanding the SRE Role: Responsibilities, Types, and Practices

This article explains what Site Reliability Engineering (SRE) is, why it was created, the challenges in hiring SREs, and breaks the role into three layers—Infrastructure, Platform, and Business—detailing their duties, deployment processes, on‑call practices, SLI/SLO management, incident post‑mortems, capacity planning, user support, and career advice.

Infrastructure · Oncall · Operations
0 likes · 21 min read
Architect
Apr 16, 2022 · Operations

A Comprehensive Overview of Site Reliability Engineering (SRE) Roles and Practices

This article explains what SRE is, why it was created, how its responsibilities differ across companies, and breaks the work into Infrastructure, Platform, and Business SRE while covering deployment, on‑call, SLI/SLO, incident post‑mortems, capacity planning, user support, and career advice.

Oncall · Operations · SLI/SLO
0 likes · 22 min read
IT Architects Alliance
Dec 1, 2021 · Operations

What Does an SRE Actually Do? A Deep Dive into Roles and Practices

This article explains the origins of Site Reliability Engineering, breaks down its three main layers (Infrastructure, Platform, and Business SRE), covers day‑1 and day‑2 deployment, on‑call processes, SLI/SLO design, post‑mortems, capacity planning, and user support, and offers practical advice for aspiring SREs.

Infrastructure · Oncall · Operations
0 likes · 24 min read
Programmer DD
Nov 16, 2021 · Operations

What Does an SRE Do? A Practical Guide to Site Reliability Engineering

This article explains the role of Site Reliability Engineering (SRE), its origins at Google, the challenges of hiring, the three-layer model of infrastructure, platform, and business SRE, and provides detailed responsibilities, on‑call practices, SLI/SLO management, capacity planning, and career advice for aspiring SREs.

Infrastructure · Oncall · SLI
0 likes · 23 min read