Tagged articles
14 articles
Raymond Ops
Apr 20, 2026 · Operations

How to Build a Standardized SRE On‑Call Process: From Alert Grading to Handoff Templates

This article presents a complete SRE on‑call handbook that defines alert severity levels, provides concrete Prometheus Alertmanager configurations, outlines a step‑by‑step response flow, and details war‑room roles, escalation paths, handoff checklists, and post‑mortem procedures, with dozens of ready‑to‑use templates to reduce MTTR and improve reliability.

Alert Management · On-Call · Operations
27 min read
MaGe Linux Operations
Dec 10, 2025 · Operations

Standardized SRE On‑Call Handbook: Alert Grading, Response Flow, and Handoff Templates

This handbook presents a complete SRE on‑call process, tested over two years, that defines alert severity tiers, response requirements, escalation paths, War‑Room roles, handoff schedules, and post‑mortem procedures, and provides ready‑to‑use configuration snippets, checklists, and templates to reduce MTTR and repeat incidents.

Alert Management · On-Call · Operations
26 min read
DevOps Coach
Oct 10, 2025 · Interview Experience

How I Fast‑Tracked My Software Engineer Career: 10 Practical Growth Hacks

This article shares a software engineer’s eight‑year journey, detailing concrete habits like weekly work logs, on‑call participation, cautious tech adoption, internal team rotation, writing, and interview preparation, offering actionable advice for junior and mid‑level developers seeking rapid career advancement.

Career Development · Interview Preparation · On-Call
24 min read
Efficient Ops
Mar 18, 2025 · Operations

Is 24/7 On‑Call a Nightmare? Real Ops Insights from Zhihu Discussions

This article compiles diverse Zhihu comments on the reality of 24 × 7 on‑call duties, contrasting exaggerated myths with practical team‑based solutions, global shift models, backup strategies, and actionable tips for improving operations without sacrificing personal life.

On-Call · SRE · teamwork
7 min read
Efficient Ops
May 31, 2023 · Operations

How Tencent Scales SRE: Building a SLO‑Based Quality Operations System

This article examines Tencent's end‑to‑end SRE quality‑operation framework built on Service Level Objectives (SLO) and On‑Call, detailing industry background, problem statements, SLO management, On‑Call benefits, product architecture, large‑scale deployment, and future plans for reliability engineering.

On-Call · Quality Operations · SLO
11 min read
Efficient Ops
Feb 5, 2020 · Operations

Balancing Stability and Speed: Google SRE Lessons for Modern Ops Teams

This article examines the inherent tension between operations and development, explains Google’s error‑budget and SLO approach, and shares practical DevOps, on‑call, automation, and talent strategies that help ops teams improve efficiency while maintaining product reliability.

Automation · Error Budget · On-Call
9 min read
Sohu Tech Products
Oct 23, 2019 · Operations

Google SRE Weekly Alert Limits and Practical Strategies for Reducing Alert Fatigue

This article examines how Google SRE limits weekly alerts to ten, compares it with typical Chinese internet operations teams, and provides practical strategies—including on‑call scheduling, alert escalation, automation, dashboard optimization, and team management—to dramatically reduce alert volume and improve incident response.

Alert Management · On-Call · Operations
15 min read
dbaplus Community
Oct 16, 2019 · Operations

How to Cut Alert Noise: Practical SRE Strategies for Ops Teams

This article shares concrete SRE‑inspired techniques—duty‑roster scheduling, tiered alert handling, automation safeguards, dashboard focus on top‑3 alerts, time‑based filtering, and systematic code review—to dramatically reduce daily alarm volume while keeping on‑call teams motivated and effective.

On-Call · SRE · alert optimization
15 min read
Efficient Ops
Nov 7, 2016 · Operations

How to Train New SREs Effectively: Proven Practices and Playbooks

This article outlines a systematic approach to onboarding and training new Site Reliability Engineers, covering trust building, readiness assessment, diverse learning methods, structured curricula, on‑call milestones, project‑focused work, reverse‑engineering skills, statistical thinking, and improvisation techniques to develop high‑performing SRE teams.

On-Call · Operations · SRE
17 min read