
Is 24/7 On‑Call a Nightmare? Real Ops Insights from Zhihu Discussions

This article compiles diverse Zhihu comments on the reality of 24 × 7 on‑call duties, contrasting exaggerated myths with practical team‑based solutions, global shift models, backup strategies, and actionable tips for improving operations without sacrificing personal life.

Efficient Ops

Recent anti-overwork measures at companies such as DJI, Haier, and Midea have raised the question of whether operations (ops) work truly requires a terrifying 24 × 7 on-call schedule.

Microsoft’s Global Shift Model

Microsoft handles the 24 × 7 challenge by staffing three teams in different time zones: UTC−8 (Seattle), UTC±0 (UK/Ireland), and UTC+8 (Shanghai). Each team works an eight-hour shift during its local daytime and then hands off to the next region, so coverage is continuous and no individual stays on duty overnight.
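The hand-off arithmetic can be sketched in a few lines of Python. This is only an illustration, not Microsoft's actual tooling: the 09:00 to 17:00 local shift window, the fixed UTC offsets (daylight saving time is ignored), and the names `REGIONS` and `region_on_duty` are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Hypothetical shift table: (region, fixed UTC offset in hours).
# Fixed offsets ignore daylight saving time, which a real roster must handle.
REGIONS = [("Shanghai", 8), ("UK/Ireland", 0), ("Seattle", -8)]

# Assumed local shift window: 09:00 (inclusive) to 17:00 (exclusive).
SHIFT_START, SHIFT_END = 9, 17


def region_on_duty(now_utc: datetime) -> str:
    """Return the region whose local daytime shift covers the given UTC time."""
    utc_hour = now_utc.astimezone(timezone.utc).hour
    for name, offset in REGIONS:
        local_hour = (utc_hour + offset) % 24
        if SHIFT_START <= local_hour < SHIFT_END:
            return name
    raise RuntimeError("coverage gap: no region is on shift")


if __name__ == "__main__":
    for hour in (3, 12, 20):
        moment = datetime(2024, 1, 1, hour, 0, tzinfo=timezone.utc)
        print(f"{hour:02d}:00 UTC -> {region_on_duty(moment)}")
```

Because the three offsets are eight hours apart and each shift lasts eight hours, every UTC hour maps to exactly one region under these assumptions.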

Typical Small‑Company Pain Points

In many smaller firms, managers may message at 3 a.m. demanding status updates, criticize work attitude, or even threaten dismissal, creating a perception that on‑call is terrifying.

Operations Is a Team Effort

Ops is a profession and a team effort, not a solo burden. Large companies have extensive ops teams with rotating shifts, so a single person's workload remains manageable. Global teams typically work a regular nine-to-five day within each time zone, with occasional weekend hand-overs.

Practical On‑Call Practices

Backup operators can cover each other.

Rotating duty schedules give each engineer at least one week per month without on‑call.

Service level agreements (SLAs) and disaster-recovery mechanisms ensure that a few failed nodes do not cause major incidents.

Team culture emphasizes planned work over constant firefighting.
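The backup and rotation practices above can be sketched as a simple weekly round-robin. This is a minimal illustration under stated assumptions, not a scheme described by the commenters: the function name `weekly_rotation`, the one-primary-plus-one-backup layout, and the engineer names are hypothetical.

```python
from datetime import date, timedelta


def weekly_rotation(engineers, start, weeks):
    """Build a round-robin schedule with one primary and one backup per week.

    Assumes at least three engineers, so that someone is fully off every week.
    """
    n = len(engineers)
    if n < 3:
        raise ValueError("need at least three engineers for an off-duty week")
    schedule = []
    for i in range(weeks):
        schedule.append({
            "week_of": start + timedelta(weeks=i),
            "primary": engineers[i % n],
            # Next week's primary acts as this week's backup.
            "backup": engineers[(i + 1) % n],
        })
    return schedule


if __name__ == "__main__":
    team = ["ana", "bo", "chen", "dee"]  # hypothetical names
    for slot in weekly_rotation(team, date(2024, 1, 1), 4):
        print(slot["week_of"], slot["primary"], slot["backup"])
```

With four engineers over a four-week cycle, each person is primary for one week, backup for one week, and completely off for two, which satisfies the "at least one week per month without on-call" rule.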

On‑call is typically triggered by two scenarios: urgent business requests or catastrophic alerts. In most months, total on‑call time rarely exceeds three hours.

Small‑Company Example

A small firm with ~100 Alibaba Cloud servers uses high‑availability setups, automation scripts, and comprehensive monitoring, resulting in very few incidents and normal weekend rest.

Improving Operations

Technical improvements rely on mature open-source solutions that address up to 80 % of common problems, such as high-availability architectures, caching, front-end/back-end separation, and middle-platform API governance. Management improvements include standby readiness, clear escalation paths, and well-documented hand-over procedures.

Typical On‑Call Schedule

Standard working hours are 9 a.m. to 6 p.m., with a fixed on-call rotation covering evenings and weekends. On-call engineers must stay reachable by phone around the clock during their rotation and receive compensatory time off after handling incidents.
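As a rough sketch of how an alert router might apply this schedule, the function below decides whether a given local time falls to the day team or to the on-call engineer. The Monday-to-Friday assumption, the names `handled_by`, `WORK_START`, and `WORK_END`, and the omission of public holidays are all simplifications made for the example.

```python
from datetime import datetime, time

# Standard working hours from the schedule above: 09:00 to 18:00.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def handled_by(local_dt: datetime) -> str:
    """Return who should receive an alert raised at the given local time.

    Assumes a Monday-to-Friday working week and ignores public holidays.
    """
    is_weekday = local_dt.weekday() < 5  # Monday is 0, Sunday is 6
    in_hours = WORK_START <= local_dt.time() < WORK_END
    if is_weekday and in_hours:
        return "day team"
    return "on-call engineer"


if __name__ == "__main__":
    print(handled_by(datetime(2024, 1, 3, 10, 0)))  # a Wednesday morning
    print(handled_by(datetime(2024, 1, 6, 10, 0)))  # a Saturday morning
```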

Tags: automation, operations, SRE, incident management, teamwork, on-call
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
