Is 24/7 On‑Call a Nightmare? Real Ops Insights from Zhihu Discussions
This article compiles diverse Zhihu comments on the reality of 24 × 7 on‑call duties, contrasting exaggerated myths with practical team‑based solutions, global shift models, backup strategies, and actionable tips for improving operations without sacrificing personal life.
Recent anti-overwork measures by companies such as DJI, Haier, and Midea have raised the question of whether operations (运维) work truly requires a dreaded 24 × 7 on-call schedule.
Microsoft’s Global Shift Model
Microsoft handles 7 × 24 coverage with three teams in different time zones: UTC-8 (Seattle), UTC+0 (UK/Ireland), and UTC+8 (Shanghai). Each team works an eight-hour daytime shift and then hands off to the next region. Coverage is therefore continuous, and no one has to stay on duty overnight.
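The hand-off arithmetic can be checked with a short sketch. It assumes each region works a local 09:00 to 17:00 shift and uses fixed standard-time offsets, ignoring daylight saving (Seattle moves to UTC-7 and the UK to UTC+1 in summer). The shift hours and the function names are illustrative assumptions, not Microsoft's actual roster.

```python
# Assumed fixed UTC offsets (standard time only, daylight saving ignored).
REGIONS = {
    "Seattle": -8,
    "UK/Ireland": 0,
    "Shanghai": 8,
}

# Hypothetical local shift: starts at 09:00 and lasts eight hours.
SHIFT_START_LOCAL = 9
SHIFT_HOURS = 8


def utc_hours_covered(offset_hours: int) -> set[int]:
    """Return the UTC hours (0-23) covered by a 09:00-17:00 local shift."""
    start_utc = (SHIFT_START_LOCAL - offset_hours) % 24
    return {(start_utc + h) % 24 for h in range(SHIFT_HOURS)}


def coverage_by_hour() -> dict[int, list[str]]:
    """Map each UTC hour to the list of regions on shift during that hour."""
    table: dict[int, list[str]] = {hour: [] for hour in range(24)}
    for region, offset in REGIONS.items():
        for hour in utc_hours_covered(offset):
            table[hour].append(region)
    return table


if __name__ == "__main__":
    for hour, regions in coverage_by_hour().items():
        print(f"{hour:02d}:00 UTC -> {', '.join(regions) or 'UNCOVERED'}")
```

With these offsets, Shanghai covers 01:00 to 09:00 UTC, the UK covers 09:00 to 17:00, and Seattle covers 17:00 to 01:00, so every UTC hour has exactly one team on shift.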
Typical Small‑Company Pain Points
In many smaller firms, a manager may send messages at 3 a.m. demanding status updates, criticizing the engineer's attitude, or even threatening dismissal. This is what creates the perception that on-call is terrifying.
Operations Is a Team Effort
Ops is a profession and a team effort, not a solo burden. Large companies have extensive ops teams with rotating shifts, so any single person's workload remains manageable. Within each time zone, global teams usually work a standard nine-to-five schedule, with occasional weekend hand-overs.
Practical On‑Call Practices
Backup operators can cover each other.
Rotating duty schedules give each engineer at least one week per month without on‑call.
Service level agreements (SLAs) and disaster-recovery mechanisms ensure that the failure of a few nodes does not escalate into a major incident.
Team culture emphasizes planned work over constant firefighting.
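A rotation that combines a primary with a backup, and guarantees off weeks, can be sketched as a simple round-robin. This is an illustrative assumption: the article does not describe how its teams build schedules, and the engineer names, the weekly granularity, and the "next person is backup" rule are all hypothetical.

```python
from datetime import date, timedelta


def build_rotation(
    engineers: list[str], first_monday: date, weeks: int
) -> list[tuple[date, str, str]]:
    """Weekly round-robin: each week has a primary and a backup engineer.

    The backup for week i is the next engineer in the list, so with n
    engineers each person is primary once and backup once every n weeks.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers so someone can back up the primary")
    n = len(engineers)
    schedule = []
    for i in range(weeks):
        week_start = first_monday + timedelta(weeks=i)
        primary = engineers[i % n]
        backup = engineers[(i + 1) % n]
        schedule.append((week_start, primary, backup))
    return schedule


def weeks_off_duty(schedule: list[tuple[date, str, str]], engineer: str) -> int:
    """Count weeks in which the engineer is neither primary nor backup."""
    return sum(1 for _, primary, backup in schedule if engineer not in (primary, backup))


if __name__ == "__main__":
    team = ["alice", "bob", "carol", "dave"]  # hypothetical team members
    sched = build_rotation(team, date(2024, 1, 1), weeks=4)
    for week_start, primary, backup in sched:
        print(week_start, "primary:", primary, "backup:", backup)
    print("alice off-duty weeks:", weeks_off_duty(sched, "alice"))
```

With four engineers over four weeks, each person is primary once, backup once, and completely off duty for the remaining two weeks, which comfortably meets the "at least one week per month" target.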
On-call is typically triggered by one of two scenarios: an urgent business request or a catastrophic alert. In most months, the total time actually spent on on-call work rarely exceeds three hours.
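The two triggers translate naturally into a paging rule: everything else waits for business hours. The sketch below is a hypothetical illustration; the `Event` fields, the `"business"`/`"monitoring"` source labels, and the severity levels are assumptions, not a specific alerting product's schema.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3  # corresponds to a "catastrophic" alert in the article's terms


@dataclass
class Event:
    source: str             # hypothetical labels: "monitoring" or "business"
    severity: Severity
    urgent: bool = False    # set by whoever raises a business request


def should_page(event: Event) -> bool:
    """Wake the on-call engineer only for the two triggers described above."""
    if event.source == "business" and event.urgent:
        return True
    if event.source == "monitoring" and event.severity is Severity.CRITICAL:
        return True
    # Anything else is queued for the next working day instead of paging.
    return False
```

Keeping the paging rule this narrow is what keeps actual on-call time low: warnings and routine requests are still recorded, but they do not wake anyone up.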
Small‑Company Example
A small firm running about 100 Alibaba Cloud servers relies on high-availability setups, automation scripts, and comprehensive monitoring. As a result, incidents are rare and the team normally has its weekends free.
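The article does not say which monitoring tools this firm uses. One common technique that helps monitoring page rarely is to alert only after several consecutive failed health probes, so a single transient blip does not wake anyone. The class below is a hypothetical sketch of that idea; the threshold of 3 and the server names are assumptions.

```python
from collections import defaultdict


class HealthTracker:
    """Raise an alert only after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures: dict[str, int] = defaultdict(int)

    def record(self, server: str, ok: bool) -> bool:
        """Record one probe result; return True only when an alert should fire.

        The alert fires exactly once, on the probe that reaches the threshold,
        so a server that stays down does not page repeatedly.
        """
        if ok:
            self.failures[server] = 0  # any success resets the streak
            return False
        self.failures[server] += 1
        return self.failures[server] == self.threshold

    def healthy(self, servers: list[str]) -> list[str]:
        """Servers below the failure threshold, e.g. candidates for failover."""
        return [s for s in servers if self.failures[s] < self.threshold]
```

In a high-availability setup, the `healthy` list is what a load balancer or failover script would route traffic to, which is why a single failed node need not become an incident.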
Improving Operations
On the technical side, mature open-source solutions address up to 80% of common problems: high-availability architectures, caching, front-end/back-end separation, API governance through a middle platform, and so on. On the management side, improvements include standby readiness, clear escalation paths, and well-documented hand-over procedures.
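Why high-availability architectures cut down on night-time pages can be shown with a back-of-the-envelope calculation. Assuming replica failures are independent (real deployments only approximate this, since shared networks, racks, and deploys correlate failures), a service that stays up while any one of n replicas is up has availability 1 - (1 - a)^n. The function names and the 99% figure are illustrative assumptions.

```python
def parallel_availability(node_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes replica failures are independent of each other.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - node_availability) ** replicas


def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Expected downtime in minutes over a month of `days` days."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = parallel_availability(0.99, n)
        print(f"{n} replica(s): {a:.6f} -> {downtime_minutes_per_month(a):.2f} min/month")
```

Under these assumptions, one 99%-available node is down about 432 minutes a month, while two redundant nodes bring that to roughly 4.3 minutes.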
Typical On‑Call Schedule
Standard working hours are 9 a.m. to 6 p.m., with a fixed on-call rotation covering evenings and weekends. The on-call engineer must keep their phone on standby 24 hours a day and receives compensatory time off after handling an incident.
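The schedule above can be expressed as two small rules: what counts as the on-call window, and how much time off an incident earns. This is a hypothetical sketch; the one-for-one compensation ratio and classifying incidents by their start time are simplifying assumptions, since the article only says engineers are compensated with time off.

```python
from datetime import datetime, timedelta

WORK_START_HOUR = 9   # 9 a.m.
WORK_END_HOUR = 18    # 6 p.m.


def is_on_call_window(ts: datetime) -> bool:
    """True if `ts` falls outside standard hours (weekdays, 9 a.m. to 6 p.m.)."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return not (WORK_START_HOUR <= ts.hour < WORK_END_HOUR)


def comp_time(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Total compensatory time off for a list of (start, end) incidents.

    Hypothetical policy: one hour off per hour worked, counting only
    incidents that *start* inside the on-call window (a simplification).
    """
    total = timedelta()
    for start, end in incidents:
        if is_on_call_window(start):
            total += end - start
    return total


if __name__ == "__main__":
    incidents = [
        (datetime(2024, 1, 2, 23, 0), datetime(2024, 1, 3, 0, 30)),  # Tuesday night
        (datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 11, 0)),  # working hours
    ]
    print("time off owed:", comp_time(incidents))
```

Only the Tuesday-night incident falls in the on-call window, so the engineer in this example would be owed 90 minutes of time off.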
Efficient Ops
This public account is run by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career as we grow together.