
How to Classify and Prioritize Online Incidents for Better System Stability

Effective incident management begins with clear classification. This guide explains how technical leaders can categorize online failures along three axes: by nature (usability versus financial‑loss incidents), by severity (P0 to P3), and by fault type (external‑dependency, operational, product‑requirement, and system‑quality failures). Consistent classification helps teams improve system stability and learn from past incidents.

Xiaohe Frontend Team

Introduction

Online quality is the lifeline of product and business teams: it is what makes rapid experimentation and small‑step iteration possible. Modern systems are extremely complex and run across diverse software and hardware environments, so bugs are inevitable. As a developer, you spend every day either writing bugs or fixing them.

Online incidents are countless, but they follow recognizable patterns, and classifying them consistently is the first step toward handling them well.

Classify incidents by nature: usability vs. financial loss

Usability incident: a technical problem makes part or all of a system’s functionality unavailable, preventing normal business flow or service delivery (e.g., login failures, API bugs, missing list items, crashes). Such incidents stem from flaws in design or implementation, or from insufficient safeguards.

Financial‑loss incident: the system functions normally, but logic or calculation errors cause monetary loss for the business (e.g., wrong pricing, mis‑issued coupons, discount errors, security loopholes). Such incidents typically have a larger impact and are costlier to mitigate.
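As a minimal sketch, the two categories can be expressed as a small decision function. The `IncidentSignals` fields and the `classify_nature` name are hypothetical, and the rule that financial loss takes precedence when both signals are present is an assumption based on its larger impact, not a rule stated above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Nature(Enum):
    USABILITY = "usability"            # functionality partly or fully unavailable
    FINANCIAL_LOSS = "financial_loss"  # system works, but logic/calculation errors lose money


@dataclass
class IncidentSignals:
    # Hypothetical fields; a real incident tracker would define its own.
    functionality_unavailable: bool  # e.g. login fails, API errors, crash
    monetary_loss: bool              # e.g. wrong price, mis-issued coupon


def classify_nature(signals: IncidentSignals) -> Optional[Nature]:
    # Assumption: when both signals are present, financial loss wins because
    # such incidents have larger impact and higher mitigation cost.
    if signals.monetary_loss:
        return Nature.FINANCIAL_LOSS
    if signals.functionality_unavailable:
        return Nature.USABILITY
    return None  # neither signal present: not an incident under this scheme
```

For example, a coupon that is issued at the wrong value while every page still loads would be classified with `IncidentSignals(functionality_unavailable=False, monetary_loss=True)` and come back as `Nature.FINANCIAL_LOSS`.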

Classify by severity

P0 (critical): system completely unavailable or causes major financial loss.

P1 (severe): core functionality or major secondary functionality lost, with financial impact.

P2 (moderate): secondary functionality lost or minor financial loss.

P3 (minor): UI display or prompt issues that do not affect important user functions.
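One possible way to encode these levels, sketched under stated assumptions, is to rate functional impact and financial loss separately and then keep the more severe of the two. The `FunctionalImpact` and `FinancialLoss` bands are illustrative, including a hypothetical `SIGNIFICANT` band standing in for P1's "financial impact"; real thresholds (amounts lost, users affected) would have to be set by each team.

```python
from enum import Enum, IntEnum


class Severity(IntEnum):
    # Lower value = more severe, so min() picks the worst level.
    P0 = 0  # critical
    P1 = 1  # severe
    P2 = 2  # moderate
    P3 = 3  # minor


class FunctionalImpact(Enum):
    TOTAL_OUTAGE = "total_outage"        # system completely unavailable
    CORE = "core"                        # core functionality lost
    MAJOR_SECONDARY = "major_secondary"  # important secondary functionality lost
    SECONDARY = "secondary"              # other secondary functionality lost
    COSMETIC = "cosmetic"                # UI display or prompt issues only


class FinancialLoss(Enum):
    NONE = "none"
    MINOR = "minor"
    SIGNIFICANT = "significant"  # hypothetical middle band used for P1
    MAJOR = "major"


FUNCTIONAL_TO_SEVERITY = {
    FunctionalImpact.TOTAL_OUTAGE: Severity.P0,
    FunctionalImpact.CORE: Severity.P1,
    FunctionalImpact.MAJOR_SECONDARY: Severity.P1,
    FunctionalImpact.SECONDARY: Severity.P2,
    FunctionalImpact.COSMETIC: Severity.P3,
}

# FinancialLoss.NONE is deliberately absent: no loss adds no rating.
FINANCIAL_TO_SEVERITY = {
    FinancialLoss.MAJOR: Severity.P0,
    FinancialLoss.SIGNIFICANT: Severity.P1,
    FinancialLoss.MINOR: Severity.P2,
}


def classify_severity(functional: FunctionalImpact, financial: FinancialLoss) -> Severity:
    """Return the more severe of the functional and financial ratings."""
    levels = [FUNCTIONAL_TO_SEVERITY[functional]]
    if financial in FINANCIAL_TO_SEVERITY:
        levels.append(FINANCIAL_TO_SEVERITY[financial])
    return min(levels)
```

Taking the more severe rating means a purely cosmetic bug that also causes a major monetary loss is still escalated to P0, in line with the P0 definition above.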

Classify by fault type

Faults can be divided into external‑dependency, operational, product‑requirement, and system‑quality failures.

External‑dependency failure: an upstream or downstream service malfunctions and blocks business processes (e.g., failing third‑party SDKs or unavailable client‑side APIs).

Operational failure: configuration errors leading to system outages (e.g., missing or incorrect marketing configuration fields).

Product‑requirement failure: defects in the product design make features inconsistent with one another, and these inconsistencies surface as online issues.

System‑quality failure: crashes, ANRs (Application Not Responding errors), response timeouts, and similar defects in the system itself.
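A hedged sketch of how these four types might be applied in practice: tag each postmortem with a root cause, map the tag to a fault type, and count incidents per type to see where stability work should focus. The root-cause tags and the `classify_fault` and `fault_breakdown` names are hypothetical, not a standard taxonomy.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class FaultType(Enum):
    EXTERNAL_DEPENDENCY = "external_dependency"  # upstream/downstream service, third-party SDK
    OPERATIONAL = "operational"                  # configuration errors
    PRODUCT_REQUIREMENT = "product_requirement"  # inconsistent product design
    SYSTEM_QUALITY = "system_quality"            # crashes, ANRs, timeouts


# Hypothetical root-cause tags a postmortem might record.
ROOT_CAUSE_TO_FAULT = {
    "third_party_sdk": FaultType.EXTERNAL_DEPENDENCY,
    "dependency_api_unavailable": FaultType.EXTERNAL_DEPENDENCY,
    "missing_config_field": FaultType.OPERATIONAL,
    "wrong_config_value": FaultType.OPERATIONAL,
    "inconsistent_requirement": FaultType.PRODUCT_REQUIREMENT,
    "crash": FaultType.SYSTEM_QUALITY,
    "anr": FaultType.SYSTEM_QUALITY,
    "response_timeout": FaultType.SYSTEM_QUALITY,
}


def classify_fault(root_cause_tag: str) -> FaultType:
    """Map a root-cause tag to a fault type; reject unknown tags loudly."""
    try:
        return ROOT_CAUSE_TO_FAULT[root_cause_tag]
    except KeyError:
        raise ValueError(f"unknown root-cause tag: {root_cause_tag!r}") from None


def fault_breakdown(root_cause_tags: list[str]) -> Counter[FaultType]:
    """Count incidents per fault type to reveal recurring patterns."""
    return Counter(classify_fault(tag) for tag in root_cause_tags)
```

Rejecting unknown tags, rather than silently bucketing them, keeps the taxonomy honest: a new kind of failure forces an explicit decision about which type it belongs to.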
