Mastering System Degradation: Keep Your Services Highly Available
This guide explains why degradation is a vital protection mechanism, outlines five strategies across automation, functional, and system‑level dimensions, and details practical implementations such as automatic and manual switches, read/write service fallback, and multi‑level degradation to maintain core functionality under heavy load.
Degradation is a crucial system protection measure for maintaining high availability. Simply put, it means "throwing away the horse to save the general": under extreme load, the system temporarily skips non‑essential work so that core functions keep running.
In e‑commerce, core features like the shopping cart and checkout must never be degraded, while non‑essential services such as personalized product recommendations can be temporarily disabled.
Degradation strategies can be categorized along three dimensions into five approaches:
Automation dimension: automatic switch degradation and manual switch degradation.
Functional dimension: read‑service degradation and write‑service degradation.
System‑level dimension: multi‑level degradation.
1. Automatic Switch Degradation
The system automatically triggers degradation based on runtime conditions, such as:
Timeouts: configure an appropriate timeout and retry limit for each remote non‑core service; when the service responds too slowly and those limits are exceeded, stop calling it.
Failure counts: if calls to an external service (e.g., ticketing) fail more often than a configured tolerance, degrade automatically and use an asynchronous thread to probe the service and detect when it has recovered.
Faults: if a remote service crashes, fall back to default values, pre‑prepared content, or cached data.
Rate limiting: in flash‑sale scenarios, once the limit is reached, redirect users to a queue page or inform them of out‑of‑stock status.
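The failure-count and fault cases above can be sketched roughly as follows. This is a minimal illustration, not the book's implementation: the class name AutoDegradeSwitch, the consecutive-failure threshold, and the probe method are all assumptions made for the example. A timeout would normally surface as an exception from the remote call and be counted the same way.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical automatic degradation switch driven by consecutive failures. */
class AutoDegradeSwitch<T> {
    private final int failureThreshold;   // assumed tolerance; tune per service
    private final Supplier<T> fallback;   // default value, prepared content, or cached data
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicBoolean degraded = new AtomicBoolean(false);

    AutoDegradeSwitch(int failureThreshold, Supplier<T> fallback) {
        this.failureThreshold = failureThreshold;
        this.fallback = fallback;
    }

    /** Calls the remote service unless the switch is open; falls back on any failure. */
    T call(Supplier<T> remoteCall) {
        if (degraded.get()) {
            return fallback.get();        // degraded: do not touch the remote service
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures.set(0);   // a success resets the counter
            return result;
        } catch (RuntimeException e) {    // includes timeouts surfaced as exceptions
            if (consecutiveFailures.incrementAndGet() >= failureThreshold) {
                degraded.set(true);       // tolerance exceeded: degrade automatically
            }
            return fallback.get();
        }
    }

    /** Meant to be run periodically by a background thread to detect recovery. */
    void probe(Supplier<T> healthCheck) {
        if (!degraded.get()) {
            return;
        }
        try {
            healthCheck.get();
            consecutiveFailures.set(0);
            degraded.set(false);          // service answered again: restore normal calls
        } catch (RuntimeException e) {
            // still failing, so stay degraded
        }
    }

    boolean isDegraded() {
        return degraded.get();
    }
}
```

In a real deployment the probe would likely be scheduled on a ScheduledExecutorService, and the threshold would more commonly be a failure rate over a time window than a simple consecutive count.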
2. Manual Switch Degradation
Sometimes you want to degrade services before problems appear, such as disabling recommendation engines ahead of a promotion or rolling back a new feature during a gray (canary) release. This requires manually controllable switches stored in configuration files, databases, Redis, ZooKeeper, etc., with the application periodically synchronizing the latest switch values from that store.
Distributed systems often use a centralized configuration center with a web UI for easy management; open‑source options include ZooKeeper, Diamond, etcd 3, and Consul.
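A minimal sketch of a manually controlled switch with periodic synchronization might look like this. The class name ManualDegradeSwitches and the Supplier standing in for a read from the configuration store are assumptions for illustration; a real system would replace the Supplier with a client for whichever store it uses.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical holder for manual degradation switches synced from a config store. */
class ManualDegradeSwitches {
    // Stand-in for "read all switches from a file, database, Redis, ZooKeeper, ...".
    private final Supplier<Map<String, Boolean>> source;
    // Immutable snapshot, replaced atomically on every refresh.
    private volatile Map<String, Boolean> snapshot = Map.of();

    ManualDegradeSwitches(Supplier<Map<String, Boolean>> source) {
        this.source = source;
        refresh();
    }

    /** Pulls the latest switch values from the store. */
    void refresh() {
        snapshot = Map.copyOf(source.get());
    }

    /** Starts the periodic synchronization; the caller shuts the scheduler down. */
    ScheduledExecutorService startPeriodicSync(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // store unreachable: keep the last known switch values
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    /** A feature is degraded only if its switch is explicitly turned on. */
    boolean isDegraded(String feature) {
        return snapshot.getOrDefault(feature, false);
    }
}
```

Callers would guard non‑core work with a check such as isDegraded("recommendations") and skip it when the switch is on. Polling is the simplest form of synchronization; stores that support watches or notifications can push changes instead.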
3. Read‑Service Degradation
From a data‑reading perspective, non‑core information on pages (e.g., merchant info, recommendations, delivery details) can be degraded when exceptions occur. For example, before a promotion, the entire product detail page can be served as a static page to maximize read‑service degradation.
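As a rough sketch of the idea, a page assembler can treat core data as mandatory, wrap non‑core sections in a fallback, and serve a prebuilt static snapshot when a switch is on. The class ProductPageAssembler, its section names, and the empty-string default are hypothetical choices for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical product detail page assembler with read-service degradation. */
class ProductPageAssembler {

    /** Loads a non-core section; any exception degrades it to the default value. */
    static String loadOrDefault(Supplier<String> loader, String defaultValue) {
        try {
            return loader.get();
        } catch (RuntimeException e) {
            return defaultValue;
        }
    }

    static Map<String, String> assemble(boolean staticPageSwitch,
                                        String staticSnapshot,
                                        Supplier<String> productInfo,
                                        Supplier<String> merchantInfo,
                                        Supplier<String> recommendations) {
        Map<String, String> page = new LinkedHashMap<>();
        if (staticPageSwitch) {
            // Maximum degradation: serve the prebuilt static page, call no services.
            page.put("static", staticSnapshot);
            return page;
        }
        // Core data is not degraded: a failure here propagates to the caller.
        page.put("product", productInfo.get());
        // Non-core sections fall back to an empty block on any exception.
        page.put("merchant", loadOrDefault(merchantInfo, ""));
        page.put("recommendations", loadOrDefault(recommendations, ""));
        return page;
    }
}
```

The static snapshot is assumed to be generated ahead of the promotion, so the degraded path needs no backend calls at all.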
4. Write‑Service Degradation
Write services are critical; the typical degradation approach is to convert synchronous writes to asynchronous writes.
Inventory deduction example:
Option 1: deduct in the database, then update Redis cache.
Option 2: deduct from Redis first, then synchronously deduct from the database; if the database update fails, roll back the Redis change.
When database performance cannot keep up, switch to asynchronous mode: deduct from Redis, send a message to a queue for asynchronous database deduction, achieving eventual consistency.
Similarly, high‑volume user reviews can be written asynchronously, and reward processing can also be deferred.
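The inventory example can be sketched with in-memory stand-ins. Here two maps play the roles of Redis and the database and a queue plays the role of the message queue; the class InventoryService and its method names are assumptions for illustration. The synchronous path corresponds to Option 2, and turning on asyncWrites switches to the degraded asynchronous path.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical inventory service; maps and a queue stand in for Redis, DB, and MQ. */
class InventoryService {
    static final class Deduction {
        final String sku;
        final int qty;
        Deduction(String sku, int qty) { this.sku = sku; this.qty = qty; }
    }

    private final Map<String, Integer> cache = new ConcurrentHashMap<>();     // "Redis"
    private final Map<String, Integer> database = new ConcurrentHashMap<>();  // "database"
    private final Queue<Deduction> pending = new ConcurrentLinkedQueue<>();   // "message queue"
    volatile boolean asyncWrites = false;  // degradation switch: true = asynchronous mode

    void stock(String sku, int qty) {
        cache.put(sku, qty);
        database.put(sku, qty);
    }

    boolean deduct(String sku, int qty) {
        if (!tryDeduct(cache, sku, qty)) {
            return false;                        // not enough stock in the cache
        }
        if (asyncWrites) {
            pending.add(new Deduction(sku, qty)); // database catches up later
            return true;
        }
        if (!tryDeduct(database, sku, qty)) {
            cache.merge(sku, qty, Integer::sum);  // database failed: roll back the cache
            return false;
        }
        return true;
    }

    /** Consumer side: applies queued deductions to the database. */
    int drainQueue() {
        int applied = 0;
        Deduction d;
        while ((d = pending.poll()) != null) {
            if (tryDeduct(database, d.sku, d.qty)) {
                applied++;
            }
        }
        return applied;
    }

    int cachedStock(String sku) { return cache.getOrDefault(sku, 0); }
    int persistedStock(String sku) { return database.getOrDefault(sku, 0); }

    /** Atomically subtracts qty if enough stock exists; reports whether it did. */
    private static boolean tryDeduct(Map<String, Integer> store, String sku, int qty) {
        boolean[] ok = {false};
        store.computeIfPresent(sku, (k, v) -> {
            if (v >= qty) {
                ok[0] = true;
                return v - qty;
            }
            return v;
        });
        return ok[0];
    }
}
```

Between a deduction and the next drainQueue call, the cache and the database disagree; that window is the price of eventual consistency. A production consumer would also need retries and idempotent handling of queued messages, which this sketch leaves out.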
5. Multi‑Level Degradation
Based on the distance to the user, degradation can be applied at three layers:
Page‑JS degradation switch: controls feature toggles via JavaScript on the client side.
Access‑layer degradation switch: placed at the request entry point (e.g., Nginx) to perform automatic or manual degradation, supporting switches that take effect within seconds, fine‑grained per‑service toggles, and timeout‑based auto‑degradation.
Application‑layer degradation switch: configured within the application to enable automatic or manual degradation of specific functionalities.
Content compiled from "Core Technologies of Billion‑Scale Traffic Site Architecture".
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
