Tagged articles

Service Avalanche

9 articles · Page 1 of 1

Mar 10, 2026 · Operations

How to Master Service Avalanche Recovery: A Complete SRE Playbook from Alert to Restoration

This guide walks SREs and senior operations engineers through a real‑world service‑avalanche incident, covering alert hierarchy design, fault‑location commands, emergency SOPs, capacity‑baseline building, and post‑mortem practices that reduce MTTR in distributed microservice environments.

Prometheus · SRE · Service Avalanche

19 min read

MaGe Linux Operations

Oct 16, 2025 · Operations

SRE Playbook: From Alert to Full Recovery of Service Avalanches

This comprehensive SRE guide walks through a real-world service avalanche incident, detailing alert triggering, root‑cause analysis, step‑by‑step recovery, capacity baseline creation, layered alert design, automated scripts, and post‑mortem best practices to help engineers prevent and resolve large‑scale outages.

Alerting · SRE · Service Avalanche

20 min read

Efficient Ops

Oct 23, 2023 · Operations

Why Redis Failed: Jedis Misconfigurations That Spark Service Avalanches

This article examines a Redis 3.x cluster failure caused by a master‑slave switch, detailing how improper Jedis timeout and retry settings triggered a service avalanche, and provides step‑by‑step analysis of the incident, code paths, and recommended configuration adjustments to prevent recurrence.

Jedis · Redis · Service Avalanche

12 min read

Sohu Tech Products

Aug 23, 2023 · Backend Development

Analysis of Service Avalanche Caused by Jedis Parameter Misconfiguration During Redis Cluster Failover

During a Redis 3.x cluster master‑slave failover, the default Jedis connection timeout of two seconds combined with six automatic retries caused the Redis calls in each request to accumulate up to sixty seconds of latency, triggering Nginx timeouts and a service avalanche. The incident was resolved by lowering the timeout and retry settings.

Backend Development · Cluster Failover · Connection Retry

13 min read

vivo Internet Technology

Jul 19, 2023 · Databases

Analysis of Service Avalanche Caused by Misconfigured Jedis Parameters During Redis Cluster Master‑Slave Switch

A service‑wide avalanche occurred when a Redis 3.x master‑slave failover coincided with Jedis’ default 2‑second connection timeout and six retry attempts, causing latencies of up to 60 seconds. Setting connectionTimeout and soTimeout to 100 ms and reducing maxAttempts to two limited latency to about one second and prevented cascading failures.
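
The latency figures in this summary follow from simple multiplication: worst‑case latency per request is roughly timeout × attempts × Redis calls per request. The minimal sketch below checks that arithmetic. The figure of five Redis calls per request is an assumption, chosen only because it reproduces both the 60‑second and the one‑second totals; the function and constant names are hypothetical and are not part of Jedis or of the article.

```python
def worst_case_latency_ms(timeout_ms: int, max_attempts: int, calls_per_request: int) -> int:
    """Upper bound on request latency when every Redis call times out on every attempt."""
    return timeout_ms * max_attempts * calls_per_request


# Assumption: five sequential Redis calls per request (not stated in the summary).
CALLS_PER_REQUEST = 5

# Before the fix: 2-second timeout, six attempts per call.
before_ms = worst_case_latency_ms(2000, 6, CALLS_PER_REQUEST)

# After the fix: 100 ms timeout, two attempts per call.
after_ms = worst_case_latency_ms(100, 2, CALLS_PER_REQUEST)

print(f"before: {before_ms / 1000:.0f} s, after: {after_ms / 1000:.0f} s")
```

The bound ignores backoff between retries and any time spent outside Redis, so real latencies could differ; it only shows why shorter timeouts and fewer attempts shrink the worst case multiplicatively.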

Connection Retry · Jedis · Performance

13 min read

Wukong Talks Architecture

Dec 15, 2021 · Operations

Understanding Service Avalanche and Circuit Breaker Mechanisms through the Red Cliffs Battle Analogy

This article uses the historic Battle of Red Cliffs as an analogy to explain service avalanche, its causes in micro‑service architectures, and how circuit‑breaker, rate‑limiting, and isolation techniques can prevent cascading failures in modern distributed systems.

Microservices · Operations · Service Avalanche

14 min read

Tencent Database Technology

Oct 13, 2021 · Databases

Tencent Cloud MongoDB Enhances maxTimeMS Handling to Avoid Service Avalanche

This article explains how Tencent Cloud MongoDB improves the maxTimeMS server-side timeout feature to prevent request backlog and service avalanche, covering native MongoDB limitations, optimizations in mongos write command support, and default configuration implementation.

Backend Development · MongoDB · Service Avalanche

10 min read

macrozheng

Nov 12, 2020 · Operations

Red Cliffs Battle: Lessons on Service Avalanche and Circuit Breakers

Using the historic Red Cliffs battle as a metaphor, this article explains how linked services can cause a cascading failure—service avalanche—in microservice architectures, and details prevention techniques such as rate limiting, isolation, and especially circuit breaker mechanisms with their principles and recovery algorithms.

Service Avalanche · circuit breaker · system reliability

13 min read

Wukong Talks Architecture

Oct 28, 2020 · Operations

From the Battle of Red Cliffs to Service Avalanche: Understanding Circuit Breaker and Resilience in Microservices

This article uses the historic Battle of Red Cliffs as an analogy to explain service avalanche in micro‑service architectures, analyzes its causes, presents real‑world scenarios, and details circuit‑breaker concepts, algorithms, recovery strategies, and practical mitigation techniques.

Resilience · Service Avalanche · circuit breaker

10 min read
