How Envoy’s Circuit Breakers and Outlier Detection Stop Service Avalanches
This article explains how Envoy’s circuit-breaker and outlier-detection features protect micro-service architectures from avalanche failures by limiting concurrent connections and ejecting unhealthy instances. It also provides configuration examples, a testing method, and best-practice tips for building resilient cloud-native systems.
Why Avalanche Effects Threaten Microservices
In a micro-service architecture, the biggest risk is often not the failure of a single service but the avalanche that follows: one slow interface or database timeout backs up its callers until the whole system goes down.
Envoy’s Two Defense Mechanisms
As the data plane of Istio and other service meshes, Envoy offers two key features to prevent such collapses:
Circuit Breaking
Outlier Detection
Circuit Breaking
Purpose: limit concurrent access to a service and protect upstream connection pools and thread pools from being exhausted.
circuit_breakers:
  thresholds:
  - priority: "DEFAULT"
    max_connections: 10        # max connections Envoy opens to the upstream cluster (default 1024)
    max_pending_requests: 10   # max requests queued while waiting for a connection (default 1024)
    max_requests: 10           # max concurrent requests to the upstream cluster (default 1024)
    max_retries: 3             # max concurrent retries to the upstream cluster (default 3)
Testing with go-stress-testing shows that once concurrency reaches the configured limits, Envoy increments upstream_cx_overflow, which confirms that the circuit-breaker thresholds are taking effect.
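To check for overflow during a load test, the counters can be read from Envoy’s admin /stats endpoint (port 9901 in this article’s configuration). The following is a minimal sketch rather than an official client: the helper names parse_overflow_counters and fetch_stats are hypothetical, and it assumes the plain-text "name: value" stats format and the cluster name simple_cluster.

```python
import urllib.request

# Overflow counters that Envoy exposes per cluster when a threshold is hit.
OVERFLOW_COUNTERS = (
    "upstream_cx_overflow",          # max_connections exceeded
    "upstream_rq_pending_overflow",  # max_pending_requests exceeded
    "upstream_rq_retry_overflow",    # max_retries exceeded
)


def parse_overflow_counters(stats_text, cluster="simple_cluster"):
    """Extract the overflow counters for one cluster from /stats text output."""
    prefix = f"cluster.{cluster}."
    counters = {}
    for line in stats_text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep or not name.startswith(prefix):
            continue  # not a stat line for this cluster
        short_name = name[len(prefix):]
        if short_name in OVERFLOW_COUNTERS:
            counters[short_name] = int(value.strip())
    return counters


def fetch_stats(admin_url="http://localhost:9901"):
    """Fetch the raw stats text; the admin address is an assumption."""
    with urllib.request.urlopen(f"{admin_url}/stats", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling parse_overflow_counters(fetch_stats()) before and after a stress run and comparing the two results shows which threshold was hit first.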
Outlier Detection
Purpose: dynamically identify and isolate backend instances that behave abnormally (such as repeated 5xx errors, or responses slow enough to time out) to improve overall service quality.
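The consecutive-5xx style of detection can be pictured with a small model. This is a simplified sketch, not Envoy’s implementation: the class name Consecutive5xxTracker is hypothetical, the threshold of 5 mirrors the consecutive_5xx value used in this article’s configuration, and the model ignores enforcement percentages, max_ejection_percent, and un-ejection.

```python
class Consecutive5xxTracker:
    """Toy model: eject a host after `threshold` consecutive 5xx responses."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.streaks = {}     # host -> current run of consecutive 5xx responses
        self.ejected = set()  # hosts currently considered ejected

    def record(self, host, status_code):
        """Record one response; return True if it triggers an ejection."""
        if status_code >= 500:
            self.streaks[host] = self.streaks.get(host, 0) + 1
        else:
            # Any non-5xx response breaks the streak (assumption of this model).
            self.streaks[host] = 0
        if self.streaks[host] >= self.threshold and host not in self.ejected:
            self.ejected.add(host)
            self.streaks[host] = 0
            return True
        return False


tracker = Consecutive5xxTracker(threshold=5)
for _ in range(4):
    tracker.record("172.139.20.170:8090", 503)  # four failures: not yet ejected
tracker.record("172.139.20.170:8090", 200)      # a success resets the streak
```

The point of counting consecutive failures instead of total failures is that a single good response clears the streak, so an instance with occasional errors is not ejected.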
When a host is marked as an outlier, Envoy ejects it for a period equal to outlier_detection.base_ejection_time multiplied by the number of times the host has been ejected; while ejected, the host is excluded from load balancing unless the cluster enters panic mode.
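To make the ejection arithmetic concrete, the sketch below computes how long a host would stay ejected under a simplified reading of the rule above. It assumes an upper bound corresponding to Envoy’s max_ejection_time setting with a 300 s default, and it ignores jitter and the way the multiplier can decrease again after healthy intervals. The function name ejection_seconds is hypothetical.

```python
def ejection_seconds(base_ejection_time, times_ejected, max_ejection_time=300.0):
    """Return the ejection duration in seconds for the Nth ejection of a host.

    Simplified model: base time multiplied by the ejection count, capped at
    max_ejection_time (assumed default: 300 s).
    """
    if times_ejected < 1:
        raise ValueError("times_ejected must be at least 1")
    return min(base_ejection_time * times_ejected, max_ejection_time)


# With a 30 s base: the first ejection lasts 30 s, the second 60 s, and so on
# until the cap is reached.
durations = [ejection_seconds(30.0, n) for n in (1, 2, 3)]
```

Under these assumptions a host that keeps failing is kept out of rotation for longer each time, while the cap prevents an instance from being excluded indefinitely.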
# Simulate fault
curl -X POST -d '{"health": false}' 172.139.20.170:8090/livez
# Observe ejection counters via the admin endpoint, e.g. curl -s 127.0.0.1:9901/stats | grep outlier_detection
cluster.simple_cluster.outlier_detection.ejections_consecutive_5xx: 2
cluster.simple_cluster.outlier_detection.ejections_enforced_consecutive_5xx: 2
Full Envoy Configuration Example
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: simple_cluster
  clusters:
  - name: simple_cluster
    lb_policy: ROUND_ROBIN
    type: STATIC
    load_assignment:
      cluster_name: simple_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 172.139.20.170, port_value: 8090 }
        - endpoint:
            address:
              socket_address: { address: 172.139.20.3, port_value: 8090 }
        - endpoint:
            address:
              socket_address: { address: 172.139.20.92, port_value: 8090 }
    circuit_breakers:
      thresholds:
      - priority: "DEFAULT"
        max_connections: 10
        max_pending_requests: 10
        max_requests: 10
        max_retries: 3
    outlier_detection:
      interval: 10s
      base_ejection_time: 30s
      consecutive_5xx: 5
      max_ejection_percent: 50
Conclusion
In high‑concurrency, highly‑available cloud‑native systems, defensive design is essential. Envoy’s circuit breakers act like a fuse to stop upstream resource exhaustion, while outlier detection works like a doctor to isolate unhealthy backend instances, greatly improving resilience and reducing recovery time.
Linux Ops Smart Journey
