
How Envoy’s Circuit Breakers and Outlier Detection Stop Service Avalanches

This article explains how Envoy’s circuit‑breaker and outlier‑detection features protect micro‑service architectures from avalanche failures by limiting concurrent connections and requests and by ejecting unhealthy instances. It also provides configuration examples, a testing method, and best‑practice tips for building resilient cloud‑native systems.

Linux Ops Smart Journey

Why Avalanche Effects Threaten Microservices

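In a micro‑service architecture the biggest risk is often not the failure of a single service but an avalanche: one slow interface or database timeout ties up the connections and threads of its callers, the failure spreads along the call chain, and eventually the whole system goes down.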

Envoy’s Two Defense Mechanisms

As the data plane of Istio and other service meshes, Envoy offers two key features to prevent such collapses:

Circuit Breaking

Outlier Detection

Circuit Breaking

Purpose: limit concurrent access to a service and protect upstream connection pools and thread pools from being exhausted.

circuit_breakers:
  thresholds:
  - priority: "DEFAULT"
    max_connections: 10   # max connections Envoy opens to the whole upstream cluster (default 1024)
    max_pending_requests: 10   # max requests queued while waiting for a connection (default 1024)
    max_requests: 10   # max concurrent requests to the cluster (default 1024)
    max_retries: 3   # max concurrent retries to the cluster (default 3)

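Testing with go‑stress‑testing shows that once concurrency reaches the configured limits, Envoy increments the upstream_cx_overflow counter, which confirms that the circuit‑breaker thresholds take effect. A full pending queue is counted separately in upstream_rq_pending_overflow.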
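
To check the overflow counters after a load test, the text output of Envoy's admin /stats endpoint can be filtered for the circuit‑breaker stats of one cluster. The following is a minimal sketch, not part of the original test: the helper names parse_envoy_stats and overflow_counters are hypothetical, and the sample values are made up for illustration.

```python
def parse_envoy_stats(text):
    """Parse 'name: value' lines from Envoy's admin /stats text output."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        value = value.strip()
        if value.isdigit():  # skip histograms and other non-integer entries
            stats[name.strip()] = int(value)
    return stats


def overflow_counters(stats, cluster):
    """Return the circuit-breaker overflow counters for one cluster."""
    prefix = f"cluster.{cluster}."
    names = (
        "upstream_cx_overflow",          # max_connections was hit
        "upstream_rq_pending_overflow",  # max_pending_requests was hit
        "upstream_rq_retry_overflow",    # max_retries was hit
    )
    return {name: stats.get(prefix + name, 0) for name in names}


# Made-up sample in the /stats text format (values are illustrative only).
sample = """\
cluster.simple_cluster.upstream_cx_overflow: 42
cluster.simple_cluster.upstream_rq_pending_overflow: 7
cluster.simple_cluster.upstream_rq_retry_overflow: 0
cluster.simple_cluster.upstream_rq_total: 1000
"""

print(overflow_counters(parse_envoy_stats(sample), "simple_cluster"))
```

Against a live proxy, the same functions could be fed the body of an HTTP GET to /stats on the admin listener (port 9901 in the full configuration of this article).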

Outlier Detection

Purpose: dynamically identify and isolate backend instances that behave abnormally (slow responses, 5xx errors) to improve overall service quality.

When a host is marked as an outlier, Envoy ejects it for a period equal to outlier_detection.base_ejection_time multiplied by the number of times the host has been ejected, capped by max_ejection_time. While ejected, the host is excluded from load balancing unless the load balancer enters panic mode.
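
The growth of the ejection period can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions: the function name ejection_seconds is hypothetical, the values 30 s and 300 s are Envoy's documented defaults for base_ejection_time and max_ejection_time, and the optional ejection jitter is ignored.

```python
def ejection_seconds(times_ejected, base_ejection_time=30.0, max_ejection_time=300.0):
    """Ejection period in seconds: base time x ejection count, capped at the maximum."""
    if times_ejected < 1:
        raise ValueError("times_ejected must be at least 1")
    return min(base_ejection_time * times_ejected, max_ejection_time)


# Ejection period for the 1st, 2nd, 3rd and 12th ejection of the same host.
for n in (1, 2, 3, 12):
    print(n, ejection_seconds(n))
```

With these values a host that keeps failing stays out of rotation longer after each ejection, but never longer than the cap.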

# Simulate a fault on one backend instance
curl -X POST -d '{"health": false}' 172.139.20.170:8090/livez

# Observe the ejection counters in Envoy's stats
cluster.simple_cluster.outlier_detection.ejections_consecutive_5xx: 2
cluster.simple_cluster.outlier_detection.ejections_enforced_consecutive_5xx: 2
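
The consecutive_5xx counters above rise when a host returns the configured number of 5xx responses in a row. The toy model below illustrates that rule only; it is not Envoy's implementation, the class name Consecutive5xxTracker is hypothetical, and resetting the streak right after an ejection is a simplifying assumption.

```python
class Consecutive5xxTracker:
    """Toy model of consecutive-5xx detection for a single upstream host."""

    def __init__(self, threshold=5):
        self.threshold = threshold  # corresponds to consecutive_5xx
        self.streak = 0             # current run of 5xx responses

    def record(self, status_code):
        """Record one response; return True if it triggers an ejection."""
        if 500 <= status_code <= 599:
            self.streak += 1
            if self.streak >= self.threshold:
                self.streak = 0  # assumption: start counting afresh after ejection
                return True
        else:
            self.streak = 0  # any non-5xx response breaks the streak
        return False


tracker = Consecutive5xxTracker(threshold=5)
responses = [500, 503, 200, 500, 500, 500, 500, 500]
print([tracker.record(code) for code in responses])
```

The 200 in the third position resets the streak, so only the last response in the list completes a run of five and triggers an ejection.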

Full Envoy Configuration Example

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: simple_cluster

  clusters:
  - name: simple_cluster
    lb_policy: ROUND_ROBIN
    type: STATIC
    load_assignment:
      cluster_name: simple_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 172.139.20.170, port_value: 8090 }
        - endpoint:
            address:
              socket_address: { address: 172.139.20.3, port_value: 8090 }
        - endpoint:
            address:
              socket_address: { address: 172.139.20.92, port_value: 8090 }
    circuit_breakers:
      thresholds:
      - priority: "DEFAULT"
        max_connections: 10
        max_pending_requests: 10
        max_requests: 10
        max_retries: 3
    outlier_detection:
      interval: 10s
      base_ejection_time: 30s
      consecutive_5xx: 5
      max_ejection_percent: 50

Conclusion

In high‑concurrency, highly‑available cloud‑native systems, defensive design is essential. Envoy’s circuit breakers act like a fuse to stop upstream resource exhaustion, while outlier detection works like a doctor to isolate unhealthy backend instances, greatly improving resilience and reducing recovery time.

Envoy circuit breaking and outlier detection diagram