
Mastering Envoy: Service Discovery, Load Balancing, and Health Checks Explained

Envoy, the high‑performance edge and service‑mesh proxy, is built around three core mechanisms: service discovery, load balancing, and health checking. This article walks through the configurable options of each, with configuration examples, so that operators and developers can tune distributed systems for scalability, reliability, and performance in cloud‑native environments.

Linux Ops Smart Journey

In the era of cloud‑native architecture, Envoy has become a high‑performance, extensible edge and service‑to‑service proxy that serves as a core component of service meshes such as Istio.

Three Core Mechanisms

Envoy's upstream handling rests on three mechanisms: Service Discovery, Load Balancing, and Health Checking/Outlier Detection. Understanding how each one works helps operators and developers reason about Envoy's behavior and tune production systems.

Service Discovery Mechanism

When an upstream cluster is defined, Envoy must resolve the members of the cluster. This is the service discovery process.

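Static : The simplest type; the configuration explicitly lists the resolved network name (IP address and port, Unix domain socket, etc.) of each upstream host.

Strict DNS/Logical DNS : Envoy resolves the configured DNS name (A/AAAA records) asynchronously and re‑resolves it periodically. With Strict DNS, every returned IP address becomes an upstream host; with Logical DNS, Envoy uses only the first IP address returned at the moment a new connection is opened.

Original destination : Used when inbound connections are redirected to Envoy via iptables REDIRECT or TPROXY, or arrive with the proxy protocol. Envoy forwards each connection to the address it was originally destined for, so no endpoint list is configured.

Endpoint Discovery Service (EDS) : Envoy fetches the cluster members from an xDS management server over a gRPC or REST‑JSON API.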

# Static
clusters:
- name: static_cluster
  type: STATIC
  load_assignment:
    cluster_name: static_cluster
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 10.0.0.1
              port_value: 8080

# Logical DNS
clusters:
- name: logical_dns_cluster
  type: LOGICAL_DNS
  load_assignment:
    cluster_name: logical_dns_cluster
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: my-service.default.svc.cluster.local
              port_value: 80
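
The Strict DNS and original destination types described above can be sketched in the same style. This is a minimal sketch, not taken from the original configuration: the cluster names, the DNS name, and the refresh settings are illustrative assumptions.

```yaml
# Strict DNS: every resolved A/AAAA record becomes an upstream host
clusters:
- name: strict_dns_cluster            # hypothetical name
  type: STRICT_DNS
  dns_lookup_family: V4_ONLY          # assumption: IPv4-only environment
  dns_refresh_rate: 5s                # illustrative; 5s is also the default
  load_assignment:
    cluster_name: strict_dns_cluster
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: my-service.default.svc.cluster.local
              port_value: 80

# Original destination: no endpoints are listed; the upstream address
# comes from the redirected connection, so the cluster supplies its own
# load balancing
clusters:
- name: original_dst_cluster          # hypothetical name
  type: ORIGINAL_DST
  lb_policy: CLUSTER_PROVIDED

# On the listener, this filter recovers the pre-redirect address
listener_filters:
- name: envoy.filters.listener.original_dst
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst
```

In Kubernetes, Strict DNS is usually pointed at a headless Service so that the DNS answer contains the individual pod IPs rather than a single virtual IP.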

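# EDS (Endpoint Discovery Service)
clusters:
- name: eds_cluster
  type: EDS
  eds_cluster_config:
    eds_config:
      # Current Envoy releases require the v3 xDS API to be selected explicitly
      resource_api_version: V3
      api_config_source:
        api_type: GRPC
        transport_api_version: V3
        grpc_services:
        - envoy_grpc:
            # xds_cluster must be defined separately as an HTTP/2 cluster
            cluster_name: xds_cluster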
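
The EDS snippet refers to a cluster named xds_cluster, which has to exist in the static configuration before Envoy can reach the management server. A possible definition is sketched below; the server address, port, and node identifiers are assumptions made for illustration, not values from the original setup.

```yaml
# Identity that Envoy presents to the management server
node:
  id: envoy-demo-node                 # hypothetical
  cluster: envoy-demo                 # hypothetical

static_resources:
  clusters:
  - name: xds_cluster
    type: STRICT_DNS
    # xDS over gRPC requires HTTP/2 towards the management server
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: xds-server.example.internal   # assumption
                port_value: 18000                      # assumption
```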

Load Balancing Mechanism

Once the members of a cluster are known, the load‑balancing policy decides which available host receives each request. Envoy supports the following policies:

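ROUND_ROBIN : Selects each available upstream host in turn; the default policy.

LEAST_REQUEST : Prefers hosts with fewer active requests. When all hosts have the same weight, Envoy samples choice_count random hosts (2 by default) and picks the one with the fewest active requests.

RING_HASH : Consistent hashing (ketama) on a request attribute such as a header, cookie, or source IP, so that the same key keeps mapping to the same host.

MAGLEV : High‑performance consistent hashing based on a fixed‑size lookup table (table_size must be a prime number); faster to build and query than RING_HASH, but somewhat less stable when the set of hosts changes.

RANDOM : Picks a random available host; simple and efficient.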

# ROUND_ROBIN
lb_policy: ROUND_ROBIN

# LEAST_REQUEST
lb_policy: LEAST_REQUEST
least_request_lb_config:
  choice_count: 2

# RING_HASH
lb_policy: RING_HASH
ring_hash_lb_config:
  minimum_ring_size: 1024
  maximum_ring_size: 4096

# MAGLEV
lb_policy: MAGLEV
maglev_lb_config:
  table_size: 65537

# RANDOM
lb_policy: RANDOM
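
RING_HASH and MAGLEV only define how a hash is mapped to a host; for HTTP traffic, the hash itself comes from a hash_policy on the route. The fragment below is a sketch under assumptions: the cluster name and the x-user-id header are hypothetical.

```yaml
routes:
- match:
    prefix: "/"
  route:
    cluster: hashed_cluster           # hypothetical cluster using RING_HASH or MAGLEV
    hash_policy:
    # Hash on a request header; terminal stops evaluation once a hash exists
    - header:
        header_name: x-user-id        # hypothetical header
      terminal: true
    # Reached only when the header is missing: hash on the client IP instead
    - connection_properties:
        source_ip: true
```

Without terminal: true, Envoy evaluates every policy in the list and combines the resulting hashes, which is useful when the key should depend on several attributes at once.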

Active Health Checks

Envoy can perform health checks over different protocols:

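HTTP – sends an HTTP request to the upstream host; by default, a 200 response marks the host as healthy.

gRPC – sends a gRPC request to the upstream host, using the standard grpc.health.v1.Health service.

L3/L4 (TCP) – sends a configurable byte buffer and expects a matching response; with an empty payload, a successful connection is enough.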

Redis – sends a Redis PING command and expects a PONG response.

health_checks:
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3
  healthy_threshold: 2
  http_health_check:
    path: "/healthz"
    expected_statuses:
    # Half-open range [start, end): end must be 400 to accept 200-399
    - start: 200
      end: 400
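
The other protocols listed above follow the same shape; only the checker block changes. The fragments below are alternatives (pick one per cluster), written as a sketch with illustrative timings and a hypothetical gRPC service name.

```yaml
# gRPC health check
health_checks:
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3
  healthy_threshold: 2
  grpc_health_check:
    service_name: my-grpc-service     # hypothetical; may be omitted

# L3/L4 (TCP) health check; an empty block means connect-only
health_checks:
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3
  healthy_threshold: 2
  tcp_health_check: {}

# Redis health check (PING/PONG), configured as a custom health checker
health_checks:
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3
  healthy_threshold: 2
  custom_health_check:
    name: envoy.health_checkers.redis
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.health_checkers.redis.v3.Redis
```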

Full Example Configuration

# Configuration file
$ cat /tmp/envoy-demo.yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: simple_cluster

  clusters:
  - name: simple_cluster
    # Load balancing policy
    lb_policy: ROUND_ROBIN
    # Static service discovery
    type: STATIC
    load_assignment:
      cluster_name: simple_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 172.139.20.170, port_value: 8090 }
        - endpoint:
            address:
              socket_address: { address: 172.139.20.3, port_value: 8090 }
        - endpoint:
            address:
              socket_address: { address: 172.139.20.92, port_value: 8090 }
    # Health checks
    health_checks:
    - timeout: 5s
      interval: 15s
      no_traffic_interval: 60s
      unhealthy_threshold: 3
      healthy_threshold: 1
      http_health_check:
        path: /ping
        expected_statuses:
        # Half-open range [start, end): end must be 400 to accept 200-399
        - start: 200
          end: 400
      # Health-check event logging; these fields belong to the health check entry,
      # not to the cluster
      event_logger:
      - name: envoy.health_checkers.event_logger.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.health_check.event_sinks.file.v3.HealthCheckEventFileSink
          event_log_path: "/tmp/envoy_health_events.log"
      # Log every failed check, not only healthy/unhealthy transitions
      always_log_health_check_failures: true
    # Disable panic mode: with a threshold of 0%, Envoy never falls back to
    # routing to hosts that are marked unhealthy
    common_lb_config:
      healthy_panic_threshold:
        value: 0

# Run Envoy
$ docker run -d --name envoy \
  -v /tmp/envoy-demo.yaml:/etc/envoy/envoy.yaml:ro \
  -p 10000:10000 \
  envoyproxy/envoy:v1.28.7
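
To confirm that health checking behaves as configured, one option is Envoy's admin interface. The fragment below is a sketch and not part of the original demo file: it assumes port 9901 is free, would be added at the top level of /tmp/envoy-demo.yaml, and needs -p 9901:9901 added to the docker run command.

```yaml
# Admin interface (demo only: do not expose this port on untrusted networks)
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901                # assumption: any free port works
```

With that in place, curl http://localhost:9901/clusters lists each upstream host together with its health_flags; a host that is failing active health checks is reported with failed_active_hc.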

Conclusion

Service discovery lets Envoy track backend changes dynamically and, through EDS, integrate with external service registries.

Load balancing offers a range of policies, from round robin to consistent hashing, to suit different business scenarios.

Active health checks take failing hosts out of rotation, which improves overall system robustness.

For operators, a deep understanding of these mechanisms enables better configuration, tuning, fault isolation, capacity planning, and high‑availability design.
