Service Discovery in Envoy: Types, Consistency Models, and Health‑Check Routing

This article explains Envoy’s service discovery mechanisms (static, strict DNS, logical DNS, original destination, and the Service Discovery Service), how each one works, the consistency model they assume, and how health checking influences routing decisions in production environments.

Architects Research Society

When an upstream cluster is defined in the configuration, Envoy needs to know how to resolve the members of that cluster. This process is called service discovery.

Supported Service Discovery Types

Static

Static is the simplest type; the configuration explicitly lists each upstream host’s resolved network name (IP/port, Unix socket, etc.).

Strict DNS

With strict DNS, Envoy continuously and asynchronously resolves the specified DNS target. Each IP address returned is treated as an explicit host in the upstream cluster, and hosts are added or removed based on DNS results. Envoy never performs synchronous DNS resolution in the forwarding path, accepting eventual consistency.
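
The add/remove behavior described above can be sketched as a set reconciliation that runs after each asynchronous DNS refresh. This is a toy model for illustration, not Envoy's implementation; the function name `reconcile_hosts` and the sample addresses are hypothetical.

```python
from typing import List, Set, Tuple


def reconcile_hosts(current: Set[str], resolved: List[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (members, added, removed) after one DNS refresh.

    Every address in the latest answer is treated as a distinct host;
    addresses that disappeared from the answer are dropped.
    """
    latest = set(resolved)
    added = latest - current
    removed = current - latest
    return latest, added, removed


# Hypothetical refresh: 10.0.0.1 left the DNS answer, 10.0.0.3 joined it.
members = {"10.0.0.1", "10.0.0.2"}
members, added, removed = reconcile_hosts(members, ["10.0.0.2", "10.0.0.3"])
print(sorted(members), sorted(added), sorted(removed))
# prints: ['10.0.0.2', '10.0.0.3'] ['10.0.0.3'] ['10.0.0.1']
```

Because the refresh happens off the forwarding path, requests always see the most recently reconciled set, which may lag behind DNS for a short time.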

Logical DNS

Logical DNS uses the same asynchronous mechanism as strict DNS but assumes the first IP address returned represents the whole upstream cluster, keeping a single connection pool that can serve many physical hosts. This avoids connection churn when interacting with large web services that return many IPs per query.
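
To contrast with strict DNS, the following minimal sketch models a logical DNS host that remembers only the first address of each answer and uses it for new connections. The class and method names are hypothetical, and keeping the previous address when an answer is empty is an assumption of this sketch rather than documented Envoy behavior.

```python
from typing import List, Optional


class LogicalDnsHost:
    """One logical host standing in for the whole upstream cluster."""

    def __init__(self, hostname: str) -> None:
        self.hostname = hostname
        self.current_address: Optional[str] = None

    def on_dns_result(self, addresses: List[str]) -> None:
        # Only the first address is kept; the rest of the answer is ignored.
        # Assumption: an empty answer leaves the previous address in place.
        if addresses:
            self.current_address = addresses[0]

    def address_for_new_connection(self) -> Optional[str]:
        # Existing connections are untouched; only new ones use this address.
        return self.current_address


host = LogicalDnsHost("api.example.com")  # hypothetical hostname
host.on_dns_result(["203.0.113.10", "203.0.113.11", "203.0.113.12"])
print(host.address_for_new_connection())  # prints: 203.0.113.10
```

Since the host set never changes size, a DNS answer that rotates through many IPs does not cause hosts, and their connection pools, to be created and torn down on every refresh.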

Original Destination

When inbound connections are redirected to Envoy via iptables REDIRECT or the Proxy protocol, the original destination cluster can be used. Requests are forwarded to the upstream host based on redirect metadata without explicit host configuration. Unused hosts are cleaned up after a configurable idle interval.
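
The idle cleanup can be illustrated with a small sketch in which hosts are created on demand from the redirected connection's original destination and dropped once they have gone unused for too long. The class name, the `idle_timeout_s` parameter, and the explicit `now` timestamps are assumptions made for the example.

```python
from typing import Dict, List


class OriginalDstHosts:
    """Tracks upstream hosts that were learned from redirect metadata."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.last_used: Dict[str, float] = {}

    def route(self, original_dst: str, now: float) -> str:
        # The host is added (or refreshed) the moment a connection targets it.
        self.last_used[original_dst] = now
        return original_dst

    def cleanup(self, now: float) -> List[str]:
        # Remove hosts that have been idle for at least the timeout.
        stale = [h for h, t in self.last_used.items() if now - t >= self.idle_timeout_s]
        for h in stale:
            del self.last_used[h]
        return stale


hosts = OriginalDstHosts(idle_timeout_s=60)
hosts.route("10.0.0.5:443", now=0)
hosts.route("10.0.0.6:443", now=50)
print(hosts.cleanup(now=70))  # prints: ['10.0.0.5:443']
```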

Service Discovery Service (SDS)

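SDS is a generic REST API that Envoy uses to obtain cluster members. Lyft’s reference implementation uses AWS DynamoDB as a backend, but the API is simple enough to be implemented on top of a variety of stores. Envoy periodically polls the SDS for cluster members. SDS is the preferred mechanism because it gives Envoy explicit knowledge of each upstream host and can carry additional attributes such as load‑balancing weight, canary status, and region. (In later versions of the Envoy API this role is filled by the Endpoint Discovery Service, EDS, and the abbreviation SDS refers to the Secret Discovery Service.)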
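
As a rough illustration of the per‑host attributes mentioned above, the sketch below parses a discovery response into typed members. The JSON field names and the `Member` type are hypothetical and chosen for readability; they are not the actual SDS schema, and the polling loop itself is omitted.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Member:
    address: str
    port: int
    weight: int = 1       # load-balancing weight
    canary: bool = False  # canary status
    region: str = ""      # region or zone of the host


def parse_members(payload: str) -> List[Member]:
    """Turn one (hypothetical) discovery response into a list of members."""
    doc = json.loads(payload)
    return [
        Member(
            address=h["address"],
            port=h["port"],
            weight=h.get("weight", 1),
            canary=h.get("canary", False),
            region=h.get("region", ""),
        )
        for h in doc.get("hosts", [])
    ]


# Illustrative payload; a real poller would fetch this over HTTP on a timer.
sample = '{"hosts": [{"address": "10.0.0.7", "port": 8080, "weight": 3, "canary": true, "region": "us-east-1"}]}'
print(parse_members(sample))
```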

Typically, active health checking is combined with eventually consistent service discovery data to drive load‑balancing and routing decisions.

Eventually Consistent Service Discovery

Many RPC systems treat service discovery as a strongly consistent process and rely on stores such as ZooKeeper, etcd, or Consul, which can be painful to operate at scale. Envoy takes a different approach: it is designed from the start for eventually consistent discovery. It assumes that hosts appear in and disappear from the mesh in an eventually consistent manner, and it recommends active health checking to determine cluster health.

All health decisions are fully distributed, allowing graceful handling of network partitions. Envoy uses a 2×2 matrix to decide routing based on discovery status and health‑check result:

| Discovery Status | HC OK | HC Failed |
| --- | --- | --- |
| Discovered | Route | Don’t Route |
| Absent | Route | Don’t Route / Delete |

Host discovered & health‑check OK – Envoy routes to the target host.

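Host absent & health‑check OK – Envoy still routes to the target host. The design assumes the discovery service can fail at any time, so hosts that keep passing health checks continue to serve traffic. New hosts cannot be added in this state, and the data re‑converges once discovery is available again.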

Host discovered & health‑check FAIL – Envoy does not route to the target host, on the assumption that health‑check data is more accurate than discovery data.

Host absent & health‑check FAIL – Envoy does not route and deletes the target host; this is the only state where Envoy clears host data.
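
The four cases above reduce to a small decision function. The sketch below is one way to express the matrix, not Envoy's source code; the `Action` enum and the `decide` function are names introduced for illustration.

```python
from enum import Enum


class Action(Enum):
    ROUTE = "route"
    DONT_ROUTE = "dont_route"
    DELETE = "dont_route_and_delete"


def decide(discovered: bool, health_check_ok: bool) -> Action:
    """Map (discovery status, health-check result) to a routing action."""
    if health_check_ok:
        # A passing health check wins even if discovery no longer lists the host.
        return Action.ROUTE
    if discovered:
        # Failing but still listed: stop routing, keep the host record.
        return Action.DONT_ROUTE
    # Failing and absent: the only case in which the host is removed.
    return Action.DELETE


for discovered in (True, False):
    for hc_ok in (True, False):
        print(discovered, hc_ok, decide(discovered, hc_ok).name)
```

Checking the health result first mirrors the rule that health‑check data takes precedence over discovery data, and a host is only deleted when both sources agree that it is gone.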