Load Balancing Algorithms in Service Mesh (ASM): Advantages, Limitations, and Practical Deployment
This article explains why native Kubernetes Service load balancing is limited, introduces the richer algorithms offered by Alibaba Cloud Service Mesh (ASM) such as RANDOM, ROUND_ROBIN, LEAST_REQUEST, and PEAK_EWMA, and provides a step‑by‑step deployment and testing guide to illustrate their behavior in real scenarios.
In modern distributed systems, microservices rely on service discovery and load balancing. A Kubernetes Service provides a non‑intrusive solution, but in kube-proxy's default iptables mode its load balancing is limited to random selection, its rule matching has O(n) complexity, and it operates only at the TCP connection level. Because every request on a long‑lived connection lands on the same pod, it is a poor fit for many HTTP and gRPC use cases.
Alibaba Cloud Service Mesh (ASM) builds on Kubernetes and adds a sidecar proxy that works at layer 7, enabling request‑level load balancing and supporting multiple algorithms, thus offering more flexibility for complex business scenarios.
The ASM algorithms include:
RANDOM and ROUND_ROBIN: simple and widely supported, but they ignore backend load and health, so weaker instances can be overloaded.
LEAST_REQUEST: the default ASM algorithm; it tracks the number of in‑flight requests per endpoint and prefers the instance with the fewest pending requests, which improves performance when backend capacities differ.
PEAK_EWMA: introduced in ASM 1.21; it calculates a weight for each endpoint from recent response latency, error rate, and static weight, allowing the load balancer to steer away from error‑prone or high‑latency pods.
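The article does not give ASM's exact PEAK_EWMA formula. As a rough illustration of the latency part only, the sketch below uses the peak-EWMA update commonly described for latency-aware load balancers. The symbols are assumptions introduced here, and the error-rate and static-weight terms that ASM also considers are left out.

```latex
% \ell_t        : latency of the newest response (assumed symbol)
% \bar{\ell}_t  : smoothed latency estimate after that response
% \Delta t      : time elapsed since the previous sample
% \tau          : decay window, assumed to be a tunable constant
w = e^{-\Delta t / \tau}, \qquad
\bar{\ell}_t = \max\bigl(\ell_t,\; w\,\bar{\ell}_{t-1} + (1 - w)\,\ell_t\bigr)
```

Under this rule a latency spike raises the estimate immediately, while fast responses only lower it gradually, so a slow endpoint loses traffic quickly and regains it cautiously.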
To demonstrate these algorithms, the article deploys two simple-server Deployments (one normal, one forced to return HTTP 503) behind a single Service, plus a sleep client pod with its own ServiceAccount. A VirtualService and a DestinationRule are then applied to control retries and the load‑balancing algorithm. The workload manifests are:
# Healthy backend: started with --mode normal
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-server
  name: simple-server-normal
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
  template:
    metadata:
      labels:
        app: simple-server
    spec:
      containers:
      - args:
        - --mode
        - normal
        image: registry-cn-hangzhou.ack.aliyuncs.com/test-public/simple-server:v1.0.0.2-gae1f6f9-aliyun
        imagePullPolicy: IfNotPresent
        name: simple-server
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
---
# Faulty backend: started with --mode "503" so it returns HTTP 503
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-server
  name: simple-server-503
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
  template:
    metadata:
      labels:
        app: simple-server
    spec:
      containers:
      - args:
        - --mode
        - "503"
        image: registry-cn-hangzhou.ack.aliyuncs.com/test-public/simple-server:v1.0.0.2-gae1f6f9-aliyun
        imagePullPolicy: IfNotPresent
        name: simple-server
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
---
# One Service fronts both Deployments through the shared app label
apiVersion: v1
kind: Service
metadata:
  labels:
    app: simple-server
  name: simple-server
  namespace: default
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: simple-server
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
# Client pod used to send the test requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry-cn-hongkong.ack.aliyuncs.com/test/curl:asm-sleep
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
After deploying, a VirtualService disables retries, and a test command is run from the sleep pod:
$ kubectl exec -it deploy/sleep -c sleep -- sh -c 'for i in $(seq 1 10); do curl -s -o /dev/null -w "%{http_code}\n" simple-server:8080/hello; done'
With the default LEAST_REQUEST algorithm, the 503‑returning pod is still selected, leading to many error responses: LEAST_REQUEST looks only at in‑flight request counts, and a pod that fails fast rarely has requests pending. When the PEAK_EWMA algorithm is enabled via a DestinationRule, error‑prone endpoints quickly receive a lower weight, and subsequent requests are routed to the healthy pod, dramatically reducing 503 responses.
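The VirtualService and DestinationRule used in this test are not reproduced in the article. A minimal sketch of what they might look like follows, assuming the standard Istio `networking.istio.io/v1beta1` schema, resource names chosen here for illustration, and that ASM accepts `PEAK_EWMA` in the same `simple` field as the other algorithms.

```yaml
# Sketch only: resource names are hypothetical, not taken from the article.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: simple-server
  namespace: default
spec:
  hosts:
  - simple-server.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: simple-server.default.svc.cluster.local
        port:
          number: 8080
    retries:
      attempts: 0        # no sidecar retries, so every 503 reaches the client
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-server
  namespace: default
spec:
  host: simple-server.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: PEAK_EWMA  # assumed placement; remove this rule to fall back to the default
```

Disabling retries matters for the comparison: with retries left on, the sidecar could hide 503 responses from the client and mask the difference between the two algorithms.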
In summary, choosing the appropriate load‑balancing algorithm based on backend capacity, request complexity, latency, and error rate is crucial for optimal performance; ASM’s PEAK_EWMA extends traditional methods by incorporating comprehensive health signals, making it especially suitable for AI workloads and other latency‑sensitive services.
Alibaba Cloud Infrastructure
For uninterrupted computing services