
Load Balancing Algorithms in Service Mesh (ASM): Advantages, Limitations, and Practical Deployment

This article explains why native Kubernetes Service load balancing is limited, introduces the richer algorithms offered by Alibaba Cloud Service Mesh (ASM) such as RANDOM, ROUND_ROBIN, LEAST_REQUEST, and PEAK_EWMA, and provides a step‑by‑step deployment and testing guide to illustrate their behavior in real scenarios.

Alibaba Cloud Infrastructure

In modern distributed systems, microservices rely on service discovery and load balancing. A Kubernetes Service provides both without requiring changes to application code, but its load balancing is limited: backends are chosen at random, selection has O(n) complexity, and balancing happens per TCP connection rather than per request. This makes it a poor fit for many HTTP and gRPC use cases, where a single long-lived connection can carry many requests.

Alibaba Cloud Service Mesh (ASM) builds on Kubernetes and adds a sidecar proxy that works at layer 7, enabling request‑level load balancing and supporting multiple algorithms, thus offering more flexibility for complex business scenarios.

The ASM algorithms include:

RANDOM and ROUND_ROBIN: simple and widely supported, but they ignore backend load and health, so weaker instances can be overloaded.

LEAST_REQUEST: the default ASM algorithm. It tracks the number of in‑flight requests per endpoint and routes traffic to the instance with the fewest pending requests, which improves performance when backend capacities differ.

PEAK_EWMA: introduced in ASM 1.21. It computes a weight for each endpoint from recent response latency, error rate, and static weight, so the load balancer can steer traffic away from error‑prone or high‑latency pods.
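
In Istio-compatible meshes such as ASM, the algorithm is typically chosen per destination through the trafficPolicy.loadBalancer.simple field of a DestinationRule. The following is a minimal sketch, not taken from the original article: it assumes the simple-server Service in the default namespace that is deployed later in this article, and the resource name simple-server-lb is purely illustrative.

```yaml
# Illustrative sketch: select a load-balancing algorithm for one destination.
# The metadata.name below is a hypothetical name chosen for this example.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-server-lb
  namespace: default
spec:
  host: simple-server.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      # Standard Istio values include RANDOM, ROUND_ROBIN, and LEAST_REQUEST.
      simple: ROUND_ROBIN
```

Because the sidecar applies this policy per request at layer 7, changing the value takes effect without redeploying the workloads.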

To demonstrate these algorithms, the article deploys two simple-server Deployments (one normal, one forced to return HTTP 503) behind a single Service, plus a sleep pod with its own ServiceAccount that acts as the client. VirtualService and DestinationRule resources are then applied to control retries and the load‑balancing algorithm. The workload manifests are:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-server
  name: simple-server-normal
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
  template:
    metadata:
      labels:
        app: simple-server
    spec:
      containers:
      - args:
        - --mode
        - normal
        image: registry-cn-hangzhou.ack.aliyuncs.com/test-public/simple-server:v1.0.0.2-gae1f6f9-aliyun
        imagePullPolicy: IfNotPresent
        name: simple-server
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: simple-server
  name: simple-server-503
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-server
  template:
    metadata:
      labels:
        app: simple-server
    spec:
      containers:
      - args:
        - --mode
        - "503"
        image: registry-cn-hangzhou.ack.aliyuncs.com/test-public/simple-server:v1.0.0.2-gae1f6f9-aliyun
        imagePullPolicy: IfNotPresent
        name: simple-server
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: simple-server
  name: simple-server
  namespace: default
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: simple-server
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry-cn-hongkong.ack.aliyuncs.com/test/curl:asm-sleep
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent

After deploying, a VirtualService disables retries so that sidecar retries do not mask the 503 responses, and a test command is run from the sleep pod:

$ kubectl exec -it deploy/sleep -c sleep -- sh -c 'for i in $(seq 1 10); do curl -s -o /dev/null -w "%{http_code}\n" simple-server:8080/hello; done'
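
The retry-disabling VirtualService is not reproduced in this text. A minimal sketch of what it might look like is shown below, assuming the standard Istio VirtualService API and the simple-server Service on port 8080 defined above; the resource name is an assumption, and the original article's manifest may differ.

```yaml
# Illustrative sketch: route all HTTP traffic for simple-server and
# turn off automatic sidecar retries so every 503 reaches the client.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: simple-server   # assumed name, not from the original article
  namespace: default
spec:
  hosts:
  - simple-server.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: simple-server.default.svc.cluster.local
        port:
          number: 8080
    retries:
      attempts: 0       # 0 disables retries for this route
```

With retries disabled, each status code printed by the curl loop corresponds to exactly one load-balancing decision, which makes the difference between algorithms visible.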

With the default LEAST_REQUEST algorithm, the 503‑returning pod is still selected, because the algorithm considers only in‑flight request counts and not response status, so many requests fail. When the PEAK_EWMA algorithm is enabled via a DestinationRule, error‑prone endpoints quickly receive a lower weight and subsequent requests are routed to the healthy pod, dramatically reducing 503 responses.
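
Enabling PEAK_EWMA might look like the sketch below. This is an assumption rather than the article's exact manifest: it presumes that ASM 1.21 or later accepts PEAK_EWMA as a value of the loadBalancer.simple field, which appears to be an ASM extension and not a value in upstream Istio. The resource name is likewise illustrative, so check the ASM documentation for the exact form before relying on it.

```yaml
# Illustrative sketch, assuming ASM >= 1.21 accepts PEAK_EWMA in this field.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: simple-server   # assumed name, not from the original article
  namespace: default
spec:
  host: simple-server.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: PEAK_EWMA # assumed ASM-specific value; weights endpoints by latency and error rate
```

After applying a rule like this, rerunning the same curl loop from the sleep pod is the way to compare the share of 503 responses against the LEAST_REQUEST baseline.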

In summary, choosing the appropriate load‑balancing algorithm based on backend capacity, request complexity, latency, and error rate is crucial for optimal performance; ASM’s PEAK_EWMA extends traditional methods by incorporating comprehensive health signals, making it especially suitable for AI workloads and other latency‑sensitive services.
