How to Double K8s Ingress Performance: Nginx vs Envoy Gateway Tuning Guide
This article walks through a real‑world performance bottleneck on a high‑traffic e‑commerce platform, explains step‑by‑step deep tuning of Nginx Ingress Controller, compares it with Envoy Gateway, and provides concrete configurations, benchmark results, monitoring rules, and best‑practice recommendations for Kubernetes Ingress optimization.
Overview
A sudden ten-fold traffic surge during a promotion caused the default Nginx Ingress Controller to time out and latency to climb dramatically. Over the following two weeks the author investigated the bottleneck, performed deep performance tuning of Nginx Ingress, and ran a comparative evaluation of Envoy Gateway to decide when each solution is appropriate.
Technical Characteristics
Nginx Ingress Controller – mature, large community, configuration via ConfigMap and annotations.
Envoy Gateway – cloud‑native, built on the Kubernetes Gateway API, supports dynamic configuration without restart and richer traffic‑management features.
Applicable Scenarios
Existing Nginx Ingress performance bottleneck that requires deep tuning.
New cluster selection between Nginx and Envoy.
Need for advanced traffic‑management features such as canary releases or traffic mirroring.
High observability requirements.
Environment Requirements
Kubernetes >= 1.25 (Gateway API needs a recent version).
Nginx Ingress Controller >= 1.9 (latest stable recommended).
Envoy Gateway >= 1.0 (GA).
Load‑testing tools: hey, wrk or k6.
Detailed Steps
1. Preparation
Deploy a simple echo‑server as the backend and expose it via a Service.
# Deploy test backend service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-server
  namespace: default
spec:
  replicas: 10
  selector:
    matchLabels:
      app: echo-server
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      containers:
        - name: echo
          image: ealen/echo-server:latest
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: echo-server
  namespace: default
spec:
  selector:
    app: echo-server
  ports:
    - port: 80
      targetPort: 80

Install load-testing tools.
# macOS
brew install hey
# Ubuntu
sudo apt install -y wrk
# Example benchmark command
hey -n 100000 -c 200 -q 1000 http://your-ingress-domain/

2. Nginx Ingress Controller Deep Tuning
2.1 Deploy via Helm
# Add repo and install
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm show values ingress-nginx/ingress-nginx > nginx-ingress-values.yaml
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
-f nginx-ingress-values.yaml

2.2 Core configuration (excerpt of nginx-ingress-values.yaml)
# Replicas and resources
controller:
  replicaCount: 3
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2"
      memory: "2Gi"
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - ingress-nginx
            topologyKey: kubernetes.io/hostname
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80
  config:
    # Worker processes – match CPU cores
    worker-processes: "auto"
    # Max connections per worker
    max-worker-connections: "65535"
    # Keep-alive settings (connection reuse)
    keep-alive: "75"
    keep-alive-requests: "10000"
    upstream-keepalive-connections: "320"
    upstream-keepalive-timeout: "60"
    upstream-keepalive-requests: "10000"
    # Buffer settings
    proxy-buffer-size: "16k"
    proxy-buffers-number: "4"
    proxy-body-size: "50m"
    # Timeouts (upstream < ingress < client)
    proxy-connect-timeout: "5"
    proxy-read-timeout: "25"
    proxy-send-timeout: "25"
    # Gzip compression
    use-gzip: "true"
    gzip-level: "4"
    gzip-min-length: "1000"
    gzip-types: "application/json application/javascript text/css text/plain application/xml"
    # HTTP/2
    use-http2: "true"
    # SSL hardening (example)
    ssl-protocols: "TLSv1.2 TLSv1.3"
    ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384"
    ssl-session-cache: "true"
    ssl-session-cache-size: "10m"
    ssl-session-timeout: "10m"
    ssl-session-tickets: "true"
    # Load-balancing algorithm – EWMA gives better latency than round-robin
    load-balance: "ewma"
    # Rate-limit protection
    limit-req-status-code: "429"
    limit-conn-status-code: "429"
  # Prometheus metrics
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      namespace: monitoring

2.3 Annotation-based tuning for a specific Ingress
# High-performance Ingress example
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: high-performance-ingress
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/load-balance: "ewma"
    nginx.ingress.kubernetes.io/upstream-keepalive-connections: "320"
    nginx.ingress.kubernetes.io/upstream-keepalive-timeout: "60"
    nginx.ingress.kubernetes.io/upstream-keepalive-requests: "10000"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "5"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "25"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "25"
    nginx.ingress.kubernetes.io/enable-gzip: "true"
    nginx.ingress.kubernetes.io/limit-rps: "1000"
    nginx.ingress.kubernetes.io/limit-connections: "100"
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502 http_503 http_504"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-server
                port:
                  number: 80

3. Envoy Gateway Deployment and Configuration
3.1 Install Gateway API CRDs and Envoy Gateway
# Install Gateway API CRDs (v1.0.0)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml
# Install Envoy Gateway via Helm
helm repo add envoy-gateway https://envoyproxy.io/envoy-gateway-helm/
helm repo update
helm install envoy-gateway envoy-gateway/gateway-helm \
--namespace envoy-gateway-system \
--create-namespace
# Verify installation
kubectl get pods -n envoy-gateway-system

3.2 GatewayClass, EnvoyProxy and Gateway resources
# GatewayClass
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: envoy-proxy-config
    namespace: envoy-gateway-system
---
# EnvoyProxy (performance tuning)
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy-proxy-config
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 3
        container:
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
        pod:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                        - key: app.kubernetes.io/name
                          operator: In
                          values:
                            - envoy
                    topologyKey: kubernetes.io/hostname
      envoyService:
        type: LoadBalancer
  bootstrap:
    type: Merge
    value: |
      layered_runtime:
        layers:
          - name: static_layer
            static_layer:
              overload:
                global_downstream_max_connections: 50000
---
# Gateway resource
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: high-performance-gateway
  namespace: default
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Same
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: tls-secret
      allowedRoutes:
        namespaces:
          from: Same

3.3 HTTPRoute examples
# Basic route (all traffic)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: echo-route
  namespace: default
spec:
  parentRefs:
    - name: high-performance-gateway
      namespace: default
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: "/"
      backendRefs:
        - name: echo-server
          port: 80
          weight: 100
      timeouts:
        request: 30s
        backendRequest: 25s
---
# Canary release (90% stable, 10% canary)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: canary-route
  namespace: default
spec:
  parentRefs:
    - name: high-performance-gateway
      namespace: default
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: "/api"
      backendRefs:
        - name: api-stable
          port: 80
          weight: 90
        - name: api-canary
          port: 80
          weight: 10
---
# Header-based routing example
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: header-based-route
  namespace: default
spec:
  parentRefs:
    - name: high-performance-gateway
      namespace: default
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - headers:
            - name: X-Test-User
              value: "true"
      backendRefs:
        - name: api-test
          port: 80
    - matches:
        - path:
            type: PathPrefix
            value: "/"
      backendRefs:
        - name: api-prod
          port: 80

4. Performance Test Script
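One subtlety worth noting before the script: hey documents its -q flag as a rate limit per worker, so with -c concurrent workers the effective ceiling on request rate is c × q. For the script's defaults:

```shell
# hey's -q flag limits QPS per worker, so the effective request-rate
# ceiling is concurrency multiplied by the per-worker rate limit.
CONNECTIONS=200
QPS=2000
CEILING=$(( CONNECTIONS * QPS ))
echo "effective ceiling: ${CEILING} req/s"   # 400000 req/s
```

In practice the backend saturates long before this ceiling, so -q mainly serves to keep the generator from overwhelming itself.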
#!/bin/bash
# performance-test.sh – Ingress performance testing
TARGET_URL=${1:-"http://your-ingress-domain/"}
CONNECTIONS=${2:-200}
REQUESTS=${3:-100000}
QPS=${4:-2000}
echo "=========================================="
echo "Ingress Performance Test"
echo "Target: $TARGET_URL"
echo "Concurrency: $CONNECTIONS"
echo "Requests: $REQUESTS"
echo "QPS limit: $QPS"
echo "=========================================="
# Warm‑up
hey -n 1000 -c 10 "$TARGET_URL" > /dev/null 2>&1
# Main test
RESULT=$(hey -n $REQUESTS -c $CONNECTIONS -q $QPS "$TARGET_URL")
echo "Test completed, results:"
echo "$RESULT"
# Extract key metrics
echo "--- Summary ---"
echo "$RESULT" | grep "Requests/sec:"
echo "$RESULT" | grep "Average:"
echo "$RESULT" | grep "Fastest:"
echo "$RESULT" | grep "Slowest:"
echo "$RESULT" | grep "99%"
# Save report
REPORT_FILE="perf_report_$(date +%Y%m%d_%H%M%S).txt"
echo "$RESULT" > "$REPORT_FILE"
echo "Report saved to: $REPORT_FILE"

5. Configuration Comparison (Before vs After)
# Before tuning (default)
controller:
  replicaCount: 1
  resources: {}
  config: {}

# After tuning
controller:
  replicaCount: 3
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2"
      memory: "2Gi"
  config:
    worker-processes: "auto"
    max-worker-connections: "65535"
    upstream-keepalive-connections: "320"
    upstream-keepalive-timeout: "60"
    upstream-keepalive-requests: "10000"
    use-gzip: "true"
    load-balance: "ewma"

6. Performance Test Data
Test conditions: 200 concurrent connections, 100,000 requests.
Default Nginx Ingress – Requests/sec: 2,847; Avg: 68.32 ms; P99: 234.56 ms; Error rate: 0.2%.
After tuning (Nginx) – Requests/sec: 8,234 (+189%); Avg: 23.45 ms (‑66%); P99: 89.12 ms (‑62%); Error rate: 0%.
Envoy Gateway – Requests/sec: 9,156 (+221%); Avg: 21.12 ms (‑69%); P99: 76.34 ms (‑67%); Error rate: 0%.
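The percentage gains quoted above can be recomputed from the raw requests/sec figures (truncated to whole percent, as in the figures above); a quick sketch:

```shell
# Relative throughput gain over the default setup, truncated to whole percent
improvement() {
  awk -v base="$1" -v tuned="$2" 'BEGIN { print int((tuned - base) / base * 100) }'
}
improvement 2847 8234   # tuned Nginx vs default  -> 189
improvement 2847 9156   # Envoy Gateway vs default -> 221
```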
Best Practices and Considerations
Resource Planning
CPU – Match worker-processes to pod CPU limits (e.g., 2 cores → worker-processes: "2" or "auto").
Memory – Estimate using base + (max_conn × per_conn_mem) + (buf_num × buf_size). For 65,535 connections with 16 KB per-connection overhead, roughly 1.5 GB is sufficient.
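As a worked example of that estimate (the 256 MB base and 16 KB per-connection figures are ballpark assumptions, not Nginx constants):

```shell
# Rough per-pod memory estimate: base + max_conn * per_conn + proxy buffers
BASE_MB=256        # assumed baseline for master, workers, and loaded config
MAX_CONN=65535     # max-worker-connections
PER_CONN_KB=16     # assumed per-connection overhead
BUF_NUM=4          # proxy-buffers-number
BUF_KB=16          # proxy-buffer-size
TOTAL_KB=$(( BASE_MB * 1024 + MAX_CONN * PER_CONN_KB + BUF_NUM * BUF_KB ))
echo "$(( TOTAL_KB / 1024 )) MB"   # ~1280 MB, so a 1.5 GB limit leaves headroom
```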
Connection‑Pool Tuning
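For intuition on the keepalive numbers used throughout this article: with 320 upstream connections each allowed 10,000 requests, a fully warmed pool serves 3.2 million requests before recycling. At the post-tuning throughput of roughly 8,000 req/s (an assumed round figure), that is one full recycle about every 400 seconds:

```shell
# Seconds until a warmed keepalive pool is fully recycled
CONNS=320            # upstream-keepalive-connections
REQS_PER_CONN=10000  # upstream-keepalive-requests
RPS=8000             # approximate post-tuning throughput
echo "$(( CONNS * REQS_PER_CONN / RPS )) s per pool cycle"   # 400 s
```

A slow recycle rate like this keeps connection-establishment overhead negligible under steady load.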
# Nginx Ingress upstream keepalive
upstream-keepalive-connections: "320"
upstream-keepalive-timeout: "60"
upstream-keepalive-requests: "10000"

# Envoy connection pool via BackendTrafficPolicy
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: connection-pool-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: echo-route
  connectionPool:
    http:
      http1MaxPendingRequests: 1024
      http2MaxRequests: 1024
    tcp:
      maxConnections: 1024
      connectTimeout: 10s

Timeout Recommendations
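The recommendation below follows a strict ordering: backend timeouts shorter than the Ingress proxy timeouts, which are in turn shorter than the client's. A tiny sanity-check sketch (the 20 s backend value is an assumed application timeout, not from this article's configs):

```shell
# Enforce the layered rule: backend timeout < Ingress proxy timeout < client timeout
check_timeout_layering() {
  local backend=$1 proxy=$2 client=$3
  [ "$backend" -lt "$proxy" ] && [ "$proxy" -lt "$client" ]
}
if check_timeout_layering 20 25 30; then
  echo "timeout layering OK"
fi
```

The ordering matters because the innermost layer should fail first, letting each outer layer return a clean error instead of a connection reset.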
# Layered timeout strategy (upstream < Ingress < client)
proxy-connect-timeout: "5"
proxy-read-timeout: "25"
proxy-send-timeout: "25"
# Client-side timeout should be slightly higher, e.g., 30s

Fault Diagnosis and Monitoring
Log Analysis
# Nginx Ingress logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -f
# Filter errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx | grep -E "(error|warn|5[0-9]{2})"
# Envoy Gateway logs
kubectl logs -n envoy-gateway-system -l app.kubernetes.io/name=envoy -f

Real-time Monitoring Commands
# Nginx stub_status (requires enable)
kubectl exec -n ingress-nginx -it $(kubectl get pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}') -- curl localhost:10246/nginx_status
# Envoy admin interface
kubectl port-forward -n default svc/envoy-high-performance-gateway 19000:19000
# Then open http://localhost:19000/stats

Common Issues
502 Bad Gateway – Backend unavailable or timed out. Solution: Verify backend pod health and increase upstream timeouts.
503 Service Unavailable – Upstream connection pool exhausted. Solution: Increase upstream-keepalive-connections or adjust pool size.
504 Gateway Timeout – Request timeout. Solution: Raise timeout values or optimise backend.
High latency with normal CPU/Memory – Connection‑establishment overhead. Solution: Enable keepalive, increase worker-processes and max-worker-connections.
Many 5xx under load – Insufficient worker processes. Solution: Increase replica count and worker-processes.
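When diagnosing the 5xx cases above, it helps to see the status-code distribution straight from the access logs. A small sketch, assuming the default combined log format where the status code is the 9th whitespace-separated field (custom log formats will need a different field index):

```shell
# Tally HTTP status codes from an access log in combined format;
# the status code is field 9 when split on whitespace.
count_status() {
  awk '{ counts[$9]++ } END { for (s in counts) print s, counts[s] }' "$1" | sort
}
# Usage: count_status /var/log/nginx/access.log
```

Pipe the Ingress controller logs to a file first (kubectl logs ... > access.log), then run the function on it to spot whether 502s, 503s, or 504s dominate.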
Selection Guidance
Choose Nginx Ingress when the team is already familiar with Nginx, the configuration is simple, and the existing architecture is stable.
Choose Envoy Gateway for advanced traffic‑management (canary, mirroring), richer observability, or when building a new cluster that adopts the Gateway API.
Conclusion
Establish a baseline before any tuning; keep‑alive configuration yields the biggest latency reduction.
Match CPU, memory, and connection limits to workload characteristics.
Observability (Prometheus metrics, Grafana dashboards) is essential for safe tuning.
Deep Nginx tuning can increase throughput by ~3×; switching to Envoy Gateway can add another 10‑20% improvement for feature‑rich use cases.
Further learning paths include Nginx source‑level tuning, deep study of Envoy’s xDS protocol, and following the evolution of the Kubernetes Gateway API.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
