How to Push Ingress Nginx to 100k QPS on a Single Pod – Full‑Stack Performance Tuning Guide
This article walks through a systematic, layer-by-layer performance tuning of Ingress Nginx on Kubernetes: worker process settings, connection and keep-alive tuning, buffer and timeout adjustments, SSL/TLS optimizations, load-balancing algorithms, kernel parameters, logging, rate limiting, benchmarking methods, troubleshooting tips, and a migration path to the Gateway API. Each step is validated with load-test results that reach over 100 000 QPS on a single 4 CPU / 8 GiB pod.
1. Overview
Ingress Nginx is the most widely used traffic entry point in Kubernetes clusters, handling all HTTP/HTTPS routing. The default configuration is sufficient for development but quickly becomes a bottleneck in production, where issues such as insufficient worker processes, exhausted connection pools, CPU‑heavy SSL handshakes, and upstream timeouts appear.
Typical default settings on a 4C8G pod yield only ~20 k QPS. By systematically tuning dozens of parameters, the same pod can sustain >100 k QPS.
2. Architecture and Performance Bottlenecks
The request flow in Ingress Nginx is:
Client → NodePort/LoadBalancer → Nginx Worker → SSL/TLS termination → request parsing → routing → rate‑limit check → upstream selection → proxy to backend Pod → response back to client
Performance limits usually occur in four stages:
Connection establishment: TCP and SSL handshakes consume CPU and time.
Request processing: insufficient worker_processes or worker_connections.
Proxy forwarding: low upstream connection reuse and poor timeout settings.
Response return: inadequate buffer sizes causing disk I/O.
3. Worker Process and Connection Optimization
3.1 Worker Processes
Ingress Nginx uses the classic master‑worker model. Each worker is single‑threaded and uses epoll for non‑blocking I/O.
Recommended settings (via ConfigMap):
worker-processes: "auto" # equals the pod's CPU limit
worker-connections: "65536"
max-worker-open-files: "131072"
worker-shutdown-timeout: "30s"
Key points:
Do not set worker_processes higher than the CPU limit; excess workers cause context‑switch overhead.
When the pod has no CPU limit, auto reads the host’s core count, which can lead to many idle workers and worse performance.
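A quick way to verify that the effective worker count matches the CPU limit (a spot check, assuming the standard controller deployment name and namespace):
# Count running nginx worker processes inside the controller pod
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- pgrep -c -f "nginx: worker process"
# Or read the rendered directive directly
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- grep worker_processes /etc/nginx/nginx.conf
If the count follows the node's core count instead of the limit, confirm the container actually has a CPU limit set.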
3.2 Connection Limits
The theoretical maximum concurrent connections is worker_processes × worker_connections. In practice each request uses two connections (client ↔ Nginx and Nginx ↔ upstream), so the max concurrent requests ≈ (worker_processes × worker_connections) / 2. With 4 workers and 65 536 connections, the pod can handle ~130 k concurrent requests.
Adjust the following ConfigMap entries:
worker-connections: "65536"
max-worker-open-files: "131072"
4. Keep‑Alive and Upstream Connection Reuse
4.1 Client‑Side Keep‑Alive
Enabling long‑lived client connections reduces TCP handshakes dramatically. Recommended values:
keep-alive-requests: "10000"
keep-alive: "75"
4.2 Upstream Keep‑Alive
By default Nginx opens a new TCP connection to each backend for every request, which is a major overhead at high QPS. Enabling upstream keep‑alive reuses connections:
upstream-keepalive-connections: "320"
upstream-keepalive-requests: "10000"
upstream-keepalive-timeout: "60"
Real‑world tests show QPS jumping from ~35 k to >70 k when this is enabled.
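A rough way to confirm connections are actually being reused is to compare socket states inside the controller pod; a large, growing TIME_WAIT count usually means new upstream connections are still being opened per request (commands assume the standard deployment name):
# Established vs. TIME_WAIT sockets as seen by the controller
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- ss -tn state established | wc -l
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- ss -tn state time-wait | wc -l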
5. Buffer and Timeout Tuning
5.1 Proxy Buffering
When the response body exceeds the buffer size, Nginx writes a temporary file, causing I/O spikes. Recommended buffer settings:
proxy-buffering: "true"
proxy-buffer-size: "8k"
proxy-buffers-number: "8"
proxy-body-size: "16m"
proxy-max-temp-file-size: "0" # disable temp files
5.2 Timeout Parameters
Mis‑configured timeouts either kill healthy requests or keep connections open too long. Suggested values:
proxy-connect-timeout: "5"
proxy-read-timeout: "60"
proxy-send-timeout: "60"
proxy-next-upstream: "error timeout http_502 http_503 http_504"
proxy-next-upstream-timeout: "5"
proxy-next-upstream-tries: "3"
6. Logging Optimization
Access logs can become a performance killer at 100 k QPS, where naive logging means roughly 100 000 write calls per second. Instead of disabling logs, enable buffered logging:
access-log-params: "buffer=256k flush=5s"
log-format-upstream: '$remote_addr - $request_method $host$uri $status $body_bytes_sent $request_time'
This writes logs to a 256 KB memory buffer and flushes every 5 seconds, turning many small writes into larger sequential I/O.
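To confirm the parameters took effect, check the rendered configuration inside the pod (assuming the standard deployment name):
# The access_log directives should now carry the buffer and flush parameters
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- grep -n "access_log" /etc/nginx/nginx.conf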
7. SSL/TLS Optimizations
7.1 Session Cache
SSL handshakes are CPU‑intensive. Caching sessions avoids full handshakes for repeat connections.
ssl-session-cache-size: "50m" # ~4 000 sessions per MB, so roughly 200 k sessions
ssl-session-timeout: "1d"
ssl-session-tickets: "true"
7.2 OCSP Stapling
Enabling OCSP stapling removes the client‑side OCSP query, saving hundreds of milliseconds per handshake:
enable-ocsp: "true"
7.3 Protocols and Cipher Suites
Prefer TLS 1.3 and modern ciphers; use ECDSA certificates for faster asymmetric crypto.
ssl-protocols: "TLSv1.2 TLSv1.3"
ssl-ciphers: "TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
ssl-prefer-server-ciphers: "on"
7.4 Early Data (0‑RTT)
For idempotent GET requests, enable 0‑RTT to send application data with the first TLS record:
ssl-early-data: "true"
Be aware of replay‑attack risks and guard non‑idempotent methods.
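One common safeguard, sketched here with a per-Ingress snippet annotation (snippet annotations must be allowed by the controller), is to forward an Early-Data hint so backends can reject replayed 0-RTT requests to non-idempotent endpoints with 425 Too Early:
nginx.ingress.kubernetes.io/configuration-snippet: |
  # $ssl_early_data is "1" while the 0-RTT data is not yet confirmed by the handshake
  proxy_set_header Early-Data $ssl_early_data;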
8. Load‑Balancing Algorithm Selection
Ingress Nginx supports several algorithms. The default round_robin is simple but does not account for heterogeneous backend performance. The ewma algorithm dynamically weights backends based on recent response times, yielding 20‑40 % higher throughput in mixed‑node clusters.
load-balance: "ewma"
When session affinity is required, prefer cookie‑based affinity over IP‑hash for better distribution.
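Cookie-based affinity is configured with annotations roughly like the following (values are illustrative):
nginx.ingress.kubernetes.io/affinity: "cookie"
nginx.ingress.kubernetes.io/affinity-mode: "balanced"
nginx.ingress.kubernetes.io/session-cookie-name: "route"
nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"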
9. Rate Limiting and WAF
9.1 Global and Per‑Ingress Rate Limits
Ingress Nginx provides per-Ingress rate limiting via annotations (implemented with Nginx's limit_req/limit_conn) and an optional Lua-based global rate limit. Example per‑Ingress annotations:
nginx.ingress.kubernetes.io/limit-rps: "100"
nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
nginx.ingress.kubernetes.io/limit-connections: "50"
These per-Ingress limits are enforced independently by each controller replica; for a true global QPS cap across replicas, configure the memcached-backed global rate limit.
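A minimal sketch of that setup (key names per the ingress-nginx docs; host and values are illustrative): point the controller at a memcached instance via the ConfigMap, then cap each Ingress with global-rate-limit annotations.
# ConfigMap (controller-wide)
global-rate-limit-memcached-host: "memcached.ingress-nginx.svc"
global-rate-limit-memcached-port: "11211"
# Per-Ingress annotations
nginx.ingress.kubernetes.io/global-rate-limit: "5000"
nginx.ingress.kubernetes.io/global-rate-limit-window: "1s"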
9.2 ModSecurity WAF
Enabling the full OWASP CRS reduces QPS by 15‑25 % but provides strong protection. If the impact is unacceptable, enable only selected rules.
enable-modsecurity: "true"
enable-owasp-modsecurity-crs: "true"
modsecurity-snippet: |
SecRuleEngine On
SecRequestBodyLimit 10485760
SecAuditLogType Serial
SecAuditLog /dev/stdout
10. Kernel Parameter Tuning
Linux kernel limits often become the hidden ceiling. Recommended sysctl settings (applied via an initContainer or node‑level script):
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = "1024 65535"
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.core.netdev_max_backlog = 65535
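Most of the net.ipv4.* keys above are network-namespaced and can be applied from a privileged initContainer in the controller pod, while node-wide keys such as net.core.netdev_max_backlog still require a node-level change. A minimal initContainer sketch (image and values are illustrative):
initContainers:
- name: sysctl-tuning
  image: busybox:1.36            # any image that ships the sysctl applet
  securityContext:
    privileged: true             # required to write these sysctls
  command:
  - sh
  - -c
  - |
    sysctl -w net.core.somaxconn=65535
    sysctl -w net.ipv4.tcp_max_syn_backlog=65535
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    sysctl -w net.ipv4.tcp_tw_reuse=1
    sysctl -w net.ipv4.tcp_fin_timeout=15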
Also raise the file‑descriptor limit for the container:
max-worker-open-files: "131072"
11. Best‑Practice ConfigMap
A consolidated ConfigMap that incorporates all the above recommendations looks like:
apiVersion: v1
kind: ConfigMap
metadata:
name: ingress-nginx-controller
namespace: ingress-nginx
data:
# Worker
worker-processes: "auto"
worker-connections: "65536"
max-worker-open-files: "131072"
worker-shutdown-timeout: "30s"
# Keep‑Alive
keep-alive-requests: "10000"
keep-alive: "75"
upstream-keepalive-connections: "320"
upstream-keepalive-requests: "10000"
upstream-keepalive-timeout: "60"
# Buffers
proxy-buffering: "true"
proxy-buffer-size: "8k"
proxy-buffers-number: "8"
proxy-body-size: "16m"
# Timeouts
proxy-connect-timeout: "5"
proxy-read-timeout: "60"
proxy-send-timeout: "60"
proxy-next-upstream: "error timeout http_502 http_503 http_504"
proxy-next-upstream-timeout: "5"
proxy-next-upstream-tries: "3"
# SSL
ssl-protocols: "TLSv1.2 TLSv1.3"
ssl-ciphers: "TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
ssl-prefer-server-ciphers: "on"
ssl-session-cache-size: "50m"
ssl-session-timeout: "1d"
ssl-session-tickets: "true"
enable-ocsp: "true"
# Load‑balancing
load-balance: "ewma"
# Logging
access-log-params: "buffer=256k flush=5s"
log-format-upstream: '$remote_addr - $request_method $host$uri $status $body_bytes_sent $request_time'
# Misc
use-gzip: "true"
gzip-level: "3"
enable-brotli: "true"
brotli-level: "3"
use-forwarded-headers: "true"
compute-full-forwarded-for: "true"
enable-real-ip: "true"
12. Pod Resource Recommendations
Allocate dedicated nodes for Ingress, set guaranteed QoS (requests = limits), and use CPU manager static policy to bind workers to cores.
apiVersion: apps/v1
kind: Deployment
metadata:
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
replicas: 2
template:
spec:
nodeSelector:
node-role.kubernetes.io/ingress: "true"
tolerations:
- key: "node-role.kubernetes.io/ingress"
operator: Exists
effect: NoSchedule
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values: ["ingress-nginx"]
topologyKey: kubernetes.io/hostname
containers:
- name: controller
resources:
requests:
cpu: "4"
memory: "8Gi"
limits:
cpu: "4"
memory: "8Gi"
securityContext:
privileged: true
13. Graceful Shutdown and Rolling Updates
To avoid traffic loss during updates, configure the deployment as:
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
template:
spec:
terminationGracePeriodSeconds: 60
containers:
- name: controller
lifecycle:
preStop:
exec:
command: ["/wait-shutdown"]
This ensures the old pod finishes in‑flight requests before termination.
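A simple way to exercise this path is to trigger a rolling restart while a load test is running and watch for request failures:
kubectl -n ingress-nginx rollout restart deployment/ingress-nginx-controller
kubectl -n ingress-nginx rollout status deployment/ingress-nginx-controller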
14. Benchmarking and Validation
14.1 Load‑Testing Tools
Use wrk for quick stress tests and vegeta for constant‑rate load that mimics production traffic.
# wrk example (8 threads, 200 connections, 30s)
wrk -t8 -c200 -d30s https://example.com/api
# vegeta constant 50k QPS for 60s
echo "GET https://example.com/api" | vegeta attack -rate=50000/s -duration=60s | vegeta reportKey metrics to watch: Requests/sec, p99 latency, socket errors, CPU usage, and Nginx connection counters.
14.2 Monitoring Commands
# Nginx connection status
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- curl -s http://localhost:10246/nginx_status
# Worker CPU usage
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- top -bn1 -p $(pgrep -d',' nginx)
# System TCP stats
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- ss -s
15. Troubleshooting Checklist
QPS stuck, CPU low: likely worker_connections limit reached. Check nginx_connections_active.
CPU saturated: SSL handshake overhead. Verify session reuse (see the check after this list) and consider ECDSA certs.
Latency spikes: low upstream keep‑alive reuse. Inspect TIME_WAIT counts.
502 errors: upstream pods unhealthy or timed out. Review proxy-next-upstream settings.
Memory growth: old workers not exiting. Reduce worker-shutdown-timeout.
Connection refused: kernel somaxconn or FD limits too low. Increase sysctl values.
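For the SSL session-reuse item, one quick client-side spot check (example.com is a placeholder for the ingress hostname) is to repeat handshakes and look for resumed sessions:
# Resumed sessions are reported as "Reused"; all "New" lines indicate no session reuse
openssl s_client -connect example.com:443 -reconnect 2>/dev/null | grep -E "^(New|Reused)"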
16. Gateway API Migration Path
Ingress API has limited expressiveness and mixes infrastructure with application routing. Gateway API introduces a clean role hierarchy (GatewayClass → Gateway → HTTPRoute) and standardizes features such as header matching, traffic splitting, and request mirroring.
Ingress Nginx supports Gateway API experimentally (enable with --enable-gateway-api flag or Helm controller.extraArgs.enable-gateway-api="true").
Migration example:
# GatewayClass (platform team)
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: nginx
spec:
controllerName: k8s.io/ingress-nginx
---
# Gateway (cluster admin)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: main-gateway
namespace: ingress-nginx
spec:
gatewayClassName: nginx
listeners:
- name: https
protocol: HTTPS
port: 443
tls:
mode: Terminate
certificateRefs:
- name: app-tls
allowedRoutes:
namespaces:
from: All
---
# HTTPRoute (app developer)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: my-app
namespace: default
spec:
parentRefs:
- name: main-gateway
namespace: ingress-nginx
hostnames:
- app.example.com
rules:
- matches:
- path:
type: PathPrefix
value: /api
filters:
- type: URLRewrite
urlRewrite:
path:
type: ReplacePrefixMatch
replacePrefixMatch: "/"
backendRefs:
- name: api-service
port: 80
Adopt a phased migration: keep existing Ingress resources, introduce Gateway resources for new services, gradually shift traffic, then retire old Ingress objects.
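As a sanity check before applying any Gateway resources, confirm the Gateway API CRDs are actually installed in the cluster:
kubectl get crd | grep gateway.networking.k8s.io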
17. Summary
By applying the tuning steps above, a 4C8G Ingress Nginx pod can achieve:
~110 k HTTP QPS (≈340 % increase)
~85 k HTTPS QPS (≈467 % increase)
p99 latency of ~3 ms at 100 k QPS
Upstream keep‑alive reuse >95 %
SSL session reuse >92 %
The most impactful optimizations are upstream keep‑alive, worker‑connection/FD limits, and SSL session caching. Follow the prioritized checklist (upstream keep‑alive → connection limits → SSL → kernel tuning → logging → EWMA load‑balancing → BBR) to achieve the biggest gains with the least effort.