
How We Built a High‑Performance OpenResty API Gateway on Kubernetes

This article details the design and implementation of a Kubernetes‑native API Gateway built with OpenResty, covering its architecture, controller logic, HTTP/gRPC load balancing, custom ingress handling, rate‑limiting, service proxying, and future plans for service‑mesh integration.

Jike Tech Team

Introduction

OpenResty is a high‑performance web platform based on Nginx and Lua, well suited to building dynamic web applications, services, and gateways that must handle massive concurrency and scale easily.

We built an API Gateway for Kubernetes clusters using OpenResty. It serves as the entry point for all user traffic, handling authentication, routing, rate limiting, IP black/white lists, load balancing, traffic monitoring, and logging.

Architecture

The gateway consists of two parts: a controller that watches Kubernetes resources and writes state to Redis, and an OpenResty instance that performs reverse proxying and load balancing.

API Gateway Controller

Because there is no mature Lua Kubernetes client, we implemented the controller in Go to sync cluster state to Redis.

The controller watches the following resources:

ConfigMaps in specific namespaces (used to extend Ingress resources)

Services and Endpoints – it stores newly added or removed Services and their Endpoints in Redis. For ExternalName Services, it resolves the DNS name (or uses the IP directly) and stores the result as an Endpoint.

Pod health changes – updated Endpoints are also written to Redis.
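At its core, each reconciliation the controller performs boils down to a set diff: compare the Endpoints currently stored in Redis with the state just observed from the watch, then write the additions and deletions. The sketch below shows only that diff step in Go; the function name `diffEndpoints` is ours, and the real controller wires this into client-go informers and a Redis client.

```go
package main

import (
	"fmt"
	"sort"
)

// diffEndpoints compares the endpoint addresses currently stored in Redis
// with those just observed from the Kubernetes watch, returning which
// addresses must be added and which removed. Illustrative helper only.
func diffEndpoints(stored, observed []string) (add, remove []string) {
	storedSet := make(map[string]bool, len(stored))
	for _, ep := range stored {
		storedSet[ep] = true
	}
	observedSet := make(map[string]bool, len(observed))
	for _, ep := range observed {
		observedSet[ep] = true
		if !storedSet[ep] {
			add = append(add, ep)
		}
	}
	for _, ep := range stored {
		if !observedSet[ep] {
			remove = append(remove, ep)
		}
	}
	sort.Strings(add)
	sort.Strings(remove)
	return add, remove
}

func main() {
	add, remove := diffEndpoints(
		[]string{"10.0.0.1:8080", "10.0.0.2:8080"},
		[]string{"10.0.0.2:8080", "10.0.0.3:8080"},
	)
	fmt.Println(add, remove) // [10.0.0.3:8080] [10.0.0.1:8080]
}
```

Keeping the diff explicit makes the Redis writes idempotent, so a watch replay after reconnection does no harm.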

API Gateway Service

The gateway Service is exposed as a NodePort rather than a LoadBalancer, to avoid automatic changes made by the Alibaba Cloud controller manager.

External traffic first hits the Alibaba Cloud SLB, which terminates TLS and forwards HTTP requests to the NodePort. The gateway then routes based on host and path to the appropriate namespace Service.
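The host-and-path dispatch described above can be sketched as an exact host match plus a longest-prefix path match. This is a minimal Go illustration of the lookup; the `route` schema and `findRoute` name are our own, not the gateway's actual (Lua) implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// route maps a host plus a path prefix to a backend Service.
// The field names here are illustrative, not the gateway's real schema.
type route struct {
	Host       string
	PathPrefix string
	Service    string // "namespace/name:port"
}

// findRoute returns the route whose host matches exactly and whose path
// prefix is the longest match for the request path, mirroring how the
// gateway picks a namespace Service after the SLB forwards the request.
func findRoute(routes []route, host, path string) (route, bool) {
	var best route
	found := false
	for _, r := range routes {
		if r.Host != host || !strings.HasPrefix(path, r.PathPrefix) {
			continue
		}
		if !found || len(r.PathPrefix) > len(best.PathPrefix) {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	routes := []route{
		{"foo.example.com", "/", "foo/foo-service:8080"},
		{"foo.example.com", "/headers", "foo/headers-service:8080"},
	}
	r, _ := findRoute(routes, "foo.example.com", "/headers/all")
	fmt.Println(r.Service) // foo/headers-service:8080 — the longer prefix wins
}
```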

Feature Overview

Load Balancing (HTTP/gRPC)

Initially the gateway accessed backend Pods via Service ClusterIP. Although iptables performed L4 load balancing, keepalive connections caused OpenResty to repeatedly talk to the same Pod, resulting in uneven distribution.

We first introduced Traefik as an ingress controller to achieve layer‑7 load balancing, then refactored the gateway to perform balanced upstream selection directly in OpenResty.

Implement a controller that stores each Service’s Endpoints in Redis.

Periodically pull the Endpoints from Redis and use balancer_by_lua_file to set the upstream peer dynamically.

Key Lua code snippets:

<code>-- access phase: resolve the route for this host and URI
local host = ngx.var.host
local matching_route = routers.find_matching_route(host, ngx.var.uri)
if not matching_route then
    utils.exit_abnormally('no matching route: ' .. host, ngx.HTTP_NOT_FOUND)
end
-- look up the balancer built from the Endpoints cached from Redis
local balancer = balancer_services.find_balancer(matching_route.service)
if balancer == nil then
    utils.exit_abnormally('cannot find balancer', ngx.HTTP_SERVICE_UNAVAILABLE)
end
-- stash it in the request context for the balancer phase
ngx.ctx.balancer = balancer</code>

In balancer_by_lua:

<code>-- balancer phase: pick a peer and hand it to Nginx
local picked = ngx.ctx.balancer:balance()
local ok, err = ngx_balancer.set_current_peer(picked)
if not ok then
    utils.exit_abnormally('failed to set current peer: ' .. err, ngx.HTTP_SERVICE_UNAVAILABLE)
end</code>

After refactoring, we removed Traefik and achieved layer‑7 load balancing across Pods with OpenResty alone:

Performance impact: CPU usage dropped ~30 cores (~50%) and memory usage decreased by 7 GB (~70%).

For gRPC, we faced similar load‑balancing challenges because HTTP/2 multiplexes requests over a single long‑lived TCP connection. Existing solutions (a headless Service with periodic DNS resolution, kuberesolver, a service mesh) were either slow to react or operationally complex. Since Nginx 1.13.10 supports gRPC proxying, we implemented gRPC load balancing by passing a custom header in the gRPC metadata and using balancer_by_lua_file to select the upstream.

Supported balancing strategies are round‑robin and sticky (hash by user ID or IP).
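The two strategies can be sketched compactly: round‑robin cycles through the peer list, while sticky hashes a stable key (user ID or client IP) onto it. Our real balancers live in Lua; this Go sketch, with illustrative names, only shows the selection logic.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// balancer holds the peer addresses pulled from Redis for one Service.
type balancer struct {
	peers []string
	next  int
}

// roundRobin cycles through the peers in order.
func (b *balancer) roundRobin() string {
	p := b.peers[b.next%len(b.peers)]
	b.next++
	return p
}

// sticky hashes a key (user ID or client IP) so the same caller always
// lands on the same Pod while the peer set is unchanged.
func (b *balancer) sticky(key string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return b.peers[int(h.Sum32())%len(b.peers)]
}

func main() {
	b := &balancer{peers: []string{"10.0.0.1:8080", "10.0.0.2:8080"}}
	fmt.Println(b.roundRobin(), b.roundRobin(), b.roundRobin())
	fmt.Println(b.sticky("user-42") == b.sticky("user-42")) // true
}
```

Note that plain modulo hashing reshuffles many users when the peer set changes; a production sticky balancer would use consistent hashing to limit that churn.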

Ingress Controller

Native Ingress lacks expressive power for authentication, timeouts, rate limiting, and IP whitelists. We defined a custom resource called Route (stored in a ConfigMap) to express these requirements.

Routes can include a placeholder mark in the host name, which maps to a specific namespace, enabling simple domain‑based environment switching.

<code>hosts:
- foo{mark}.example.com
envs:
- mark: -prod
  namespace: foo
- mark: -test
  namespace: test</code>

When a user accesses foo-prod.example.com, traffic is routed to the foo namespace; foo-test.example.com routes to the test namespace.

Routes, Services, and Paths can each define authentication, timeout, and rate‑limit policies, with lower‑level settings overriding higher‑level ones.

<code>services:
- name: foo-service
  port: 8080
  access:
    auth_type: public
  paths:
  - access:
      auth_type: login
    timeout: 10
    uri: /headers
  - access:
      rate_limits:
      - burst: 0
        rate: 100
    timeout: 5
    uri: /</code>

This configuration means foo-service:8080/ has a 5 s timeout, open access, and a limit of 100 requests per second, while /headers requires login and has a 10 s timeout.
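The override rule (path beats service beats route) amounts to merging policies from the broadest scope to the narrowest, with unset fields inherited. A small Go sketch under an assumed schema (the `policy` struct and `effective` helper are illustrative, not the gateway's real types):

```go
package main

import "fmt"

// policy holds settings that can appear at route, service, or path level.
// Zero values mean "not set at this level". Illustrative schema only.
type policy struct {
	AuthType string
	Timeout  int // seconds
}

// effective merges policies from broadest to narrowest scope, so a
// path-level setting overrides the service-level one, which in turn
// overrides the route-level one.
func effective(levels ...policy) policy {
	var out policy
	for _, p := range levels { // route, then service, then path
		if p.AuthType != "" {
			out.AuthType = p.AuthType
		}
		if p.Timeout != 0 {
			out.Timeout = p.Timeout
		}
	}
	return out
}

func main() {
	route := policy{Timeout: 30}
	service := policy{AuthType: "public"}
	path := policy{AuthType: "login", Timeout: 10}
	fmt.Println(effective(route, service, path)) // {login 10}
}
```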

Rate‑Limiting Module

We implement coarse‑grained rate limiting using a rate_limit object stored in Redis. Selectors can be ip, user, service, or path, and limits can be defined at the route, service, or path level.

<code>{
    rate: <int>,  # requests per second
    burst: <int>, # allowed burst above rate
    selectors: [&lt;str&gt;]
}</code>

Service Proxy

To expose internal Services to other VMs in the same VPC, we added a service‑proxy feature. An OpenResty upstream with balancer_by_lua_file selects the target Service based on a specially formatted URL.

<code>upstream service_proxy_balancer {
    server 127.0.0.1;
    balancer_by_lua_file /path/to/balancer.lua;
}
location ~ ^/__(?<up_service>[a-z0-9\-]+)\.(?<up_namespace>[a-z0-9\-]+)\.(?<up_port>\d+)__(?<up_uri>.*) {
    access_by_lua_file /path/to/access.lua;
    proxy_pass http://service_proxy_balancer;
}</code>

Clients can reach a Service with a URL like http://my.intranet/__service-name.namespace.port__/path, and the gateway extracts the Service details to proxy the request.
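The extraction mirrors the named captures in the nginx location above. A self‑contained Go sketch of the same parse (the `parseProxyPath` helper is ours, for illustration):

```go
package main

import (
	"fmt"
	"regexp"
)

// proxyURL mirrors the nginx location's named captures:
// /__<service>.<namespace>.<port>__<uri>
var proxyURL = regexp.MustCompile(`^/__([a-z0-9\-]+)\.([a-z0-9\-]+)\.(\d+)__(.*)$`)

// parseProxyPath extracts the target Service coordinates from a
// service-proxy request path.
func parseProxyPath(path string) (service, namespace, port, uri string, ok bool) {
	m := proxyURL.FindStringSubmatch(path)
	if m == nil {
		return "", "", "", "", false
	}
	return m[1], m[2], m[3], m[4], true
}

func main() {
	svc, ns, port, uri, ok := parseProxyPath("/__service-name.staging.8080__/path")
	fmt.Println(ok, svc, ns, port, uri) // true service-name staging 8080 /path
}
```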

Future Plans

We are exploring Service Mesh (e.g., Istio) to offload traffic control, load balancing, observability, and fault‑injection to the data plane, allowing the gateway to focus on routing and authentication.

With Istio, canary or blue‑green deployments could be automated by adjusting DestinationRule subsets and VirtualService weights, using metrics to drive rollbacks.

We also consider replacing the Redis polling mechanism with an xDS‑style gRPC stream so the gateway receives configuration updates instantly.

Thank you for reading; we look forward to your feedback.

References

CoreDNS issue 2324


kuberesolver

nginx gRPC module

gRPC metadata handling

Tags: Kubernetes, Load Balancing, API gateway, service mesh, Ingress, OpenResty