How We Built a High‑Performance OpenResty API Gateway on Kubernetes
This article details the design and implementation of a Kubernetes‑native API Gateway built with OpenResty, covering its architecture, controller logic, HTTP/gRPC load balancing, custom ingress handling, rate‑limiting, service proxying, and future plans for service‑mesh integration.
Introduction
OpenResty is a high‑performance web platform based on Nginx and Lua, well suited to building dynamic web applications, services, and gateways that require massive concurrency and scale well.
We built an API Gateway for Kubernetes clusters using OpenResty. It serves as the entry point for all user traffic, handling authentication, routing, rate limiting, IP black/white lists, load balancing, traffic monitoring, and logging.
Architecture
The gateway consists of two parts: a controller that watches Kubernetes resources and writes state to Redis, and an OpenResty instance that performs reverse proxying and load balancing.
API Gateway Controller
Because there is no mature Lua Kubernetes client, we implemented the controller in Go to sync cluster state to Redis.
The controller watches the following resources:
ConfigMaps in specific namespaces (used to extend Ingress resources)
Services and Endpoints – it stores newly added or removed Services and their Endpoints in Redis. For ExternalName Services, it resolves the DNS name (or uses the IP directly) and stores the result as an Endpoint.
Pod health changes – updated Endpoints are also written to Redis.
API Gateway Service
The gateway Service is exposed as a NodePort rather than a LoadBalancer, to avoid automatic changes by the Alibaba Cloud controller manager.
External traffic first hits the Alibaba Cloud SLB, which terminates TLS and forwards HTTP requests to the NodePort. The gateway then routes based on host and path to the appropriate namespace Service.
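A Service manifest for this setup might look roughly like the following (names and port numbers are illustrative assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: gateway
spec:
  type: NodePort        # NodePort, not LoadBalancer, so the cloud controller
                        # manager does not reconcile it against an SLB
  selector:
    app: api-gateway
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080   # the Alibaba Cloud SLB terminates TLS and forwards
                        # plain HTTP to this port on every node
```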
Feature Overview
Load Balancing (HTTP/gRPC)
Initially the gateway accessed backend Pods via Service ClusterIP. Although iptables performed L4 load balancing, keepalive connections caused OpenResty to repeatedly talk to the same Pod, resulting in uneven distribution.
We first introduced Traefik as an ingress controller to achieve layer‑7 load balancing, then refactored the gateway to perform balanced upstream selection directly in OpenResty.
Implement a controller that stores each Service’s Endpoints in Redis.
Periodically pull the Endpoints from Redis and use balancer_by_lua_file to set the upstream peer dynamically.
Key Lua code snippets:
<code>local host = ngx.var.host
local matching_route = routers.find_matching_route(host, ngx.var.uri)
if not matching_route then
utils.exit_abnormally('no matching route: ' .. host, ngx.HTTP_NOT_FOUND)
end
local balancer = balancer_services.find_balancer(matching_route.service)
if balancer == nil then
utils.exit_abnormally('cannot find balancer', ngx.HTTP_SERVICE_UNAVAILABLE)
end
ngx.ctx.balancer = balancer</code>
In the balancer_by_lua phase:
<code>local picked = ngx.ctx.balancer:balance()
local ok, err = ngx_balancer.set_current_peer(picked)
if not ok then
utils.exit_abnormally('failed to set current peer: ' .. err, ngx.HTTP_SERVICE_UNAVAILABLE)
end</code>
After refactoring, we removed Traefik and achieved layer‑7 load balancing across Pods with OpenResty alone.
Performance impact: CPU usage dropped by about 30 cores (~50%) and memory usage by about 7 GB (~70%).
For gRPC, we faced similar load‑balancing challenges because HTTP/2 reuses a single TCP connection. Existing solutions (headless Service with periodic DNS resolution, kuberesolver, Service Mesh) were either slow or complex. Since Nginx 1.13.10 supports gRPC proxying, we implemented gRPC load balancing by passing a custom header in gRPC metadata and using balancer_by_lua_file to select the upstream.
Supported balancing strategies are round‑robin and sticky (hash by user ID or IP).
Ingress Controller
Native Ingress lacks expressive power for authentication, timeouts, rate limiting, and IP whitelists. We defined a custom resource called Route (stored in a ConfigMap) to express these requirements.
Routes can include a placeholder mark in the host name, which maps to a specific namespace, enabling simple domain‑based environment switching.
<code>hosts:
- foo{mark}.example.com
envs:
- mark: -prod
namespace: foo
- mark: -test
namespace: test</code>
When a user accesses foo-prod.example.com, traffic is routed to the foo namespace; foo-test.example.com routes to the test namespace.
Routes, Services, and Paths can each define authentication, timeout, and rate‑limit policies, with lower‑level settings overriding higher‑level ones.
<code>services:
- name: foo-service
port: 8080
access:
auth_type: public
paths:
- access:
auth_type: login
timeout: 10
uri: /headers
- access:
rate_limits:
- burst: 0
rate: 100
timeout: 5
uri: /</code>
This configuration means foo-service:8080/ has a 5 s timeout, open access, and a limit of 100 requests per second, while /headers requires login and has a 10 s timeout.
Rate‑Limiting Module
We implement coarse‑grained rate limiting using a rate_limit object stored in Redis. Selectors can be ip, user, service, or path, and limits can be defined at the route, service, or path level.
<code>{
rate: <int>, # requests per second
burst: <int>, # allowed burst above rate
selectors: [<str>]
}</code>
Service Proxy
To expose internal Services to other VMs in the same VPC, we added a service‑proxy feature. An OpenResty upstream with balancer_by_lua_file selects the target Service based on a specially formatted URL.
<code>upstream service_proxy_balancer {
server 127.0.0.1;
balancer_by_lua_file /path/to/balancer.lua;
}
location ~ ^/__(?<up_service>[a-z0-9\-]+)\.(?<up_namespace>[a-z0-9\-]+)\.(?<up_port>\d+)__(?<up_uri>.*) {
access_by_lua_file /path/to/access.lua;
proxy_pass http://service_proxy_balancer;
}</code>
Clients can reach a Service with a URL like http://my.intranet/__service-name.namespace.port__/path, and the gateway extracts the Service details to proxy the request.
Future Plans
We are exploring Service Mesh (e.g., Istio) to offload traffic control, load balancing, observability, and fault‑injection to the data plane, allowing the gateway to focus on routing and authentication.
With Istio, canary or blue‑green deployments could be automated by adjusting DestinationRule subsets and VirtualService weights, using metrics to drive rollbacks.
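A weight‑based canary split of that kind might look roughly like this (names and subsets are hypothetical; we have not deployed this):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: foo-service
spec:
  hosts:
    - foo-service
  http:
    - route:
        - destination:
            host: foo-service
            subset: stable   # defined in a DestinationRule
          weight: 90
        - destination:
            host: foo-service
            subset: canary
          weight: 10         # shift gradually as metrics stay healthy
```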
We also consider replacing the Redis polling mechanism with an xDS‑style gRPC stream so the gateway receives configuration updates instantly.
Thank you for reading; we look forward to your feedback.
References
CoreDNS issue 2324
Link 2
kuberesolver
nginx gRPC module
gRPC metadata handling
Jike Tech Team