How to Build a Scalable Rate‑Limiting System with Kong in Cloud‑Native Operations
This article outlines a comprehensive, cloud‑native rate‑limiting solution using Kong gateway, covering background challenges, design considerations, multi‑layer architecture, plugin development, CI/CD workflow, deployment strategies, and operational best practices to achieve low cost, high efficiency, and high quality across diverse projects.
Background
A sudden traffic surge caused service outages. The initial fix using Nginx rate‑limiting was coarse, prompting the need for a proactive, fine‑grained rate‑limiting solution that can be reused across many heterogeneous services (different languages, lifecycles, deployment environments).
Technical Solution Overview
The design follows a three‑layer architecture:
Rate‑limiting implementation layer – Decide where limits are enforced. Application‑level middleware is cheap but still burdens each API service. An access‑layer (gateway) implementation isolates the limit logic and enables reuse.
Kong gateway – Chosen for its lightweight, cloud‑native nature (built on OpenResty/Nginx). Kong is configured via a single YAML file, supports plugins in multiple languages, and scales horizontally.
Kong plugins – Implemented primarily in Go (preferred) with a Lua fallback. Go plugins communicate with Kong via IPC, which adds negligible overhead in the current low‑frequency scenario.
The plugin stack is further divided into three logical modules:
Rate‑limiting algorithm SDK – Provides token‑bucket, fixed‑window, sliding‑window, and concurrency‑control algorithms for short‑lived and long‑lived connections.
Rate‑limiting plugin – Calls the SDK and applies dimension‑based strategies (user, path, method, IP, etc.).
Business plugin – Parses project‑specific request attributes and listens to dynamic configuration changes.
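To make the algorithm SDK idea concrete, here is a minimal token-bucket sketch in Go. The `Limiter` interface, the `TokenBucket` type, and the `at` helper are illustrative assumptions, not the SDK's actual API. The caller passes the current time explicitly so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Limiter is a hypothetical interface the algorithm SDK might expose.
type Limiter interface {
	Allow(now time.Time) bool
}

// TokenBucket holds up to capacity tokens and refills at rate tokens per second.
type TokenBucket struct {
	mu       sync.Mutex
	capacity float64
	rate     float64
	tokens   float64
	last     time.Time
}

// NewTokenBucket starts with a full bucket at time start.
func NewTokenBucket(capacity, rate float64, start time.Time) *TokenBucket {
	return &TokenBucket{capacity: capacity, rate: rate, tokens: capacity, last: start}
}

// Allow refills the bucket for the elapsed time, then spends one token if available.
func (b *TokenBucket) Allow(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// at is a demo helper returning a fixed timestamp n seconds after the Unix epoch.
func at(n int64) time.Time { return time.Unix(n, 0) }

func main() {
	var l Limiter = NewTokenBucket(2, 1, at(0))
	fmt.Println(l.Allow(at(0)), l.Allow(at(0)), l.Allow(at(0))) // true true false
	fmt.Println(l.Allow(at(1)))                                 // true (one token refilled)
}
```

A bucket of capacity 2 absorbs a burst of two requests, rejects the third, and admits one more after a second of refill. This state lives in one process; sharing it across Kong nodes would need a shared store.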
Architecture diagram:
Data‑flow diagram:
Rate‑Limiting Implementation Details
Layer choice: Application-level limiting is cheap to add, but the service still absorbs burst traffic, the limit logic is coupled to business code, and it is hard to reuse across projects. Access-layer limiting at a gateway moves the pressure to a dedicated component, isolates responsibilities, and gives all services a single point of control.
Kong as the gateway: Kong requires only one process and a YAML configuration for deployment. Its plugin system allows custom logic at any request phase, making it suitable for both short-connection rate limiting and long-polling connection-count limiting.
Plugin language selection: Go plugins offer a better ecosystem fit, easier engineering, and lower maintenance cost for the team, while Lua plugins run in-process and have higher raw performance. Because a Go plugin communicates with Kong over IPC only a few times per request, the performance impact is minimal, so Go is the default choice with Lua as a fallback.
Plugin Modularization
Algorithm SDK – Implemented as a reusable library. Each rate‑limiting scenario (e.g., token bucket for request frequency, sliding window for burst control, semaphore for concurrent connections) gets its own SDK instance. The SDK can also be used by client‑side throttling code when needed.
Rate‑limiting plugin – Instantiates the appropriate SDK based on the request’s dimension keys, reads the limit configuration, and decides whether to allow or reject the request.
Business plugin – Extracts custom dimensions (e.g., user ID, API path, HTTP method, client IP), builds a composite rate‑limit key, and watches a configuration center for dynamic quota adjustments.
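As a rough illustration of how a business plugin might turn request dimensions into a composite rate-limit key, consider the sketch below. The `Dimensions` type, the `BuildKey` function, and the `rl` prefix are hypothetical names chosen for this example; the source does not specify the key format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Dimensions holds request attributes extracted by the business plugin (assumed shape).
type Dimensions map[string]string

// BuildKey joins the selected dimensions into a stable key.
// Names are sorted so the same selection always yields the same key,
// and a missing dimension is recorded as "-" instead of being dropped.
func BuildKey(prefix string, dims Dimensions, use []string) string {
	names := append([]string(nil), use...) // copy so the caller's slice is not reordered
	sort.Strings(names)
	parts := []string{prefix}
	for _, n := range names {
		v, ok := dims[n]
		if !ok {
			v = "-"
		}
		parts = append(parts, n+"="+v)
	}
	return strings.Join(parts, ":")
}

func main() {
	dims := Dimensions{"user": "42", "path": "/orders", "method": "GET", "ip": "10.0.0.7"}
	fmt.Println(BuildKey("rl", dims, []string{"user", "path"})) // rl:path=/orders:user=42
}
```

Values that can themselves contain `:` or `=` would need escaping before a key like this could be relied on in production.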
Typical request flow:
Request arrives at Kong.
Business plugin parses dimensions and builds a limit key.
The key is passed to the rate-limiting plugin through Kong's per-request shared context (kong.ctx.shared in the PDK; ngx.ctx is directly accessible only from Lua plugins).
Rate‑limiting plugin invokes the SDK, which may query a distributed store (Redis + Lua script) or an in‑memory counter.
Based on the SDK result, the request is either forwarded to the upstream service or rejected with a 429 status.
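The flow above can be imitated outside Kong with two chained `net/http` middlewares. This is only a stand-in for the plugin chain, not Kong plugin code: the `X-User-ID` header, the `quota` counter (which never resets), and all function names are assumptions made for this sketch, and the request context plays the role of Kong's shared context.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

type ctxKey struct{}

// business stands in for the business plugin: it builds the limit key
// and hands it to the next stage through the request context.
func business(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := r.Header.Get("X-User-ID") + ":" + r.Method + ":" + r.URL.Path
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), ctxKey{}, key)))
	})
}

// quota is a deliberately tiny in-memory counter: at most max requests per key.
type quota struct {
	mu   sync.Mutex
	max  int
	seen map[string]int
}

func (q *quota) allow(key string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.seen[key] >= q.max {
		return false
	}
	q.seen[key]++
	return true
}

// limit stands in for the rate-limiting plugin: forward or reject with 429.
func limit(q *quota, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key, _ := r.Context().Value(ctxKey{}).(string)
		if !q.allow(key) {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newChain wires business -> limit -> a dummy upstream that always answers 200.
func newChain(max int) http.Handler {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return business(limit(&quota{max: max, seen: map[string]int{}}, upstream))
}

// status sends one GET /orders for the given user and returns the response code.
func status(h http.Handler, user string) int {
	req := httptest.NewRequest(http.MethodGet, "/orders", nil)
	req.Header.Set("X-User-ID", user)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newChain(2)
	fmt.Println(status(h, "alice"), status(h, "alice"), status(h, "alice"), status(h, "bob"))
	// 200 200 429 200
}
```

Keeping key construction and the allow/reject decision in separate stages mirrors the split between the business plugin and the rate-limiting plugin.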
Long-polling connection-count limiting example: the business plugin records a unique request ID in a Redis ZSET, using the entry's expiry time (now plus the poll timeout) as its score, because ZSET members have no per-member TTL. A Lua script atomically removes expired entries and checks the current cardinality before allowing a new poll connection.
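The connection-count check could look roughly like the sketch below. The embedded Lua script shows one plausible shape for the atomic Redis step (the argument order and the whole-second units are assumptions); it is not executed here. The Go type `pollLimiter` is an in-memory model of the same logic, so the behavior can be followed without a Redis server.

```go
package main

import (
	"fmt"
	"sync"
)

// acquireScript sketches the Redis side. KEYS[1] is the ZSET;
// ARGV is now, ttl, max, id (times in whole seconds). Not run by this program.
const acquireScript = `
local now, ttl, max = tonumber(ARGV[1]), tonumber(ARGV[2]), tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', now)
if redis.call('ZCARD', KEYS[1]) >= max then return 0 end
redis.call('ZADD', KEYS[1], now + ttl, ARGV[4])
return 1
`

// pollLimiter models the ZSET in memory: request ID -> expiry time (the score).
type pollLimiter struct {
	mu      sync.Mutex
	max     int
	expires map[string]int64
}

func newPollLimiter(max int) *pollLimiter {
	return &pollLimiter{max: max, expires: map[string]int64{}}
}

// acquire evicts expired polls, then admits id if fewer than max remain.
func (p *pollLimiter) acquire(id string, now, ttl int64) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	for k, exp := range p.expires { // ZREMRANGEBYSCORE key -inf now
		if exp <= now {
			delete(p.expires, k)
		}
	}
	if len(p.expires) >= p.max { // ZCARD
		return false
	}
	p.expires[id] = now + ttl // ZADD
	return true
}

// release removes a poll that finished before its timeout (ZREM).
func (p *pollLimiter) release(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.expires, id)
}

func main() {
	p := newPollLimiter(2)
	fmt.Println(p.acquire("a", 100, 30), p.acquire("b", 100, 30)) // true true
	fmt.Println(p.acquire("c", 110, 30))                          // false: both slots busy
	p.release("a")
	fmt.Println(p.acquire("c", 110, 30)) // true: slot freed early
}
```

Using the expiry time as the score means a crashed client that never releases its slot is still cleaned up once its poll timeout passes.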
Implementation Workflow
Collaboration model – Form a cross-project working group. Members from multiple service teams contribute requirements, review the common design, and pilot the solution in a single project before wider rollout.
Requirement gathering – Teams submit issues describing traffic patterns, limiting dimensions, and any special business logic.
Evaluation – Assess development effort, maintenance overhead, hardware cost (Kong nodes + optional Redis), deployment topology, and performance impact.
Development (if needed) – Provide a Docker‑Compose environment for local testing, detailed plugin development guidelines, and sample Go/Lua skeletons.
CI/CD – Tag the plugin repository in GitLab, trigger a pipeline that compiles the plugin, packages it, uploads the artifact to a file service, and creates a GitLab release. Project teams fork the repo, select required plugins, and trigger CI to build a Docker image containing Kong and the selected plugins.
Deployment – Use the internal consistent delivery system to deploy the image to physical machines or cloud platforms. The system automatically attaches monitoring, logging, and alerting.
Testing – Verify rate‑limiting behavior (unit tests, integration tests), ensure original API functionality is unchanged, run traffic‑mirroring tests, and perform failure‑injection drills.
Launch – Follow a rollout checklist that includes gray‑release, health‑check monitoring, and rollback procedures.
CI/CD and Release Process
When a plugin is ready:
Push a GitLab tag (e.g., v1.2.0).
The CI pipeline builds the plugin (Go code is compiled into a .so file; Lua plugins need no compilation and are packaged as source), creates a .tar.gz release asset, and publishes it on the GitLab release page.
Project teams then:
Fork the plugin repository.
Configure a docker-compose.yml (or Helm chart) that lists the desired plugins and versions.
Trigger the CI pipeline to pull the released artifacts, assemble a Kong Docker image, and push it to the internal registry.
Deployment and Operations
Kong clusters can be scaled horizontally; each instance runs the same set of plugins. For distributed rate limiting, a shared Redis instance (or Redis cluster) is used. The deployment system injects side‑car containers for metrics (Prometheus), tracing (OpenTracing), and log aggregation (ELK/Fluentd). Configuration changes are propagated via a central config service that the business plugin watches.
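One way a plugin could pick up configuration changes without locking the request path is to swap an immutable snapshot atomically. The sketch below assumes the config service delivers updates on a Go channel; the `Limits` shape and the `Store` type are hypothetical, since the source does not describe the config client.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Limits maps a route to its quota (requests per second); the shape is an assumption.
type Limits map[string]int

// Store holds the current Limits snapshot. Readers never block writers.
type Store struct{ v atomic.Value }

func NewStore(initial Limits) *Store {
	s := &Store{}
	s.v.Store(initial)
	return s
}

// Apply replaces the whole snapshot. The map must not be mutated afterwards.
func (s *Store) Apply(l Limits) { s.v.Store(l) }

// Quota returns the limit for route, or fallback if none is configured.
func (s *Store) Quota(route string, fallback int) int {
	if q, ok := s.v.Load().(Limits)[route]; ok {
		return q
	}
	return fallback
}

// Watch applies updates until the channel is closed, then signals done.
// The channel stands in for a subscription to the config service.
func (s *Store) Watch(updates <-chan Limits, done chan<- struct{}) {
	for l := range updates {
		s.Apply(l)
	}
	close(done)
}

func main() {
	s := NewStore(Limits{"/orders": 100})
	updates, done := make(chan Limits), make(chan struct{})
	go s.Watch(updates, done)

	fmt.Println(s.Quota("/orders", 10)) // 100
	updates <- Limits{"/orders": 20}
	close(updates)
	<-done
	fmt.Println(s.Quota("/orders", 10), s.Quota("/other", 10)) // 20 10
}
```

Replacing the full snapshot rather than editing it in place keeps each request's view of the limits internally consistent.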
Testing Strategy
Functional tests ensure the SDK returns correct allowance decisions for each algorithm.
Regression tests confirm that upstream API responses remain unchanged when limits are not exceeded.
Mirror‑traffic tests replay production traffic into a staging environment to detect unexpected throttling.
Chaos engineering drills inject latency or node failures to verify graceful degradation and automatic failover.
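A functional test for an algorithm can be written as a table of timestamps and expected decisions. The sketch below does this for a fixed-window counter, with time passed in as whole seconds so the cases are deterministic. `FixedWindow` and `decisions` are names invented for this example, and the checks run from `main` rather than a test framework to keep the block self-contained.

```go
package main

import "fmt"

// FixedWindow allows at most limit requests in each window of size seconds.
type FixedWindow struct {
	limit, size   int64
	window, count int64
}

func NewFixedWindow(limit, size int64) *FixedWindow {
	return &FixedWindow{limit: limit, size: size}
}

// Allow resets the counter when now falls into a new window, then counts the request.
func (f *FixedWindow) Allow(now int64) bool {
	if w := now / f.size; w != f.window {
		f.window, f.count = w, 0
	}
	if f.count >= f.limit {
		return false
	}
	f.count++
	return true
}

// decisions replays the timestamps in order and records each allow/reject result.
func decisions(f *FixedWindow, times []int64) []bool {
	out := make([]bool, 0, len(times))
	for _, t := range times {
		out = append(out, f.Allow(t))
	}
	return out
}

func main() {
	cases := []struct {
		name  string
		times []int64
		want  []bool
	}{
		{"under limit", []int64{0, 1}, []bool{true, true}},
		{"over limit", []int64{0, 1, 2}, []bool{true, true, false}},
		{"new window resets", []int64{0, 1, 2, 10}, []bool{true, true, false, true}},
		{"boundary burst", []int64{8, 9, 10, 11}, []bool{true, true, true, true}},
	}
	for _, c := range cases {
		got := decisions(NewFixedWindow(2, 10), c.times)
		fmt.Println(c.name, fmt.Sprint(got) == fmt.Sprint(c.want))
	}
}
```

The last case documents a known property of fixed windows: four requests pass within four seconds because they straddle a window boundary, which is one reason to also offer a sliding-window algorithm.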
Usage Scenarios
The solution supports five typical integration patterns:
New project – adopt Kong and the rate‑limiting plugins directly.
Existing project without a gateway – evaluate business needs, then add Kong as the front‑door.
Existing project already using Kong – simply install the rate‑limiting plugins.
Existing project using a different gateway – bypass Kong and call the algorithm SDK directly from the existing gateway.
Client‑side throttling – use the SDK in the client library without any gateway.
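For the client-side pattern, a concurrency limit is often enough. The sketch below caps in-flight calls with a buffered channel used as a semaphore and fails fast when no slot is free. `Semaphore`, `Do`, and `ErrThrottled` are illustrative names, not part of the SDK described here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrThrottled is returned when the local concurrency limit is reached.
var ErrThrottled = errors.New("throttled locally")

// Semaphore caps the number of calls in flight at the same time.
type Semaphore struct{ slots chan struct{} }

func NewSemaphore(n int) *Semaphore {
	return &Semaphore{slots: make(chan struct{}, n)}
}

// TryAcquire takes a slot without blocking and reports whether it succeeded.
func (s *Semaphore) TryAcquire() bool {
	select {
	case s.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees one slot; releasing with no slot held is a no-op.
func (s *Semaphore) Release() {
	select {
	case <-s.slots:
	default:
	}
}

// Do runs fn if a slot is free and returns ErrThrottled otherwise.
func (s *Semaphore) Do(fn func() error) error {
	if !s.TryAcquire() {
		return ErrThrottled
	}
	defer s.Release()
	return fn()
}

func main() {
	s := NewSemaphore(1)
	err := s.Do(func() error {
		// While the only slot is held, a second call is rejected immediately.
		fmt.Println("nested:", s.Do(func() error { return nil }))
		return nil
	})
	fmt.Println("outer:", err)
}
```

Failing fast instead of queueing keeps the client from piling up requests that the server would reject anyway.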
Performance and Cost
Latency impact – The gateway adds about 5 ms of latency per request (measured).
Throughput – Kong with Go plugins was benchmarked at 50,000 to 60,000 QPS on modest hardware.
Hardware cost – Deploy a small Kong cluster (2‑3 replicas) plus a shared Redis instance. Existing Redis can be reused to reduce cost.
Development cost – For standard use‑cases, integration requires no code changes; only configuration of limit parameters.
Benefits
Low cost – Minimal development effort, shared infrastructure, and optional reuse of existing Redis.
High efficiency – Typical onboarding time for a standard project is three days (evaluation, testing, deployment, rollout).
High quality – Professional support from the dedicated team, extensive documentation, automated CI/CD, and built‑in observability.
dbaplus Community
