
How to Build a Scalable Rate‑Limiting System with Kong in Cloud‑Native Operations

This article outlines a comprehensive, cloud‑native rate‑limiting solution using Kong gateway, covering background challenges, design considerations, multi‑layer architecture, plugin development, CI/CD workflow, deployment strategies, and operational best practices to achieve low cost, high efficiency, and high quality across diverse projects.

dbaplus Community

Background

A sudden traffic surge caused service outages. The initial fix, Nginx rate limiting, was too coarse, which created the need for a proactive, fine-grained rate-limiting solution that can be reused across many heterogeneous services (different languages, lifecycles, and deployment environments).

Technical Solution Overview

The design follows a three‑layer architecture:

Rate‑limiting implementation layer – Decide where limits are enforced. Application‑level middleware is cheap but still burdens each API service. An access‑layer (gateway) implementation isolates the limit logic and enables reuse.

Kong gateway – Chosen for its lightweight, cloud‑native nature (built on OpenResty/Nginx). Kong is configured via a single YAML file, supports plugins in multiple languages, and scales horizontally.

Kong plugins – Implemented primarily in Go (preferred) with a Lua fallback. Go plugins communicate with Kong via IPC, which adds negligible overhead in the current low‑frequency scenario.

The plugin stack is further divided into three logical modules:

Rate‑limiting algorithm SDK – Provides token‑bucket, fixed‑window, sliding‑window, and concurrency‑control algorithms for short‑lived and long‑lived connections.

Rate‑limiting plugin – Calls the SDK and applies dimension‑based strategies (user, path, method, IP, etc.).

Business plugin – Parses project‑specific request attributes and listens to dynamic configuration changes.

[Figure: Architecture diagram]

[Figure: Data-flow diagram]

Rate‑Limiting Implementation Details

Layer choice: Application-level limiting is simple to add, but the service itself still absorbs burst traffic, the limiting logic is coupled to business code, and it is hard to reuse across projects. Access-layer limiting at a gateway moves that pressure to a dedicated component, isolates responsibilities, and provides a single point of control for all services.

Kong as the gateway : Kong requires only one process and a YAML configuration for deployment. Its plugin system allows custom logic at any request phase, making it suitable for both short‑connection rate limiting and long‑polling connection‑count limiting.

Plugin language selection : Go plugins offer better ecosystem fit, easier engineering, and lower maintenance cost for the team, while Lua plugins run in‑process with higher raw performance. Because Go plugins communicate via IPC only a few times per request, the performance impact is minimal, so Go is the default choice with Lua as a fallback.

Plugin Modularization

Algorithm SDK – Implemented as a reusable library. Each rate‑limiting scenario (e.g., token bucket for request frequency, sliding window for burst control, semaphore for concurrent connections) gets its own SDK instance. The SDK can also be used by client‑side throttling code when needed.

Rate‑limiting plugin – Instantiates the appropriate SDK based on the request’s dimension keys, reads the limit configuration, and decides whether to allow or reject the request.

Business plugin – Extracts custom dimensions (e.g., user ID, API path, HTTP method, client IP), builds a composite rate‑limit key, and watches a configuration center for dynamic quota adjustments.
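To make the idea of a composite rate-limit key concrete, here is a minimal sketch of how the business plugin might join selected dimensions into one key. The Dimensions struct, the BuildKey function, and the "name=value|..." key layout are assumptions made for illustration; the original article does not specify a key format.

```go
package main

import (
	"strings"
)

// Dimensions holds the request attributes named in the text.
// The struct itself is hypothetical.
type Dimensions struct {
	UserID   string
	Path     string
	Method   string
	ClientIP string
}

// BuildKey joins the selected dimensions, in the order given, into a
// single rate-limit key. Unknown field names are ignored.
func BuildKey(prefix string, d Dimensions, fields ...string) string {
	parts := []string{prefix}
	for _, f := range fields {
		switch f {
		case "user":
			parts = append(parts, "user="+d.UserID)
		case "path":
			parts = append(parts, "path="+d.Path)
		case "method":
			// Normalise the method so "get" and "GET" share one quota.
			parts = append(parts, "method="+strings.ToUpper(d.Method))
		case "ip":
			parts = append(parts, "ip="+d.ClientIP)
		}
	}
	return strings.Join(parts, "|")
}
```

For a user "42" sending "get" to "/orders", selecting the fields user, method, and path yields the key "rl|user=42|method=GET|path=/orders" when the prefix is "rl". Choosing which fields participate is what makes a limit per-user, per-path, per-IP, or a combination.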

Typical request flow:

Request arrives at Kong.

Business plugin parses dimensions and builds a limit key.

The key is passed to the rate-limiting plugin through Kong's per-request context (for example kong.ctx.shared, which is intended for sharing data between plugins, or ngx.ctx).

Rate‑limiting plugin invokes the SDK, which may query a distributed store (Redis + Lua script) or an in‑memory counter.

Based on the SDK result, the request is either forwarded to the upstream service or rejected with a 429 status.
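The flow above can be modelled in a few lines of plain Go. This is only a simulation under stated assumptions: it does not use the Kong PDK, the shared map stands in for Kong's per-request context, the Counter is a simplified fixed-window counter without window rollover, and all type and function names are hypothetical.

```go
package main

import (
	"net/http"
	"sync"
)

// Request and Response are simplified stand-ins for an HTTP exchange.
type Request struct{ UserID, Method, Path string }

type Response struct {
	Status int
	Body   string
}

// Counter is a simplified fixed-window counter; window rollover is omitted.
type Counter struct {
	mu    sync.Mutex
	limit int
	seen  map[string]int
}

func NewCounter(limit int) *Counter {
	return &Counter{limit: limit, seen: map[string]int{}}
}

// Allow counts one request against key and reports whether it fits the limit.
func (c *Counter) Allow(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.seen[key] >= c.limit {
		return false
	}
	c.seen[key]++
	return true
}

// businessPlugin parses dimensions and stores the limit key in the
// shared per-request context.
func businessPlugin(req Request, shared map[string]string) {
	shared["rate_limit_key"] = req.UserID + "|" + req.Method + "|" + req.Path
}

// rateLimitPlugin reads the key from the shared context and asks the limiter.
func rateLimitPlugin(shared map[string]string, c *Counter) bool {
	return c.Allow(shared["rate_limit_key"])
}

// Handle runs the whole flow: forward to upstream, or reject with 429.
func Handle(req Request, c *Counter, upstream func(Request) Response) Response {
	shared := map[string]string{}
	businessPlugin(req, shared)
	if !rateLimitPlugin(shared, c) {
		return Response{Status: http.StatusTooManyRequests, Body: "rate limit exceeded"}
	}
	return upstream(req)
}
```

With a limit of 2, the first two identical requests reach the upstream and the third receives 429, while a request from a different user is counted under its own key and still passes.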

Long-polling connection-count limiting example: the business plugin records a unique request ID in a Redis ZSET, with an expiry tied to the poll timeout. Because ZSET members carry no TTL of their own, one way to do this is to use the expiry timestamp as the member's score. A Lua script then atomically removes expired entries and checks the current cardinality before allowing a new poll connection.
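The connection-count check can be illustrated with an in-memory model. The sketch below is not the Redis Lua script itself: a Go map stands in for the ZSET (request ID as member, expiry time in milliseconds as score, which is an assumption about how the expiry is stored), and a mutex stands in for the atomicity the script provides. The comments name the Redis commands each step would roughly correspond to.

```go
package main

import "sync"

// PollLimiter models the ZSET: member = request ID, score = expiry (ms).
// It is a hypothetical in-memory stand-in for Redis.
type PollLimiter struct {
	mu      sync.Mutex
	limit   int
	expires map[string]int64
}

func NewPollLimiter(limit int) *PollLimiter {
	return &PollLimiter{limit: limit, expires: map[string]int64{}}
}

// Acquire mirrors the three steps described in the text and reports
// whether a new poll connection is allowed.
func (p *PollLimiter) Acquire(id string, nowMs, timeoutMs int64) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	// 1. Remove expired entries (roughly ZREMRANGEBYSCORE key -inf now).
	for member, exp := range p.expires {
		if exp <= nowMs {
			delete(p.expires, member)
		}
	}
	// 2. Check the current cardinality (roughly ZCARD key).
	if len(p.expires) >= p.limit {
		return false
	}
	// 3. Record the new connection (roughly ZADD key expiry id).
	p.expires[id] = nowMs + timeoutMs
	return true
}
```

With a limit of 2 and a 1000 ms poll timeout, two polls starting at time 0 are admitted and a third is refused; once both entries have expired at 1000 ms, a new poll is admitted again. Keeping all three steps inside one atomic unit is what prevents two gateway nodes from both admitting the last available slot.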

Implementation Workflow

Collaboration model – Form a cross‑project professional group. Members from multiple service teams contribute requirements, review the common design, and pilot the solution in a single project before wider rollout.

Requirement gathering – Teams submit issues describing traffic patterns, limiting dimensions, and any special business logic.

Evaluation – Assess development effort, maintenance overhead, hardware cost (Kong nodes + optional Redis), deployment topology, and performance impact.

Development (if needed) – Provide a Docker‑Compose environment for local testing, detailed plugin development guidelines, and sample Go/Lua skeletons.

CI/CD – Tag the plugin repository in GitLab, trigger a pipeline that compiles the plugin, packages it, uploads the artifact to a file service, and creates a GitLab release. Project teams fork the repo, select required plugins, and trigger CI to build a Docker image containing Kong and the selected plugins.

Deployment – Use the internal consistent delivery system to deploy the image to physical machines or cloud platforms. The system automatically attaches monitoring, logging, and alerting.

Testing – Verify rate‑limiting behavior (unit tests, integration tests), ensure original API functionality is unchanged, run traffic‑mirroring tests, and perform failure‑injection drills.

Launch – Follow a rollout checklist that includes gray‑release, health‑check monitoring, and rollback procedures.

CI/CD and Release Process

When a plugin is ready:

Push a GitLab tag (e.g., v1.2.0).

The CI pipeline builds the plugin (Go code is compiled into a .so file), packages the result as a .tar.gz release asset, and publishes it on the GitLab release page.

Project teams then:

Fork the plugin repository.

Configure a docker-compose.yml (or Helm chart) that lists the desired plugins and versions.

Trigger the CI pipeline to pull the released artifacts, assemble a Kong Docker image, and push it to the internal registry.

Deployment and Operations

Kong clusters can be scaled horizontally; each instance runs the same set of plugins. For distributed rate limiting, a shared Redis instance (or Redis cluster) is used. The deployment system injects side‑car containers for metrics (Prometheus), tracing (OpenTracing), and log aggregation (ELK/Fluentd). Configuration changes are propagated via a central config service that the business plugin watches.
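One way the business plugin might apply configuration pushed by the central config service is to swap the whole limit table atomically, so that request handling never sees a half-updated configuration. The sketch below assumes updates arrive on a Go channel; the Limits and LimitStore types are hypothetical, and the real config service client is not shown.

```go
package main

import (
	"sync/atomic"
)

// Limits maps a rate-limit key (or key pattern) to its quota.
type Limits map[string]int

// LimitStore holds the current limits and allows lock-free reads.
type LimitStore struct {
	current atomic.Value // always holds a Limits value
}

func NewLimitStore(initial Limits) *LimitStore {
	s := &LimitStore{}
	s.current.Store(initial)
	return s
}

// Get returns the quota for key from the most recently applied config.
func (s *LimitStore) Get(key string) (int, bool) {
	limits := s.current.Load().(Limits)
	quota, ok := limits[key]
	return quota, ok
}

// Watch replaces the whole table for every update received and returns
// when the updates channel is closed. Callers would normally run it in
// its own goroutine.
func (s *LimitStore) Watch(updates <-chan Limits) {
	for next := range updates {
		s.current.Store(next)
	}
}
```

Each update is treated as a complete replacement rather than a patch, which keeps the read path simple: a request handler calls Get and always sees either the old table or the new one.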

Testing Strategy

Functional tests ensure the SDK returns correct allowance decisions for each algorithm.

Regression tests confirm that upstream API responses remain unchanged when limits are not exceeded.

Mirror‑traffic tests replay production traffic into a staging environment to detect unexpected throttling.

Chaos engineering drills inject latency or node failures to verify graceful degradation and automatic failover.

Usage Scenarios

The solution supports five typical integration patterns:

New project – adopt Kong and the rate‑limiting plugins directly.

Existing project without a gateway – evaluate business needs, then add Kong as the front‑door.

Existing project already using Kong – simply install the rate‑limiting plugins.

Existing project using a different gateway – bypass Kong and call the algorithm SDK directly from the existing gateway.

Client‑side throttling – use the SDK in the client library without any gateway.

Performance and Cost

Latency impact – The measured added latency is about 5 ms per request.

Throughput – Kong with Go plugins has been benchmarked at 50,000 to 60,000 QPS on modest hardware.

Hardware cost – Deploy a small Kong cluster (2‑3 replicas) plus a shared Redis instance. Existing Redis can be reused to reduce cost.

Development cost – For standard use‑cases, integration requires no code changes; only configuration of limit parameters.

Benefits

Low cost – Minimal development effort, shared infrastructure, and optional reuse of existing Redis.

High efficiency – Typical onboarding time for a standard project is three days (evaluation, testing, deployment, rollout).

High quality – Professional support from the dedicated team, extensive documentation, automated CI/CD, and built‑in observability.
