
Four Common API Rate Limiting Strategies and Their Implementation at Stripe

This article explains why availability and reliability are essential for web APIs, outlines four rate-limiting and load-shedding approaches used at Stripe (a request rate limiter, a concurrent request limiter, usage-based load shedding, and worker-utilization load shedding), and offers practical guidance on implementing token-bucket limiters with Redis, with safe error handling and gradual rollout.

Availability and reliability are crucial for web applications and API services; sudden traffic spikes can degrade quality or cause outages.

Rate limiting helps keep APIs reliable in scenarios such as abusive users, malicious attacks, low‑priority bulk requests, or internal errors that generate excess load.

Rate Limiters and Load Shedding

A rate limiter controls the request rate. It is suitable when clients can tolerate slower request pacing; otherwise other strategies are needed. Load shedding discards low‑priority requests based on system state to protect critical traffic.

Different Types of Rate Limiters Used at Stripe

Request Rate Limiter

Limits each user to N requests per second. The limit applies consistently across test and live modes, and can be tuned for anticipated traffic spikes such as flash sales.
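A per-user request rate limiter can be sketched as a fixed one-second window counter. This is a simplified illustration (Stripe's production limiter uses a token bucket backed by Redis, covered below); the class and method names here are illustrative, not Stripe's.

```python
import time
from collections import defaultdict


class RequestRateLimiter:
    """Allow at most `limit` requests per user per one-second window."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)  # (user_id, window) -> request count

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        window = int(now)  # bucket requests into 1-second windows
        key = (user_id, window)
        if self.counts[key] >= self.limit:
            return False  # caller should respond with HTTP 429
        self.counts[key] += 1
        return True
```

A fixed window is easy to reason about but allows up to 2x the limit across a window boundary, which is one reason a token bucket is usually preferred in practice.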

Concurrent Request Limiter

Limits the maximum number of concurrent requests, helping to avoid resource contention caused by retries.
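A concurrent request limiter admits a request only if fewer than N are already in flight, rejecting (rather than queuing) the rest. A minimal sketch using a non-blocking semaphore, with illustrative names:

```python
import threading


class ConcurrentRequestLimiter:
    """Reject requests once `max_concurrent` are already in flight."""

    def __init__(self, max_concurrent):
        self.sem = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self):
        # Non-blocking: returns False immediately instead of queuing,
        # so retries do not pile up and hold resources.
        return self.sem.acquire(blocking=False)

    def release(self):
        self.sem.release()
```

The caller acquires a slot before doing work and releases it afterward (typically in a `finally` block), returning an error to the client when `try_acquire` fails.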

Usage‑Based Load Shedding

Separates traffic into critical (e.g., order creation) and non‑critical (e.g., listing history) requests, reserving a portion of capacity for critical work and rejecting excess non‑critical requests with HTTP 503.
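The reservation idea can be sketched as follows: non-critical requests are capped below total capacity, so some headroom always remains for critical traffic. The capacity numbers and names here are illustrative assumptions, not Stripe's actual configuration.

```python
class UsageBasedShedder:
    """Reserve a fraction of capacity for critical requests; shed excess
    non-critical requests (HTTP 503) once the shared portion is used up."""

    def __init__(self, capacity, reserved_fraction):
        self.capacity = capacity
        # Non-critical traffic may only use the unreserved portion.
        self.noncritical_cap = int(capacity * (1 - reserved_fraction))
        self.in_flight = 0

    def admit(self, critical):
        limit = self.capacity if critical else self.noncritical_cap
        if self.in_flight >= limit:
            return False  # respond with HTTP 503
        self.in_flight += 1
        return True

    def done(self):
        self.in_flight -= 1
```

With `capacity=10` and `reserved_fraction=0.2`, non-critical requests are shed once 8 are in flight, while critical requests are still admitted until all 10 slots are full.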

Worker‑Utilization Load Shedding

Monitors worker thread/coroutine utilization; when workers are saturated, non‑critical traffic (starting with test requests) is gradually shed and later restored as capacity recovers.
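Gradual shedding can be modeled as a keep-fraction that ramps linearly from 1 down to 0 between two utilization thresholds, so traffic is shed and restored smoothly rather than flapping. The thresholds below are illustrative assumptions, not Stripe's actual values.

```python
class WorkerUtilizationShedder:
    """Shed non-critical traffic gradually as worker utilization rises."""

    def __init__(self, shed_start=0.7, shed_full=0.9):
        self.shed_start = shed_start  # begin shedding above this utilization
        self.shed_full = shed_full    # shed all non-critical traffic above this

    def keep_fraction(self, utilization):
        # Fraction of non-critical traffic to keep, ramping 1.0 -> 0.0
        # linearly between the two thresholds.
        if utilization <= self.shed_start:
            return 1.0
        if utilization >= self.shed_full:
            return 0.0
        return (self.shed_full - utilization) / (self.shed_full - self.shed_start)

    def admit_noncritical(self, utilization, coin):
        # `coin` is a uniform random draw in [0, 1) supplied by the caller.
        return coin < self.keep_fraction(utilization)
```

Shedding probabilistically (rather than all-or-nothing) keeps the system near its capacity without oscillating between fully open and fully closed.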

Implementing Rate Limiters

Stripe implements the token‑bucket algorithm using Redis (self‑hosted or via managed services like AWS ElastiCache). Key implementation guidelines include safely inserting the limiter into middleware, providing clear error responses (HTTP 429/503), having an emergency kill‑switch, and rolling out limits gradually while monitoring metrics.
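The token-bucket algorithm can be sketched in a few lines: each bucket refills at a steady rate up to a burst capacity, and each request spends one token. This in-memory version illustrates the algorithm only; Stripe keeps this state in Redis so that all API servers share one bucket per user.

```python
import time


class TokenBucket:
    """In-memory token-bucket sketch (per-user state would be one bucket each)."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429
```

In a Redis-backed version, the refill-and-spend step is typically done atomically (for example in a Lua script) so that concurrent API servers cannot race on the same bucket.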

Conclusion

Rate limiting is one of the most effective ways to keep an API available and reliable under load. Introduce a request rate limiter first, then add concurrency, usage-based, and worker-utilization limiters as needed, following best practices for deployment, error handling, and observability.

For further details, see Stripe’s English blog post and the accompanying GitHub gist.

Tags: Backend, Redis, API, Rate Limiting, token bucket, Load Shedding, Stripe
Written by High Availability Architecture.