Backend Development 12 min read

Understanding Rate Limiting Algorithms: Counter, Sliding Window, Token Bucket, and Leaky Bucket

This article explains the purpose, scenarios, and detailed implementations of common rate‑limiting algorithms—including counter (fixed window), sliding window, token bucket, and leaky bucket—highlighting their advantages, drawbacks, and sample Python code for each.

Byte Quality Assurance Team
Byte Quality Assurance Team
Byte Quality Assurance Team
Understanding Rate Limiting Algorithms: Counter, Sliding Window, Token Bucket, and Leaky Bucket

Rate limiting algorithms control system traffic according to predefined rules, applying to request QPS, byte streams, or any abstract flow, and are essential for maintaining service stability.

Why limit? Services have finite capacity; limiting protects servers during normal growth, promotional spikes, malicious crawlers, or cloud‑provider quota enforcement.

Counter (Fixed Window) Algorithm is the simplest method: a counter tracks requests within a fixed time window and rejects requests once the limit is exceeded. It is easy to implement but suffers from boundary and burst problems.

import time
class Counter(object):
    def __init__(self, limit, interval):
        self._limit = limit            # maximum requests in the window
        self._interval = interval       # window size in seconds
        self._reqCount = 0
        self._timeStamp = int(time.time())
    def allow(self):
        now = int(time.time())
        if now < self._timeStamp + self._interval:
            if self._reqCount < self._limit:
                self._reqCount += 1
                return True
            else:
                return False
        else:
            self._timeStamp = now
            self._reqCount = 1
            return True

Sliding Window Algorithm divides the fixed window into many small sub‑windows, each with its own counter, and slides the window forward, discarding the oldest sub‑window. This provides smoother traffic control and mitigates the burst issue of the simple counter.

Implementation steps: partition time into fine‑grained intervals, maintain a counter per interval, aggregate counters of the current window, and reject requests when the total exceeds the limit.

Token Bucket Algorithm separates the concepts of token generation and request consumption. Tokens are added to a bucket at a steady rate; each request consumes a token. If the bucket is empty, the request is rejected. The bucket allows bursts up to its capacity.

import time
class TokenBucket(object):
    def __init__(self, rate, capacity):
        self._rate = rate                # tokens added per second
        self._capacity = capacity        # maximum tokens in bucket
        self._current_amount = 0
        self._last_consume_time = int(time.time())
    def consume(self, token_amount):
        # add newly generated tokens
        now = int(time.time())
        increment = (now - self._last_consume_time) * self._rate
        self._current_amount = min(self._capacity, self._current_amount + increment)
        if token_amount > self._current_amount:
            return False
        self._current_amount -= token_amount
        self._last_consume_time = now
        return True

Leaky Bucket Algorithm treats incoming requests as water poured into a bucket that leaks at a constant rate. When the bucket is full, additional requests are dropped. It smooths outbound traffic and protects downstream services.

import time
class Leaky(object):
    def __init__(self, capacity, rate, add_num):
        self._container = capacity      # max water level
        self._rate = rate               # leak rate per second
        self._addNum = add_num          # water added per request
        self.water = 0
        self.pretime = int(time.time())
    def allow(self, add_num):
        now = int(time.time())
        # leak water since last check
        leaked = (now - self.pretime) * self._rate
        self.water = max(0, self.water - leaked)
        self.pretime = now
        if self.water + add_num <= self._container:
            self.water += add_num
            return True
        return False

Both token bucket and leaky bucket are widely used in real‑world systems; for example, Nginx’s ngx_http_limit_req_module implements a leaky‑bucket style limiter, and many cloud services employ token buckets to allow short bursts while enforcing average rate limits.

Choosing the right algorithm depends on whether you need to limit traffic entry (token bucket) or traffic exit (leaky bucket), the desired smoothness, and the characteristics of upstream and downstream services.

Backenddistributed systemsRate LimitingSliding Windowtoken bucketleaky bucketcounter algorithm
Byte Quality Assurance Team
Written by

Byte Quality Assurance Team

World-leading audio and video quality assurance team, safeguarding the AV experience of hundreds of millions of users.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.