Designing High‑Concurrency Microservice Architectures: Splitting, Sharding, Rate Limiting, and Circuit Breaking
This guide explains how to build a million‑request‑per‑second microservice system by properly splitting business domains, partitioning data with vertical and horizontal sharding, applying robust rate‑limiting techniques, and implementing circuit‑breaking and degradation strategies to maintain stability.
Microservice Splitting
Break the business into appropriately sized microservices, preferably stateless or with externalized state. Service boundaries should follow business domains and data autonomy, using Domain‑Driven Design (DDD) bounded contexts to keep responsibilities clear, coupling low, and independent scaling possible.
Data Partitioning
Two main approaches are used:
Vertical partitioning: separate databases for distinct functions such as logs, audit, and analytics.
Horizontal sharding: split large tables by a key (e.g., user_id, order_id, region, time) across many databases and tables.
Example of hash‑based sharding:
shard = hash(user_id) % N              # pick one of N databases
db = dbs[shard]
table = "orders_" + str(user_id % M)   # pick one of M tables within it
db.execute(f"INSERT INTO {table} (...) VALUES (...)")
Common sharding keys include user_id, order_id, geographic region, and time (monthly or daily). Routing strategies can be range, hash (including consistent hashing), or a hybrid of both.
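The hash-modulo routing above remaps nearly every key when the database count N changes, which makes resharding expensive. Consistent hashing limits that remapping to the keys owned by the added or removed node. A minimal sketch (the `Ring` class, virtual-node count, and node names are illustrative, not from the original):

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable hash; Python's built-in hash() is salted per process
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring with virtual nodes to even out key distribution."""
    def __init__(self, nodes, vnodes=100):
        self._points = sorted((_h(f"{n}#{i}"), n)
                              for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around
        i = bisect.bisect(self._hashes, _h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["db0", "db1", "db2"])
shard_db = ring.node_for("user:12345")  # same key always routes to the same db
```

Adding a fourth node only claims the ring segments its virtual nodes land on, so most existing keys keep their current shard.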
Service Rate Limiting
Rate limiting protects the system from traffic exceeding its capacity. Common algorithms are:
Token Bucket – allows bursts while smoothing long‑term rate.
Leaky Bucket – smooths output rate and caps peaks.
Fixed Window / Sliding Window – simple counters; sliding windows provide smoother control.
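As an illustration of the first algorithm, a single-process token bucket can be sketched as follows (the `TokenBucket` class is an assumed name; this is a sketch, not a production limiter, and omits thread safety):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity` while enforcing `rate` tokens/sec long-term."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)  # ~100 req/s steady, bursts of 10
```

A leaky bucket differs only in that it drains requests at a fixed rate rather than accumulating spendable tokens, so it caps peaks instead of permitting bursts.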
Distributed implementations often use Redis with Lua scripts for atomic operations or libraries such as Guava RateLimiter for single‑node limits.
-- KEYS[1]: bucket_key, KEYS[2]: timestamp_key
-- ARGV[1]: rate, ARGV[2]: capacity, ARGV[3]: now
-- returns 1 if the request is allowed, 0 otherwise
Service Circuit Breaking
When downstream services fail or latency spikes, a circuit breaker quickly cuts off calls to prevent cascading failures and gives the downstream service time to recover.
State machine: CLOSED (normal calls), OPEN (short‑circuit), HALF‑OPEN (testing).
Trigger conditions: failure‑rate threshold (e.g., 50%) with a minimum request count (e.g., 20 per window) or latency threshold.
Recovery: after a wait period, move to HALF‑OPEN, allow a few test requests; if they succeed, close the circuit.
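The state machine, trigger conditions, and recovery path above can be sketched as a small class (a minimal single-threaded sketch; the class name, parameter names, and defaults are assumptions, and production code would add locking and latency tracking):

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN when the failure rate breaches the threshold;
    OPEN -> HALF_OPEN after reset_timeout; HALF_OPEN -> CLOSED after
    enough successful probes, or back to OPEN on any probe failure."""
    def __init__(self, failure_rate=0.5, min_requests=20,
                 reset_timeout=30.0, probes=3):
        self.failure_rate, self.min_requests = failure_rate, min_requests
        self.reset_timeout, self.probes = reset_timeout, probes
        self.state = "CLOSED"
        self.failures = self.requests = self.ok_probes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state, self.ok_probes = "HALF_OPEN", 0
                return True          # let a test request through
            return False             # short-circuit while still open
        return True                  # CLOSED or HALF_OPEN

    def record(self, success: bool) -> None:
        if self.state == "HALF_OPEN":
            if success:
                self.ok_probes += 1
                if self.ok_probes >= self.probes:
                    self.state, self.failures, self.requests = "CLOSED", 0, 0
            else:
                self.state, self.opened_at = "OPEN", time.monotonic()
            return
        self.requests += 1
        self.failures += not success
        if (self.requests >= self.min_requests
                and self.failures / self.requests >= self.failure_rate):
            self.state, self.opened_at = "OPEN", time.monotonic()
```

Callers check `allow_request()` before the downstream call and report the outcome with `record(...)`; libraries such as Resilience4j and Sentinel package the same state machine with sliding-window statistics.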
Service Degradation
If certain functionalities become unavailable or overly delayed, provide degraded responses to keep the core system usable. Strategies include:
Feature degradation: disable non-critical features such as recommendations, statistics, or logging.
Data degradation: return cached or stale data, accepting eventual consistency.
Response degradation: return default values, simplified pages, or friendly messages like “service temporarily unavailable”.
Rate-based degradation: throttle low-priority users or delay their requests.
Example: when the inventory service is down, return the cached order status with an “inventory processing” notice instead of a 500 error.
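That fallback pattern can be sketched as a decorator that catches downstream failures and serves a degraded response (all names here, `degrade_to`, `order_cache`, `order_status`, are illustrative, and the outage is simulated):

```python
import functools

def degrade_to(fallback):
    """If the wrapped call fails, return a degraded response instead of erroring."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                # Serve the fallback (a value, or a callable for dynamic fallbacks)
                return fallback(*args, **kwargs) if callable(fallback) else fallback
        return inner
    return wrap

# Stale-but-usable data kept from earlier successful calls
order_cache = {42: {"order_id": 42, "status": "paid"}}

@degrade_to(lambda order_id: {**order_cache.get(order_id, {}),
                              "note": "inventory processing"})
def order_status(order_id):
    raise TimeoutError("inventory service down")  # simulate the outage
```

Here `order_status(42)` returns the cached status plus a friendly notice rather than surfacing a 500 to the user.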
These techniques together enable a microservice architecture to sustain million‑level concurrent traffic while maintaining reliability and user experience.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!