Designing High‑Concurrency Microservice Architectures: Splitting, Sharding, Rate Limiting, and Circuit Breaking
This guide explains how to build a million‑request‑per‑second microservice system by properly splitting business domains, partitioning data with vertical and horizontal sharding, applying robust rate‑limiting techniques, and implementing circuit‑breaking and degradation strategies to maintain stability.
Microservice Splitting
Break the business into appropriately sized microservices, preferably stateless or with externalized state. Service boundaries should follow business domains and data autonomy, using Domain‑Driven Design (DDD) bounded contexts to keep responsibilities clear, coupling low, and independent scaling possible.
Data Partitioning
Two main approaches are used:
Vertical partitioning: separate databases for distinct functions such as logs, audit, and analytics.
Horizontal sharding: split large tables by a key (e.g., user_id, order_id, region, time) across many databases and tables.
Example of hash‑based sharding:
shard = hash(user_id) % N              # pick one of N databases
db = dbs[shard]
table = "orders_" + str(user_id % M)   # pick one of M tables within it
db.execute(f"INSERT INTO {table} (...) VALUES (...)")
Common sharding keys include user_id, order_id, geographic region, and time (monthly or daily). Routing strategies can be range, hash (including consistent hashing), or a hybrid of both.
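The hash-modulo routing above remaps nearly every key when the database count N changes, which makes resharding expensive. Consistent hashing limits that remapping to the keys owned by the added or removed node. A minimal sketch (the `Ring` class, virtual-node count, and node names are illustrative, not from the original):

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable hash; Python's built-in hash() is salted per process
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring with virtual nodes to even out key distribution."""
    def __init__(self, nodes, vnodes=100):
        self._points = sorted((_h(f"{n}#{i}"), n)
                              for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around
        i = bisect.bisect(self._hashes, _h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["db0", "db1", "db2"])
shard_db = ring.node_for("user:12345")  # same key always routes to the same db
```

Adding a fourth node only claims the ring segments its virtual nodes land on, so most existing keys keep their current shard.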
Service Rate Limiting
Rate limiting protects the system from traffic exceeding its capacity. Common algorithms are:
Token Bucket – allows bursts while smoothing long‑term rate.
Leaky Bucket – smooths output rate and caps peaks.
Fixed Window / Sliding Window – simple counters; sliding windows provide smoother control.
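As an illustration of the first algorithm, a single-process token bucket can be sketched as follows (the `TokenBucket` class is an assumed name; this is a sketch, not a production limiter, and omits thread safety):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity` while enforcing `rate` tokens/sec long-term."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)  # ~100 req/s steady, bursts of 10
```

A leaky bucket differs only in that it drains requests at a fixed rate rather than accumulating spendable tokens, so it caps peaks instead of permitting bursts.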
Distributed implementations often use Redis with Lua scripts for atomic operations or libraries such as Guava RateLimiter for single‑node limits.
-- KEYS[1]: bucket_key, KEYS[2]: timestamp_key
-- ARGV[1]: rate, ARGV[2]: capacity, ARGV[3]: now
-- returns 1 if the request is allowed, 0 otherwise
Service Circuit Breaking
When downstream services fail or latency spikes, a circuit breaker quickly cuts off calls to prevent cascading failures and gives the downstream service time to recover.
State machine: CLOSED (normal calls), OPEN (short‑circuit), HALF‑OPEN (testing).
Trigger conditions: failure‑rate threshold (e.g., 50%) with a minimum request count (e.g., 20 per window) or latency threshold.
Recovery: after a wait period, move to HALF‑OPEN, allow a few test requests; if they succeed, close the circuit.
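The state machine, trigger conditions, and recovery path above can be sketched as a small class (a minimal single-threaded sketch; the class name, parameter names, and defaults are assumptions, and production code would add locking and latency tracking):

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN when the failure rate breaches the threshold;
    OPEN -> HALF_OPEN after reset_timeout; HALF_OPEN -> CLOSED after
    enough successful probes, or back to OPEN on any probe failure."""
    def __init__(self, failure_rate=0.5, min_requests=20,
                 reset_timeout=30.0, probes=3):
        self.failure_rate, self.min_requests = failure_rate, min_requests
        self.reset_timeout, self.probes = reset_timeout, probes
        self.state = "CLOSED"
        self.failures = self.requests = self.ok_probes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state, self.ok_probes = "HALF_OPEN", 0
                return True          # let a test request through
            return False             # short-circuit while still open
        return True                  # CLOSED or HALF_OPEN

    def record(self, success: bool) -> None:
        if self.state == "HALF_OPEN":
            if success:
                self.ok_probes += 1
                if self.ok_probes >= self.probes:
                    self.state, self.failures, self.requests = "CLOSED", 0, 0
            else:
                self.state, self.opened_at = "OPEN", time.monotonic()
            return
        self.requests += 1
        self.failures += not success
        if (self.requests >= self.min_requests
                and self.failures / self.requests >= self.failure_rate):
            self.state, self.opened_at = "OPEN", time.monotonic()
```

Callers check `allow_request()` before the downstream call and report the outcome with `record(...)`; libraries such as Resilience4j and Sentinel package the same state machine with sliding-window statistics.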
Service Degradation
If certain functionalities become unavailable or overly delayed, provide degraded responses to keep the core system usable. Strategies include:
Feature degradation: disable non-critical features such as recommendations, statistics, or logging.
Data degradation: return cached or stale data, accepting eventual consistency.
Response degradation: return default values, simplified pages, or friendly messages like “service temporarily unavailable”.
Rate-based degradation: throttle low-priority users or delay their requests.
Example: when the inventory service is down, return the cached order status with an “inventory processing” notice instead of a 500 error.
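That fallback pattern can be sketched as a decorator that catches downstream failures and serves a degraded response (all names here, `degrade_to`, `order_cache`, `order_status`, are illustrative, and the outage is simulated):

```python
import functools

def degrade_to(fallback):
    """If the wrapped call fails, return a degraded response instead of erroring."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                # Serve the fallback (a value, or a callable for dynamic fallbacks)
                return fallback(*args, **kwargs) if callable(fallback) else fallback
        return inner
    return wrap

# Stale-but-usable data kept from earlier successful calls
order_cache = {42: {"order_id": 42, "status": "paid"}}

@degrade_to(lambda order_id: {**order_cache.get(order_id, {}),
                              "note": "inventory processing"})
def order_status(order_id):
    raise TimeoutError("inventory service down")  # simulate the outage
```

Here `order_status(42)` returns the cached status plus a friendly notice rather than surfacing a 500 to the user.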
These techniques together enable a microservice architecture to sustain million‑level concurrent traffic while maintaining reliability and user experience.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!