How to Build a High‑Concurrency System: Real‑World Practices from a Delivery Platform

This article shares practical experience on designing and operating a high‑concurrency order‑delivery system, covering infrastructure choices, database scaling techniques such as read‑write separation and sharding, architectural patterns like caching, message queues and service governance, as well as application‑level optimizations including compensation, idempotency, async processing and warm‑up strategies.

dbaplus Community

Nov 29, 2021

1. Introduction

When a delivery platform grows from handling millions to tens of millions of orders per day, peak traffic during lunch and dinner creates massive concurrent request loads—core query services can exceed 200,000 QPS, Redis clusters reach millions of QPS, and databases handle over 100,000 QPS and 20,000 TPS. The following sections summarise the author’s multi‑year experience in stabilising such high‑concurrency systems.

2. Infrastructure

Infrastructure forms the foundation of any high‑throughput system. A stable physical server pool, reliable IDC, and robust deployment model are essential, much like a pyramid’s base. Multi‑active (active‑active) deployments across different data centres distribute traffic, eliminate single‑point failures, and increase overall capacity.

Multi‑active setups can be intra‑city or inter‑city; examples include Alibaba’s unit‑based solution and Ele.me’s multi‑center architecture. With traffic split evenly across N data centres, each centre handles roughly 1/N of the total load, so adding centres raises overall system capacity.

3. Database

Databases are a bottleneck in high‑concurrency scenarios; the goal is to raise their capacity.

3.1 Read‑Write Separation

Most internet services are read‑heavy. A primary‑replica setup (1 master + multiple slaves) directs writes to the master and reads to the slaves, dramatically reducing load on the master. For example, with 10,000 read QPS and 1,000 write TPS, a 1‑master‑5‑slave configuration leaves the master handling only the 1,000 TPS of writes while each slave serves about 2,000 QPS of reads.
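The routing rule can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the `ReadWriteRouter` class, the node names, and the round‑robin choice of replica are all assumptions, and real routers usually also account for replication lag.

```python
import itertools


class ReadWriteRouter:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin is an assumption; real routers may weight by lag or load.
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, sql):
        # Simplification: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replica_cycle) if is_read else self.primary


router = ReadWriteRouter("primary", [f"replica-{i}" for i in range(1, 6)])

# The article's numbers: 10,000 read QPS spread over 5 replicas.
read_qps, replica_count = 10_000, 5
per_replica_qps = read_qps // replica_count  # 2,000 QPS per replica
```

Reads that must observe a write made a moment earlier are often sent to the primary instead, which is one common way to work around replication lag.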

Advantages: simple implementation, minimal code changes. Drawbacks include master‑slave replication lag (milliseconds to seconds) and limited number of slaves, which restricts horizontal scaling. Moreover, read‑write separation cannot increase write TPS.

3.2 Sharding (Database Partitioning)

When read‑write separation is insufficient, sharding spreads data across multiple databases and tables. Sharding can be vertical (splitting tables by business domain, e.g., user vs. payment tables) or horizontal (splitting the rows of one logical table into many physical tables by a key such as user ID).
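Horizontal sharding by user ID could be routed roughly as follows. The database and table counts, the modulo scheme, and the `order_db_N.order_M` naming are illustrative assumptions, not values from the article.

```python
def shard_for(user_id: int, db_count: int = 4, tables_per_db: int = 8):
    """Map a user ID to (database index, table index) using modulo hashing."""
    slot = user_id % (db_count * tables_per_db)
    return slot // tables_per_db, slot % tables_per_db


def table_name(user_id: int) -> str:
    """Build the physical table name for a user's orders (hypothetical naming)."""
    db_index, table_index = shard_for(user_id)
    return f"order_db_{db_index}.order_{table_index}"
```

With these defaults, `table_name(42)` resolves to `order_db_1.order_2`. Plain modulo routing is simple, but changing the shard count later remaps most keys, which is part of the migration cost sharding brings.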

Sharding raises both the QPS and TPS ceilings and, in theory, scales horizontally without limit. However, it brings higher migration cost, transaction complexity (operations that span shards need distributed transactions), and difficulty with queries on dimensions other than the sharding key. Common solutions include a global index table that maps secondary dimensions to the primary sharding key, or a secondary query store kept alongside MySQL (e.g., Elasticsearch + MySQL).
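The global index table idea can be sketched with in‑memory dictionaries standing in for the shards and the index. All names here (`order_index`, `save_order`, `find_order`) are hypothetical, and the sketch assumes orders are sharded by user ID but sometimes looked up by order ID alone.

```python
SHARD_COUNT = 4
shards = [dict() for _ in range(SHARD_COUNT)]  # stand-ins for physical shards
order_index = {}                               # stand-in for the global index table


def save_order(order_id, user_id, payload):
    """Write the row to its shard and record order_id -> user_id in the index."""
    shards[user_id % SHARD_COUNT][order_id] = {"user_id": user_id, **payload}
    order_index[order_id] = user_id


def find_order(order_id):
    """Resolve the sharding key through the index, then read a single shard."""
    user_id = order_index.get(order_id)
    if user_id is None:
        return None
    return shards[user_id % SHARD_COUNT].get(order_id)


save_order("O-1001", user_id=42, payload={"status": "CREATED"})
```

Without the index, a lookup by order ID would have to query every shard. The price is an extra write per order and an index that must be kept consistent with the shards.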

Data migration can be performed via stop‑the‑world cut‑over (fast but disruptive) or dual‑write (non‑disruptive but more complex) strategies.
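A dual‑write migration might look like the sketch below, with dictionaries standing in for the old and new stores. The `DualWriteStore` class and its method names are assumptions made for illustration; a production migration would also need ordering guarantees and failure handling that are omitted here.

```python
class DualWriteStore:
    """Writes go to both stores; reads switch to the new one after verification."""

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.read_from_new = False  # flipped only after the data is verified

    def write(self, key, value):
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        source = self.new if self.read_from_new else self.old
        return source.get(key)

    def backfill(self):
        """Copy rows that predate dual-write without overwriting newer data."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def consistent(self):
        return self.old == self.new
```

The usual order is: turn on dual‑write, backfill historical rows, verify consistency, switch reads, and only then stop writing to the old store.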

4. Architecture

Beyond databases, architectural patterns such as caching, message queues, service governance, and resource isolation are crucial for handling spikes.

4.1 Caching

Caching protects backend stores from traffic bursts and improves latency. The two main types are local caches (Guava, Ehcache) and distributed caches (Redis, Memcached). On every write, the author updates the database and rebuilds the cache entry while holding a distributed lock on the waybill (delivery order) ID:

lock(waybillId) {
    // Delete the cache entry
    deleteCache();
    // Update the database
    updateDB();
    // Rebuild the cache
    reloadCache();
}

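Because all updates to the same waybill are serialised by the lock, concurrent writers cannot interleave their delete, update, and rebuild steps, which keeps the cache consistent with the database and avoids stale reads.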
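A runnable approximation of the pattern above is sketched here. It is not the author's code: dictionaries stand in for the database and Redis, and an in‑process `threading.Lock` per waybill stands in for the distributed lock, so it only shows the ordering of the steps within a single process.

```python
import threading
from collections import defaultdict

db = {}     # stand-in for the database
cache = {}  # stand-in for Redis
_locks = defaultdict(threading.Lock)  # in-process stand-in for a distributed lock
_locks_guard = threading.Lock()       # protects creation of per-key locks


def _lock_for(waybill_id):
    with _locks_guard:
        return _locks[waybill_id]


def update_waybill(waybill_id, fields):
    with _lock_for(waybill_id):
        cache.pop(waybill_id, None)                            # delete the cache entry
        db[waybill_id] = {**db.get(waybill_id, {}), **fields}  # update the database
        cache[waybill_id] = dict(db[waybill_id])               # rebuild the cache


def get_waybill(waybill_id):
    cached = cache.get(waybill_id)
    if cached is not None:
        return cached
    with _lock_for(waybill_id):  # rebuild on a miss under the same lock
        row = db.get(waybill_id)
        if row is not None:
            cache[waybill_id] = dict(row)
        return row
```

Taking the same lock on a cache miss keeps a reader from repopulating the cache with a value that a concurrent writer is about to replace.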

4.2 Message Queues

MQs (Kafka, RocketMQ, Pulsar, etc.) decouple services, smooth traffic peaks, and provide asynchronous processing. Introducing an MQ also adds complexity, since message ordering, latency, and loss must all be handled, so adopt one only after evaluating business needs.
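Peak smoothing can be illustrated with a small simulation in which a queue absorbs a burst and a consumer drains it at a fixed rate. The function name, the tick model, and the numbers are assumptions chosen for illustration; no real MQ client is involved.

```python
from collections import deque


def simulate_peak_shaving(arrivals_per_tick, consumer_rate):
    """Return per-tick processed counts and the backlog left after each tick."""
    backlog = deque()
    processed, depth = [], []
    for tick, arrivals in enumerate(arrivals_per_tick):
        backlog.extend((tick, n) for n in range(arrivals))  # producers enqueue
        done = min(consumer_rate, len(backlog))
        for _ in range(done):
            backlog.popleft()                               # consumer drains in order
        processed.append(done)
        depth.append(len(backlog))
    return processed, depth


processed, depth = simulate_peak_shaving([100, 500, 100, 0, 0], consumer_rate=200)
```

Here a burst of 500 messages in one tick never pushes the consumer above 200 per tick; the backlog peaks at 300 and drains two ticks later. That backlog is the added latency the section warns about.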

4.3 Service Governance

Governance includes service registration/discovery, observability (logging, tracing, metrics), timeout settings, circuit breaking, rate limiting, and degradation. Proper timeout configuration prevents thread pool exhaustion when downstream services fail. Monitoring provides visibility into CPU, memory, JVM, and business‑level metrics. Circuit breakers protect upstream services from downstream failures, while degradation (active/passive) gracefully reduces functionality under stress.
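Circuit breaking with a fallback can be sketched as a small state machine. This is a simplified illustration under assumed thresholds, not a specific framework's behaviour: the `CircuitBreaker` class, its parameters, and the injectable `clock` are all hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, fails fast while open, then retries once."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: skip the downstream call and degrade
            # Timeout elapsed: half-open, let one trial call through below.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

While the circuit is open the downstream service is not called at all, which keeps caller threads from piling up behind a failing dependency. The `fallback` plays the role of passive degradation.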

4.4 Resource Isolation

Isolating resources at the server, middleware, or thread‑pool level prevents noisy‑neighbor problems. The author split clusters into critical, secondary, and non‑critical groups, ensuring high‑priority traffic is not affected by less important services.
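At the thread‑pool level, isolation can be as simple as giving each tier its own executor. The tier names follow the grouping described above; the pool sizes and the `submit` helper are assumptions made for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Pool sizes are illustrative assumptions, not values from the article.
POOLS = {
    "critical": ThreadPoolExecutor(max_workers=8, thread_name_prefix="critical"),
    "secondary": ThreadPoolExecutor(max_workers=4, thread_name_prefix="secondary"),
    "non_critical": ThreadPoolExecutor(max_workers=2, thread_name_prefix="non-critical"),
}


def submit(tier, func, *args):
    """Run func on its tier's own pool so one tier cannot starve another."""
    return POOLS[tier].submit(func, *args)


def current_thread_name():
    return threading.current_thread().name
```

If non‑critical work backs up, it fills only its own two workers and queue, while tasks submitted to the `critical` pool keep their dedicated threads.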

5. Application‑Level Optimisations

5.1 Compensation

In micro‑service environments, failures are inevitable. Compensation can be implemented via scheduled jobs (relying on persistent DB state) or delayed MQ messages. Critical paths favour DB‑backed compensation; non‑critical paths can use lightweight MQ‑based retries.
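A DB‑backed compensation job might scan for records stuck in an intermediate state and retry them. In this sketch a dictionary stands in for the table, and the field names (`status`, `updated_at`, `retries`) and thresholds are hypothetical.

```python
def run_compensation(orders, handler, now, stale_after=60, max_retries=3):
    """Retry orders stuck in PENDING; mark them FAILED after max_retries."""
    for order in orders.values():
        # Skip finished orders and ones updated too recently to count as stuck.
        if order["status"] != "PENDING" or now - order["updated_at"] < stale_after:
            continue
        try:
            handler(order)
            order["status"] = "DONE"
        except Exception:
            order["retries"] += 1
            if order["retries"] >= max_retries:
                order["status"] = "FAILED"  # hand over to alerting or manual handling
        order["updated_at"] = now
```

A scheduler would call this periodically. Because the same order can be retried several times, the `handler` needs to be idempotent.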

5.2 Idempotency

Idempotent APIs guarantee that repeated requests have the same effect as a single request. Implementations typically store a unique business ID or token and skip processing if the ID already exists.
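The store‑and‑skip approach can be sketched as follows. The `IdempotentProcessor` class is hypothetical: an in‑memory dictionary guarded by a lock stands in for what would usually be a database table with a unique key on the request ID.

```python
import threading


class IdempotentProcessor:
    """Runs a handler at most once per request ID and replays the stored result."""

    def __init__(self):
        self._results = {}  # stand-in for a table with a unique key on request_id
        self._lock = threading.Lock()

    def process(self, request_id, handler):
        with self._lock:
            if request_id in self._results:
                return self._results[request_id]  # duplicate: skip processing
            result = handler()
            self._results[request_id] = result
            return result
```

Returning the stored result, rather than an error, lets a client that retries after a timeout see the same response as the original call.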

5.3 Asynchronous Processing

Beyond MQ, internal thread pools or coroutines can offload work. The author’s order‑creation flow stores the request in the DB, returns success to the client, and processes the rest asynchronously in a thread pool, with a periodic compensation job handling failures.

Monitoring thread‑pool metrics (active threads, queue length) is essential to detect overload early.
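The persist, acknowledge, then process‑asynchronously flow, together with a basic pool metric, might look like this. It is a sketch under assumptions: `MeteredPool`, `create_order`, and `post_process` are hypothetical names, a dictionary stands in for the orders table, and the compensation job is omitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MeteredPool:
    """Thread pool wrapper that tracks how many tasks are queued or running."""

    def __init__(self, max_workers):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self.submitted = 0
        self.completed = 0

    def submit(self, func, *args):
        with self._lock:
            self.submitted += 1
        future = self._pool.submit(func, *args)
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, _future):
        with self._lock:
            self.completed += 1

    def in_flight(self):
        with self._lock:
            return self.submitted - self.completed

    def shutdown(self):
        self._pool.shutdown(wait=True)


orders = {}  # stand-in for the orders table
pool = MeteredPool(max_workers=4)


def post_process(order_id):
    orders[order_id]["status"] = "PROCESSED"  # e.g. dispatching, notifications


def create_order(order_id, payload):
    orders[order_id] = {"payload": payload, "status": "ACCEPTED"}  # 1. persist first
    pool.submit(post_process, order_id)                            # 2. offload the rest
    return {"order_id": order_id, "accepted": True}                # 3. reply immediately
```

Exporting `in_flight()` to a metrics system and alerting when it keeps growing is one way to notice overload early. Orders left in `ACCEPTED` are what a periodic compensation job would pick up.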

5.4 Warm‑Up

When a system sits idle for long periods, a sudden traffic surge can overwhelm it. Warm‑up techniques (JVM warm‑up, cache pre‑loading, DB pre‑loading) gradually increase load to reach peak capacity safely. For example, pre‑loading hot product data into Redis before a sales event can improve response times by over 50%.
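Cache pre‑loading before an event can be sketched as a batched loader. The function name, the batch size, and the optional pause between batches are assumptions; a dictionary stands in for Redis and `load_from_db` for the real query.

```python
import time


def warm_up_cache(hot_keys, load_from_db, cache, batch_size=100, pause_seconds=0.0):
    """Pre-load hot keys in batches so warm-up itself puts bounded load on the DB."""
    loaded = 0
    for start in range(0, len(hot_keys), batch_size):
        for key in hot_keys[start:start + batch_size]:
            if key in cache:
                continue  # already warm
            value = load_from_db(key)
            if value is not None:
                cache[key] = value
                loaded += 1
        if pause_seconds > 0:
            time.sleep(pause_seconds)  # throttle between batches
    return loaded
```

Run ahead of the traffic peak, this moves the cache misses for known hot keys out of the peak window. Which keys count as hot has to come from somewhere else, such as the previous day's access statistics.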

6. Summary

High‑concurrency system design requires a balanced focus on infrastructure, database scaling, architectural patterns, and application‑level safeguards. Simpler, well‑monitored solutions win over overly complex ones. Follow the KISS principle, avoid premature optimisation, and maintain strict coding and change‑management standards to keep the system maintainable over time.
