Mastering High‑Performance, High‑Concurrency, High‑Availability Backend Systems

This article shares a backend engineer's practical methodology for building systems that simultaneously achieve high performance, high concurrency, and high availability, covering performance optimization, scaling strategies, fault‑tolerance techniques, and real‑world case studies from B‑ and C‑side logistics platforms.

dbaplus Community

Overview

The article presents a practical methodology for building backend systems that satisfy the "three‑high" requirements: high performance, high concurrency, and high availability. It distinguishes between technical complexity (typical of consumer‑facing, or C‑end, services) and business complexity (typical of business‑facing B‑end or internal management M‑end services) and shows how to address both through systematic design, scaling, and fault‑tolerance techniques.

High‑Performance

Methodology

Performance is limited by three factors: computation, communication, and storage. Typical bottlenecks include heavy GC pauses, slow downstream services, large tables, and inefficient SQL. Optimisation should be approached from both read and write perspectives.

Read Optimisation – Cache + Database

Read‑heavy scenario: Write synchronously to the database, then invalidate or delete the cache entry. The database handles writes; the cache serves reads, providing low‑latency access.

Write‑heavy scenario: Update the cache synchronously, then propagate the change to the database asynchronously. The cache absorbs write traffic, while the database is eventually consistent. This pattern is used for JD Logistics order‑relation documents.

Typical workflow:

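// Pseudo-code: read path for the read-heavy flow (cache-aside).
// `cache`, `db`, and `asyncQueue` stand for whatever clients the system uses.
function getItem(id) {
    let value = cache.get(id);
    if (value == null) {
        // Cache miss: load from the database and populate the cache.
        value = db.query("SELECT * FROM items WHERE id = ?", id);
        if (value != null) {
            cache.set(id, value); // do not cache missing rows
        }
    }
    return value;
}

// Pseudo-code: write path for the read-heavy flow.
// Write to the database first, then invalidate the cache entry.
function saveItem(id, data) {
    db.update(id, data);
    cache.delete(id);
}

// Pseudo-code: write-heavy flow (cache first, asynchronous database update).
function updateItem(id, data) {
    cache.set(id, data); // fast response
    asyncQueue.push(() => db.update(id, data)); // database catches up later
}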

Write Optimisation – Asynchronous Processing for Flash‑Sale (秒杀) Scenarios

During traffic spikes, the order‑taking API must return instantly. The request is accepted, the order ID is returned, and the actual order creation is delegated to a message queue. Stock is cached; after a successful stock deduction, an SMS is sent to the user.
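The flow described above can be sketched as follows. This is a minimal illustration, not the production design: the in-memory `stockCache`, `orderQueue`, `orderTable`, and `sentSms` are hypothetical stand-ins for a cache, a message queue, a database table, and an SMS gateway, and the choice to deduct stock in the queue consumer is an assumption.

```javascript
// Hypothetical in-memory stand-ins for cache, message queue, database, and SMS gateway.
const stockCache = new Map([["sku-1", 2]]);
const orderQueue = [];
const orderTable = [];
const sentSms = [];
const orderStatus = new Map();
let nextOrderId = 1;

// Fast path: accept the request, enqueue it, and return the order ID at once.
function placeOrder(userId, sku) {
  const orderId = `order-${nextOrderId++}`;
  orderStatus.set(orderId, "PENDING");
  orderQueue.push({ orderId, userId, sku });
  return orderId;
}

// Slow path: a consumer drains the queue, deducts cached stock,
// creates the order, and notifies the user only when the deduction succeeds.
function drainOrderQueue() {
  while (orderQueue.length > 0) {
    const message = orderQueue.shift();
    const remaining = stockCache.get(message.sku) ?? 0;
    if (remaining <= 0) {
      orderStatus.set(message.orderId, "REJECTED_SOLD_OUT");
      continue;
    }
    stockCache.set(message.sku, remaining - 1);
    orderTable.push(message);
    orderStatus.set(message.orderId, "CREATED");
    sentSms.push(`Order ${message.orderId} confirmed for ${message.userId}`);
  }
}
```

In a real deployment the stock deduction would need to be atomic across consumers (for example, a single atomic decrement in the cache); the single-threaded loop here sidesteps that concern.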

Write optimisation diagram

High‑Concurrency

Methodology

Horizontal scaling – add machines and database shards. Common during large promotions (e.g., 618, Double‑11).

Vertical scaling – split databases (sharding, read/write separation) to increase connection limits.

Depth scaling (regional isolation) – deploy independent service units per geographic region (e.g., a Beijing unit serves Beijing users) to reduce latency and avoid single‑point bottlenecks.
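As a rough sketch of regional isolation, a gateway might resolve each user to the unit that serves their region. The region names, the URLs, and the fallback to a default unit are all illustrative assumptions, not details from the original system.

```javascript
// Hypothetical unit registry: region name -> base URL of that region's deployment.
const regionalUnits = new Map([
  ["beijing", "https://beijing.unit.example"],
  ["guangzhou", "https://guangzhou.unit.example"],
]);

// Assumed fallback for users whose region has no dedicated unit.
const DEFAULT_REGION = "beijing";

// Pick the unit that should serve a user, based on the user's home region.
function resolveUnit(userRegion) {
  const region = regionalUnits.has(userRegion) ? userRegion : DEFAULT_REGION;
  return { region, baseUrl: regionalUnits.get(region) };
}
```

Because each unit owns its own services and data for its users, a failure or overload in one region stays contained within that unit.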

Depth scaling diagram

Practical Application – DDD in Retail Logistics

Domain‑Driven Design (DDD) is applied after a deep understanding of business processes. Core domains include:

Product Service

Order

Payment & Settlement

Fulfilment

Both forward (merchant places order → logistics assigns courier → delivery) and reverse (user requests after‑sale return) flows are modelled. The resulting micro‑service decomposition follows the DDD bounded contexts, enabling independent scaling and clearer ownership.

Business process diagram

High‑Availability

Application Layer

Rate limiting – protects services from traffic spikes. Common algorithms:

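Counter (fixed window) – simple but not smooth: a burst that straddles a window boundary can briefly pass up to twice the limit.

Sliding window – counts requests over a rolling time interval, which removes the boundary effect; requests over the limit are rejected.

Leaky bucket – queues requests and releases them at a constant rate, smoothing bursts.

Token bucket – tokens are added at a fixed rate up to the bucket capacity and each request consumes one, so short bursts up to the capacity are allowed while the average rate stays bounded.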
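To make the last of these concrete, here is a minimal token bucket sketch. The class name, its parameters, and the injectable `now` clock (used so the behaviour can be checked deterministically) are assumptions for illustration.

```javascript
// Minimal token bucket: refills continuously at `refillPerSecond`, holds at most `capacity`.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock returning milliseconds
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  // Returns true and consumes `cost` tokens if enough are available, else false.
  tryAcquire(cost = 1) {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}
```

A caller would typically reject or queue the request when `tryAcquire()` returns false. The capacity controls how large a burst is tolerated; the refill rate controls the sustained throughput.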

Circuit breaking & degradation – prevents downstream failures from cascading. Circuit breakers stop calls to unhealthy services; degradation provides graceful fall‑backs (manual switch via configuration centre or automatic via Hystrix).
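The interplay of circuit breaking and degradation can be sketched as a small state machine. This is an illustrative simplification, not Hystrix's implementation: the thresholds, the state names, and the `callWithFallback` helper are assumptions, and the half-open state here admits every request rather than a single probe.

```javascript
// Simplified circuit breaker: CLOSED -> OPEN after repeated failures,
// OPEN -> HALF_OPEN after a cool-down, HALF_OPEN -> CLOSED on success.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 5000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock returning milliseconds
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
  }

  // Decide whether a call may go through to the downstream service.
  allowRequest() {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "HALF_OPEN"; // cool-down elapsed: let trial traffic through
    }
    return this.state !== "OPEN";
  }

  recordSuccess() {
    this.failures = 0;
    this.state = "CLOSED";
  }

  recordFailure() {
    this.failures += 1;
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }
}

// Degradation: use the fallback whenever the breaker is open or the call fails.
function callWithFallback(breaker, primary, fallback) {
  if (!breaker.allowRequest()) return fallback();
  try {
    const result = primary();
    breaker.recordSuccess();
    return result;
  } catch (err) {
    breaker.recordFailure();
    return fallback();
  }
}
```

The fallback might return cached data, a default value, or a "try again later" response, depending on how critical the degraded feature is.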

Timeout funnel – set progressively shorter timeouts from upstream to downstream services to avoid upstream thread exhaustion.
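One way to keep a funnel honest is to check the configured timeouts along a call chain. The sketch below assumes a hypothetical list of hops ordered from upstream to downstream, each with a `timeoutMs` budget; the service names and values are made up.

```javascript
// Given hops ordered upstream -> downstream, report every place where a
// downstream timeout is not strictly shorter than its caller's timeout.
function findFunnelViolations(chain) {
  const violations = [];
  for (let i = 0; i < chain.length - 1; i++) {
    const caller = chain[i];
    const callee = chain[i + 1];
    if (callee.timeoutMs >= caller.timeoutMs) {
      violations.push(
        `${caller.name} (${caller.timeoutMs} ms) must wait longer than ${callee.name} (${callee.timeoutMs} ms)`
      );
    }
  }
  return violations;
}

// Hypothetical chain that satisfies the funnel: 3000 > 2000 > 1000 > 500.
const exampleChain = [
  { name: "gateway", timeoutMs: 3000 },
  { name: "order-service", timeoutMs: 2000 },
  { name: "inventory-service", timeoutMs: 1000 },
  { name: "database", timeoutMs: 500 },
];
```

If a downstream timeout were longer than its caller's, the caller would give up first while the downstream work kept running, tying up threads on results nobody is waiting for.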

Timeout funnel diagram

Retry strategy – limit the retry count and make sure the retried operation is idempotent. Excessive retries multiply load at every hop of a long call chain and can trigger a retry storm.
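A bounded retry with exponential backoff might look like the sketch below. The function names, the default limits, and the `worstCaseCalls` helper are illustrative assumptions; the operation passed in is assumed to be idempotent.

```javascript
// Delay before the next attempt: doubles each time, capped at `capMs`.
function backoffDelayMs(attempt, baseMs = 100, capMs = 2000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Run an idempotent async operation at most `maxAttempts` times.
async function withRetry(operation, { maxAttempts = 3, baseMs = 100, capMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs, capMs);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts failed
}

// Worst-case calls reaching the deepest service when every hop retries independently.
function worstCaseCalls(attemptsPerHop, hops) {
  return attemptsPerHop ** hops;
}
```

With three attempts at each of three hops, one user request can turn into 27 calls on the deepest service, which is why retries are usually capped and often confined to a single layer of the chain.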

Retry storm illustration

Isolation – multiple dimensions:

System‑level (different services for different business levels).

Environment (dev, test, pre‑prod, prod).

Data (tenant‑based, schema‑based, or separate databases).

Core vs. non‑core flow (critical services receive more resources).

Read/write (CQRS, master‑slave DB).

Thread‑pool isolation.
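Thread‑pool isolation gives each dependency its own bounded pool so that one slow dependency cannot exhaust resources shared with the others. JavaScript has no thread pools of this kind, so the sketch below models the same idea as a cap on concurrent in-flight calls per dependency; the `Bulkhead` class and its limits are hypothetical.

```javascript
// Caps the number of concurrent in-flight calls to one dependency.
class Bulkhead {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.inFlight = 0;
  }

  // Reserve a slot; returns false when the compartment is full.
  tryAcquire() {
    if (this.inFlight >= this.maxConcurrent) return false;
    this.inFlight += 1;
    return true;
  }

  // Give the slot back once the call has finished.
  release() {
    if (this.inFlight > 0) this.inFlight -= 1;
  }

  // Run an async task inside the compartment, or degrade to the fallback if full.
  async run(task, fallback) {
    if (!this.tryAcquire()) return fallback();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// One compartment per dependency, sized by importance (values are assumptions).
const bulkheads = {
  payment: new Bulkhead(20),
  recommendation: new Bulkhead(5),
};
```

When the non-core dependency saturates its five slots, further calls to it degrade immediately while the core dependency's capacity is untouched.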

Storage Layer

Replication types

Master‑slave – writes go to master; reads can be served by slaves.

Multi‑master – any master can accept writes; changes are propagated.

Leaderless – writes to any node; reads from multiple nodes with conflict resolution.

Sharding (partitioning) – each record belongs to exactly one partition. Methods include range‑based and hash‑based.
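The two partitioning methods can be contrasted in a few lines. This is a sketch under stated assumptions: FNV‑1a is used only as a convenient, well-known string hash (real systems choose their own), and the boundary values are made up.

```javascript
// 32-bit FNV-1a hash of a string's UTF-8 bytes (illustrative choice of hash).
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hash-based: spreads keys evenly, but adjacent keys land on unrelated shards.
function hashShard(key, shardCount) {
  return fnv1a32(key) % shardCount;
}

// Range-based: `upperBounds` is sorted; shard i holds ids below upperBounds[i],
// and the final shard holds everything at or above the last bound.
function rangeShard(id, upperBounds) {
  for (let i = 0; i < upperBounds.length; i++) {
    if (id < upperBounds[i]) return i;
  }
  return upperBounds.length;
}
```

Range-based partitioning keeps neighbouring ids together, which suits range scans but can create hot shards for sequential writes; hash-based partitioning balances load at the cost of efficient range queries.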

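Redis Cluster – 16,384 hash slots; each key maps to a slot via CRC16(key) mod 16384, and slots are assigned to shards, each consisting of a master and its slaves. If a master fails, one of its slaves is promoted.

Elasticsearch index – the number of primary shards is fixed when the index is created; the replica count can be changed later. Writes are applied to the primary shard and then copied to its replicas; searches can be served by either, and replicas also provide redundancy.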

Kafka topic – each topic is partitioned; each partition has a leader (read/write) and followers (replicas) for fault tolerance.
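For the Redis Cluster case above, the key-to-slot mapping can be reproduced in a few lines. The sketch follows the publicly documented scheme (the XMODEM variant of CRC16, modulo 16384, with `{...}` hash tags), but the function names are my own and this is not Redis's source code.

```javascript
// CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no reflection.
function crc16(bytes) {
  let crc = 0;
  for (const byte of bytes) {
    crc ^= byte << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Map a key to one of the 16,384 slots. If the key contains a non-empty
// {tag}, only the tag is hashed, so related keys can share a slot.
function keySlot(key) {
  let hashed = key;
  const open = key.indexOf("{");
  if (open !== -1) {
    const close = key.indexOf("}", open + 1);
    if (close > open + 1) {
      hashed = key.slice(open + 1, close);
    }
  }
  return crc16(new TextEncoder().encode(hashed)) % 16384;
}
```

Keys such as `{user1000}.following` and `{user1000}.followers` hash only the `user1000` tag and therefore land in the same slot, which is what makes multi-key operations on them possible in a cluster.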

Deployment Layer

Deployment evolves from single‑machine to multi‑region, using redundancy and load balancing for disaster recovery.

Deployment evolution diagram

Current production uses Docker containers across multiple data‑centers. Service groups are separated by business importance and traffic volume. Databases and Redis are deployed in dual‑data‑center setups; Elasticsearch runs in a single data‑center.

Current deployment topology

Conclusion

Building a “three‑high” backend system requires a balanced focus on performance, concurrency, and availability. Key techniques include:

Cache‑first read paths and asynchronous write‑back for write‑heavy workloads.

Message‑queue‑driven order processing for flash‑sale spikes.

Horizontal, vertical, and depth scaling combined with DDD‑driven micro‑service boundaries.

Rate‑limiting, circuit breaking, timeout funnels, bounded retries, and multi‑layer isolation for resilience.

Replication and sharding (MySQL, Redis, Elasticsearch, Kafka) to avoid single points of failure.

Multi‑region containerised deployment with redundant data‑centers.

Applying these patterns enables backend platforms to serve both B‑end and C‑end workloads at massive scale while maintaining low latency (TP99/TP999) and high availability.
