
Achieving High Concurrency, High Performance, and High Availability in Backend Systems

This article explains how to design backend architectures that meet the "three high" goals—high concurrency, high performance, and high availability—by using load balancing, connection pooling, traffic filtering, multi‑level caching, log optimization, and failover strategies.

Full-Stack Internet Architecture

High Concurrency

High concurrency is typically taken to mean sustaining more than 100,000 QPS (queries per second), and is achieved through cluster deployment and load balancing.

1. Load Balancing

Clustered servers increase total QPS; traffic is forwarded using load balancers such as LVS and Nginx. Common algorithms include round‑robin, random, source‑IP hash, weighted round‑robin, weighted random, and least‑connections.

In a flash-sale scenario with millions of requests, a single LVS instance cannot absorb the peak, so roughly ten LVS instances, fronted by DDNS and equipped with high-performance NICs, are used to sustain millions of concurrent connections.
Note: LVS operates at layer 4 and cannot route by HTTP path, so Nginx is still needed for layer-7 routing.

2. Pooling Techniques

Creating a new connection for each request incurs TCP handshake overhead; pooling pre‑allocates resources and reuses them. Common pools include thread pools, process pools, object pools, memory pools, connection pools, and coroutine pools.

Connection pool parameters: minimum connections, idle connections, maximum connections.

In Go, a goroutine starts with ~2KB stack (vs 8 MB thread stack) and switches entirely in user space, making it lightweight.

Preemptive coroutine pool: all workers pull tasks from one shared channel; simple, but the shared queue becomes a point of lock contention under load.

Scheduled coroutine pool: each coroutine has its own channel, and a dispatcher assigns tasks to channels using the load-balancing algorithms above, avoiding contention on a single queue.

3. Traffic Funnel (Filtering)

Malicious traffic (bots, scrapers, scalpers) must be filtered to reduce load. Strategies include gateway/WAF blocking, IP rate limiting, and behavior‑based risk analysis using big‑data techniques.

Methods: block attacker IPs, reject requests with illegal parameters, apply per‑IP or per‑user‑ID rate limits.

High Performance

Performance directly affects user experience; response times >5 s cause abandonment. Influencing factors include network conditions, payload size, CPU/memory/disk, request chain length, downstream system performance, and algorithm efficiency.

Key optimizations:

High‑Performance Caching : Use multi‑level caches (register, L1‑L3, local memory, distributed cache) to reduce DB load.

Log Optimization : Reduce I/O bottlenecks by using in‑memory tmpfs for logs, batch sequential writes, and avoid excessive disk writes.

Remember to handle cache penetration, cache avalanche, hot‑key, and consistency issues; often a combination of browser, local memory, and distributed caches is used.

High Availability

Availability metrics include MTBF (mean time between failures), MTTR (mean time to repair), and the SLA. Availability = MTBF / (MTBF + MTTR), and the SLA commits to a target; availability above 99.99% ("four nines") is considered high availability.

Strategies:

Multi‑cloud, active‑active, and geographic backup.

Primary‑secondary failover for Redis, MySQL, etc.

Stateless microservices with health checks.

Circuit breaking and rate limiting to protect against overload.

Web security measures against XSS and other common attacks.

Failover and Recovery

Automatic fault detection, failover, and failback reduce MTTR and improve SLA.

Circuit Breaking and Rate Limiting

Circuit breaking cuts off calls to a service when thresholds are exceeded (e.g., CPU >90%, error rate >5%, latency >500 ms), while rate limiting throttles the excess requests.

Degradation

During traffic spikes, non-core features (e.g., product reviews, order history) can be temporarily disabled to preserve core flows such as order creation and payment.

Degradation thus protects core services by deliberately shutting down less critical components.