Achieving High Concurrency, High Performance, and High Availability in Backend Systems
This article explains how to design backend architectures that meet the "three high" goals—high concurrency, high performance, and high availability—by using load balancing, connection pooling, traffic filtering, multi‑level caching, log optimization, and failover strategies.
High Concurrency
High concurrency typically means sustaining a QPS (queries per second) above 100,000, and is achieved through cluster deployment and load balancing.
1. Load Balancing
Clustered servers increase total QPS; traffic is forwarded using load balancers such as LVS and Nginx. Common algorithms include round‑robin, random, source‑IP hash, weighted round‑robin, weighted random, and least‑connections.
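As a concrete example of one of these algorithms, smooth weighted round‑robin (the variant Nginx's upstream module uses) can be sketched in a few lines of Go; the server addresses and weights below are illustrative:

```go
package main

import "fmt"

// server is a hypothetical upstream with a static weight.
type server struct {
	addr   string
	weight int
}

// wrr implements smooth weighted round-robin: each pick raises every
// server's current weight by its static weight, selects the largest,
// then subtracts the total weight from the winner. This spreads a
// heavy server's turns evenly instead of bursting them together.
type wrr struct {
	servers []*server
	current []int
}

func (b *wrr) pick() string {
	total, best := 0, -1
	for i, s := range b.servers {
		b.current[i] += s.weight
		total += s.weight
		if best == -1 || b.current[i] > b.current[best] {
			best = i
		}
	}
	b.current[best] -= total
	return b.servers[best].addr
}

func main() {
	b := &wrr{
		servers: []*server{{"10.0.0.1", 5}, {"10.0.0.2", 1}, {"10.0.0.3", 1}},
		current: make([]int, 3),
	}
	// Over 7 picks, 10.0.0.1 is chosen 5 times, the others once each.
	for i := 0; i < 7; i++ {
		fmt.Println(b.pick())
	}
}
```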
In a flash‑sale scenario with millions of requests, a single LVS cannot handle the peak, so around ten LVS instances with DDNS and high‑performance NICs are used to achieve millions of concurrent connections.
Note: LVS works at layer 4 and cannot balance by HTTP path, so Nginx is needed for layer‑7 routing.
2. Pooling Techniques
Creating a new connection for each request incurs TCP handshake overhead; pooling pre‑allocates resources and reuses them. Common pools include thread pools, process pools, object pools, memory pools, connection pools, and coroutine pools.
Connection pool parameters: minimum connections, idle connections, maximum connections.
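A minimal sketch of these parameters in Go, using a buffered channel as the idle set (the `conn` type and the sizes are illustrative; a real pool would also dial new connections up to the maximum and support blocking with timeouts):

```go
package main

import (
	"errors"
	"fmt"
)

// conn stands in for a real network connection.
type conn struct{ id int }

// pool is a minimal fixed-capacity connection pool: the buffered
// channel holds idle connections, and its capacity is the maximum
// number of connections the pool will ever hold.
type pool struct {
	idle chan *conn
}

// newPool pre-creates min connections (warm-up) and caps the total at max.
func newPool(min, max int) *pool {
	p := &pool{idle: make(chan *conn, max)}
	for i := 0; i < min; i++ {
		p.idle <- &conn{id: i}
	}
	return p
}

// get returns an idle connection, or an error if none is available.
// A production pool would dial a new connection here while the total
// stays below max, and block or time out otherwise.
func (p *pool) get() (*conn, error) {
	select {
	case c := <-p.idle:
		return c, nil
	default:
		return nil, errors.New("pool exhausted")
	}
}

// put returns a connection to the idle set for reuse.
func (p *pool) put(c *conn) { p.idle <- c }

func main() {
	p := newPool(2, 4)
	c, _ := p.get()
	fmt.Println("got connection", c.id)
	p.put(c) // reuse instead of re-dialing: no new TCP handshake
}
```

Go's standard `database/sql` exposes the same knobs via `SetMaxIdleConns` and `SetMaxOpenConns`.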
In Go, a goroutine starts with ~2KB stack (vs 8 MB thread stack) and switches entirely in user space, making it lightweight.
Preemptive coroutine pool: tasks share a common channel, leading to lock contention.
Scheduled coroutine pool: each coroutine has its own channel, and tasks are dispatched using load‑balancing algorithms.
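The two pool styles can be contrasted in a short Go sketch (worker counts and task payloads are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// preemptivePool: all workers compete for tasks on one shared channel.
// Simple, but every worker contends on the same queue.
func preemptivePool(workers int, tasks []int) {
	var wg sync.WaitGroup
	shared := make(chan int)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range shared {
				_ = t * t // stand-in for real work
			}
		}()
	}
	for _, t := range tasks {
		shared <- t
	}
	close(shared)
	wg.Wait()
}

// scheduledPool: each worker owns a private channel and the dispatcher
// assigns tasks round-robin, avoiding contention on a single queue.
func scheduledPool(workers int, tasks []int) {
	var wg sync.WaitGroup
	queues := make([]chan int, workers)
	for w := range queues {
		queues[w] = make(chan int, 4)
		wg.Add(1)
		go func(q chan int) {
			defer wg.Done()
			for t := range q {
				_ = t * t
			}
		}(queues[w])
	}
	for i, t := range tasks {
		queues[i%workers] <- t // round-robin dispatch
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}

func main() {
	tasks := []int{1, 2, 3, 4, 5, 6, 7, 8}
	preemptivePool(3, tasks)
	scheduledPool(3, tasks)
	fmt.Println("done")
}
```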
3. Traffic Funnel (Filtering)
Malicious traffic (bots, scrapers, scalpers) must be filtered to reduce load. Strategies include gateway/WAF blocking, IP rate limiting, and behavior‑based risk analysis using big‑data techniques.
Methods: block attacker IPs, reject requests with illegal parameters, apply per‑IP or per‑user‑ID rate limits.
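Per‑IP rate limiting is commonly implemented as a token bucket per client. A minimal in‑memory sketch in Go (the rate, burst, and IP address are illustrative; production systems usually keep these counters in Redis so limits hold across all instances):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket is a token bucket: tokens refill at `rate` per second, up to `burst`.
type bucket struct {
	tokens float64
	last   time.Time
}

// limiter keeps one bucket per client IP.
type limiter struct {
	mu      sync.Mutex
	buckets map[string]*bucket
	rate    float64
	burst   float64
}

func newLimiter(rate, burst float64) *limiter {
	return &limiter{buckets: make(map[string]*bucket), rate: rate, burst: burst}
}

// allow reports whether a request from ip may proceed now.
func (l *limiter) allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[ip] = b
	}
	// Refill proportionally to elapsed time, capped at burst.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	l := newLimiter(5, 3) // 5 req/s sustained, bursts of 3
	for i := 0; i < 5; i++ {
		fmt.Println(l.allow("203.0.113.7"))
	}
}
```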
High Performance
Performance directly affects user experience; response times >5 s cause abandonment. Influencing factors include network conditions, payload size, CPU/memory/disk, request chain length, downstream system performance, and algorithm efficiency.
Key optimizations:
High‑Performance Caching: use multi‑level caches (CPU registers, L1–L3 caches, local memory, distributed cache) to reduce database load.
Log Optimization: reduce I/O bottlenecks by writing logs to in‑memory tmpfs, batching sequential writes, and avoiding excessive disk writes.
Remember to handle cache penetration, cache avalanche, hot keys, and consistency issues; in practice a combination of browser, local in‑memory, and distributed caches is used.
High Availability
Availability is quantified with MTBF (mean time between failures) and MTTR (mean time to repair): availability = MTBF / (MTBF + MTTR), and the SLA commits to a target derived from it. Availability above 99.99% (roughly 52 minutes of downtime per year) is considered high availability.
Strategies:
Multi‑cloud, active‑active, and geographic backup.
Primary‑secondary failover for Redis, MySQL, etc.
Stateless microservices with health checks.
Circuit breaking and rate limiting to protect against overload.
Web security measures against XSS and other common web attacks.
Failover and Recovery
Automatic fault detection, failover, and failback reduce MTTR and improve SLA.
Circuit Breaking and Rate Limiting
Circuit breaking stops sending traffic to a failing dependency when thresholds (CPU > 90%, error rate > 5%, latency > 500 ms) are exceeded, while rate limiting throttles excess requests.
Degradation
During traffic spikes, non‑core features (e.g., product reviews, order history) can be temporarily disabled to preserve core functions like order creation and payment.
Degradation protects core services by shutting down less critical components.
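A degradation switch can be as simple as an atomic flag consulted by non‑core code paths; the feature names below are illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// degraded is a global switch flipped by operators (or an automatic
// monitor) when the system comes under heavy load.
var degraded atomic.Bool

// productReviews is a non-core feature: under degradation it returns
// an empty placeholder instead of querying its backend.
func productReviews(productID string) []string {
	if degraded.Load() {
		return nil // degraded: reviews temporarily unavailable
	}
	return []string{"great phone", "fast shipping"} // normal path
}

// createOrder is core functionality and is never switched off.
func createOrder(productID string) string {
	return "order-created:" + productID
}

func main() {
	degraded.Store(true) // traffic spike: shed non-core work
	fmt.Println(productReviews("sku:42"))
	fmt.Println(createOrder("sku:42"))
}
```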