Mastering High Concurrency & High Availability: Core Principles for Scalable Systems
This article outlines essential principles for designing high‑concurrency and high‑availability systems, covering stateless architecture, service decomposition, caching strategies, message queues, data heterogeneity, degradation, rate limiting, traffic switching, rollback, and comprehensive business design rules such as idempotency, anti‑duplication, and documentation.
1. High Concurrency Principles
1.1 Stateless
A stateless application is easier to scale horizontally, because any instance can serve any request. In practice, the application itself is kept stateless, while environment-specific state is kept in configuration files.
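Here is a minimal sketch of the idea. It assumes a shared external store; the `SessionStore` class below is hypothetical and stands in for something like Redis. Two instances of a stateless handler can serve the same user interchangeably, because neither keeps per-user state in its own memory.

```python
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Hypothetical shared store; in production this would be Redis or similar."""
    data: dict = field(default_factory=dict)

    def get(self, key: str) -> dict:
        return dict(self.data.get(key, {}))  # return a copy, like a network read

    def put(self, key: str, value: dict) -> None:
        self.data[key] = dict(value)


class CartService:
    """Stateless handler: every call reads and writes state through the store."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store  # shared dependency, not per-user state

    def add_item(self, session_id: str, sku: str, qty: int) -> dict:
        cart = self.store.get(session_id)
        cart[sku] = cart.get(sku, 0) + qty
        self.store.put(session_id, cart)
        return cart


store = SessionStore()
instance_a = CartService(store)
instance_b = CartService(store)
instance_a.add_item("s1", "sku-1", 1)
print(instance_b.add_item("s1", "sku-1", 2))  # {'sku-1': 3}
```

Because neither instance holds the cart, a load balancer can add or remove instances freely.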
1.2 Splitting
When traffic is large and resources are sufficient, consider splitting. Main splitting scenarios include:
System dimension: split into separate systems by business function.
Function dimension: split a single system further by its functions.
Read‑write dimension: split based on read/write characteristics; use cache for heavy reads, sharding for heavy writes, and heterogeneous data splitting for aggregation.
AOP dimension: split according to access characteristics using AOP.
Module dimension: split based on foundational or code‑maintenance characteristics.
1.3 Serviceization
Serviceization typically evolves in stages: in‑process service → single‑machine remote service → cluster with manual registration → automatic registration and discovery → service grouping/isolation/routing → service governance (rate limiting, blacklists/whitelists).
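A toy in-memory registry can illustrate the automatic registration, discovery, and grouping stages. This is a hypothetical sketch, not any particular framework's API. Instances register and send heartbeats, and entries expire after a TTL. A `group` parameter isolates traffic such as a canary group.

```python
import time
from collections import defaultdict


class Registry:
    """Hypothetical in-memory service registry with TTL-based health."""

    def __init__(self, ttl_seconds: float = 10.0, clock=time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # (service, group) -> {address: last heartbeat timestamp}
        self.instances = defaultdict(dict)

    def register(self, service: str, address: str, group: str = "default") -> None:
        # Registration and heartbeat are the same operation: refresh the timestamp.
        self.instances[(service, group)][address] = self.clock()

    heartbeat = register

    def discover(self, service: str, group: str = "default") -> list:
        # Only instances that sent a heartbeat within the TTL are returned.
        now = self.clock()
        live = self.instances.get((service, group), {})
        return sorted(addr for addr, ts in live.items() if now - ts <= self.ttl)


now = [0.0]


def fake_clock() -> float:
    return now[0]


reg = Registry(ttl_seconds=10, clock=fake_clock)
reg.register("order", "10.0.0.1:8080")
reg.register("order", "10.0.0.2:8080", group="canary")
now[0] = 5
reg.heartbeat("order", "10.0.0.3:8080")
now[0] = 12
print(reg.discover("order"))  # ['10.0.0.3:8080']
```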
1.4 Message Queues
Message queues decouple services that do not require synchronous calls, enable one‑to‑many consumption, asynchronous processing, and traffic shaping/buffering.
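The following is a minimal in-process sketch of one-to-many consumption and buffering. Python's `queue` module stands in for a real broker, and the `Topic` class is hypothetical. The producer publishes and moves on, while each subscriber drains its own bounded queue asynchronously.

```python
import queue
import threading


class Topic:
    """Hypothetical in-process stand-in for an MQ topic with fan-out."""

    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, maxsize: int = 100) -> queue.Queue:
        q = queue.Queue(maxsize=maxsize)  # bounded queue acts as a buffer
        self.subscribers.append(q)
        return q

    def publish(self, message) -> None:
        for q in self.subscribers:
            q.put(message)  # blocks only if a consumer falls too far behind


def consume(q: queue.Queue, handled: list) -> None:
    while True:
        msg = q.get()
        if msg is None:  # sentinel: stop consuming
            break
        handled.append(msg)


orders = Topic()
inventory_q, email_q = orders.subscribe(), orders.subscribe()
inventory_log, email_log = [], []
workers = [
    threading.Thread(target=consume, args=(inventory_q, inventory_log)),
    threading.Thread(target=consume, args=(email_q, email_log)),
]
for w in workers:
    w.start()
for order_id in ("o-1", "o-2"):
    orders.publish(order_id)  # producer does not wait for consumers to finish
orders.publish(None)
for w in workers:
    w.join()
print(inventory_log, email_log)  # ['o-1', 'o-2'] ['o-1', 'o-2']
```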
1.5 Data Heterogeneity
1.5.1 Data Heterogeneity
Order tables are often sharded by order ID, so querying all orders of one user requires aggregating results from multiple shards, which makes reads slow. To improve this, build a heterogeneous user‑order table sharded by user ID.
Additionally, archiving order data can enhance performance and stability.
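The effect of the heterogeneous user-order table can be sketched with in-memory shards. The shard count, the modulo sharding rule, and the helper names are illustrative assumptions. In a real system the user-keyed copy is typically built asynchronously from change events, not written in the same call.

```python
from collections import defaultdict

SHARDS = 4  # assumed shard count


def shard_of(key: int) -> int:
    return key % SHARDS  # assumed modulo sharding rule


# Primary store: order tables sharded by order ID.
orders_by_id = defaultdict(dict)
# Heterogeneous copy: user-order table sharded by user ID.
orders_by_user = defaultdict(lambda: defaultdict(list))


def save_order(order_id: int, user_id: int, amount: int) -> None:
    orders_by_id[shard_of(order_id)][order_id] = {"user_id": user_id, "amount": amount}
    # Simplification: in practice this copy is usually filled from MQ or binlog events.
    orders_by_user[shard_of(user_id)][user_id].append(order_id)


def orders_of_user_scatter(user_id: int) -> list:
    # Without the heterogeneous table: every order shard must be scanned.
    return sorted(oid for shard in orders_by_id.values()
                  for oid, row in shard.items() if row["user_id"] == user_id)


def orders_of_user(user_id: int) -> list:
    # With it: a single lookup on a single shard.
    return sorted(orders_by_user[shard_of(user_id)].get(user_id, []))


for oid in (101, 102, 203):
    save_order(oid, user_id=7, amount=10)
save_order(104, user_id=8, amount=5)
print(orders_of_user(7))  # [101, 102, 203]
```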
1.5.2 Data Closed Loop
For pages such as product details that draw on many data sources, keep a heterogeneous copy of the data the page uses so that the page forms a closed loop. Steps:
Data heterogeneity: receive data changes via MQ and store them as atomic (fine‑grained) records in suitable storage such as Redis or a persistent KV store.
Data aggregation: aggregate data from multiple sources, typically stored in KV for front‑end single‑call retrieval.
Front‑end presentation: front‑end obtains required data with one or few calls.
This approach ensures that even if dependent systems fail, the front‑end can still display data, though updates may be delayed.
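The three steps above can be sketched as follows. A plain dict stands in for Redis or another KV store, and the event and key formats are hypothetical.

```python
import json

kv = {}  # hypothetical KV store holding atomic and aggregated records


def on_change_event(event: dict) -> None:
    """Step 1: store each change received from MQ as a fine-grained record."""
    key = f"{event['type']}:{event['product_id']}"
    kv[key] = json.dumps(event["data"])
    aggregate(event["product_id"])


def aggregate(product_id: int) -> None:
    """Step 2: merge atomic records into one view the front end can fetch in one call."""
    view = {}
    for part in ("base", "spec", "price"):
        raw = kv.get(f"{part}:{product_id}")
        if raw is not None:
            view[part] = json.loads(raw)
    kv[f"detail:{product_id}"] = json.dumps(view)


def render_detail(product_id: int) -> dict:
    """Step 3: the page reads only the aggregated view, never the source systems."""
    return json.loads(kv.get(f"detail:{product_id}", "{}"))


on_change_event({"type": "base", "product_id": 1, "data": {"name": "Phone"}})
on_change_event({"type": "price", "product_id": 1, "data": {"amount": 1999}})
print(render_detail(1))  # {'base': {'name': 'Phone'}, 'price': {'amount': 1999}}
```

If the price system goes down after this point, `render_detail` still returns the last aggregated view; it is just not refreshed until new events arrive.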
When multiple data items are needed, use a HashTag mechanism to co‑locate related data in the same instance, e.g., using productId as a shard key for both basic info and specification data.
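The text does not name the store. Assuming Redis Cluster, the hash-tag rule works like this: a key maps to one of 16384 slots by CRC16 of the key. If the key contains a non-empty `{...}` section, only that section is hashed. Here is a sketch of the slot calculation (the key names are illustrative):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of 16384 slots, honoring a {hash tag} if present."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the text between the braces
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


base_slot = key_slot("product:{10001}:base")
spec_slot = key_slot("product:{10001}:spec")
print(base_slot == spec_slot)  # True: both keys land on the same instance
```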
1.6 Cache “Silver Bullet”
Browser cache
App client cache
CDN cache
Edge layer cache
Application layer cache
Distributed cache
For fallback or abnormal data, caching should be avoided to prevent stale data from being shown to users for extended periods.
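The following sketch covers the last two layers, with an application cache in front of a distributed cache. It rests on stated assumptions: the `TwoLevelCache` class and its loader are hypothetical, and the distributed cache is modeled as a dict without expiry. Following the rule above, fallback values returned on a load failure are never written to either layer.

```python
import time


class TwoLevelCache:
    """Hypothetical local (in-process) cache in front of a shared distributed cache."""

    def __init__(self, distributed: dict, load, local_ttl: float = 5.0, clock=time.monotonic):
        self.local = {}                 # key -> (value, expires_at)
        self.distributed = distributed  # stand-in for Redis/Memcached (no expiry here)
        self.load = load                # reads the source of truth; may raise
        self.local_ttl = local_ttl
        self.clock = clock

    def get(self, key, fallback=None):
        hit = self.local.get(key)
        if hit and hit[1] > self.clock():
            return hit[0]
        if key in self.distributed:
            value = self.distributed[key]
        else:
            try:
                value = self.load(key)
            except Exception:
                return fallback  # fallback data is returned but never cached
            self.distributed[key] = value
        self.local[key] = (value, self.clock() + self.local_ttl)
        return value


calls = []


def load_price(sku):
    calls.append(sku)
    if sku == "bad":
        raise RuntimeError("source down")
    return 100


cache = TwoLevelCache({}, load_price)
cache.get("sku-1")
cache.get("sku-1")  # served from the local cache; the loader is not called again
print(len(calls))  # 1
print(cache.get("bad", fallback=0))  # 0
```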
1.7 Concurrency
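Parallelize serial behavior: when calls do not depend on one another, issue them concurrently so that total latency approaches that of the slowest call rather than the sum of all calls.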
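For example, take three hypothetical independent remote calls simulated with sleeps in `asyncio`. Running them concurrently takes roughly as long as one call instead of three:

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    """Hypothetical remote call, simulated with a sleep."""
    await asyncio.sleep(delay)
    return name


async def serial() -> list:
    # Each call waits for the previous one: ~0.3 s in total.
    return [await fetch("price", 0.1), await fetch("stock", 0.1), await fetch("reviews", 0.1)]


async def parallel() -> list:
    # Independent calls run together: ~0.1 s, the time of the slowest call.
    return list(await asyncio.gather(fetch("price", 0.1), fetch("stock", 0.1), fetch("reviews", 0.1)))


for fn in (serial, parallel):
    start = time.perf_counter()
    result = asyncio.run(fn())
    print(fn.__name__, result, f"{time.perf_counter() - start:.2f}s")
```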
2. High Availability Principles
2.1 Degradation
When designing degradation switches, consider the following:
Centralized management of switches via push mechanisms.
Multi‑level read service degradation: local cache, distributed cache, default degraded data (e.g., assume inventory is in stock).
Place switches at the ingress layer (e.g., Nginx) to route traffic selectively.
Business degradation: during traffic spikes, prioritize order placement and payment while ensuring eventual data consistency, possibly converting synchronous calls to asynchronous.
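The switch and multi-level read ideas can be combined in a short sketch. `Switches`, `read_stock`, and the switch name `stock.degrade` are hypothetical. In a real setup, `on_push` would receive updates from a configuration center.

```python
class Switches:
    """Hypothetical switch holder; a config center would push updates via on_push."""

    def __init__(self) -> None:
        self.flags = {}

    def on_push(self, updates: dict) -> None:
        self.flags.update(updates)

    def is_on(self, name: str) -> bool:
        return self.flags.get(name, False)


DEFAULT_STOCK = {"in_stock": True}  # last-resort degraded answer: assume in stock


def read_stock(sku, switches, local_cache, distributed_cache, stock_service):
    """Degrade level by level: service -> local cache -> distributed cache -> default."""
    if not switches.is_on("stock.degrade"):
        try:
            return stock_service(sku)
        except Exception:
            pass  # service failed: fall through to degraded reads
    for layer in (local_cache, distributed_cache):
        if sku in layer:
            return layer[sku]
    return dict(DEFAULT_STOCK)


def broken_service(sku):
    raise TimeoutError("stock service timed out")


sw = Switches()
stale = read_stock("sku-1", sw, {}, {"sku-1": {"in_stock": False}}, broken_service)
print(stale)  # {'in_stock': False}
sw.on_push({"stock.degrade": True})  # operator flips the switch centrally
print(read_stock("sku-2", sw, {}, {}, lambda s: {"in_stock": False}))  # {'in_stock': True}
```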
2.2 Rate Limiting
Purpose: prevent malicious traffic, attacks, or traffic exceeding system peaks.
Direct malicious requests to cache only.
Use Nginx's rate‑limiting modules (limit_req, limit_conn) to throttle traffic that reaches the backend.
Block malicious IPs with the Nginx deny directive.
The principle is to stop excess traffic before it reaches the more vulnerable application layers.
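Nginx's `limit_req` module uses the leaky-bucket method. The sketch below uses a token bucket instead, a closely related algorithm that allows short bursts, together with a deny list. The IP addresses, rate, and capacity are illustrative assumptions, and the fake clock only makes the example deterministic.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


BLOCKED_IPS = {"203.0.113.9"}  # hypothetical deny list, like Nginx `deny`
buckets = {}                   # one bucket per client IP


def admit(ip: str, clock) -> bool:
    if ip in BLOCKED_IPS:
        return False
    if ip not in buckets:
        buckets[ip] = TokenBucket(rate=2, capacity=2, clock=clock)
    return buckets[ip].allow()


now = [0.0]


def fake_clock() -> float:
    return now[0]


burst = [admit("198.51.100.1", fake_clock) for _ in range(3)]
print(burst)  # [True, True, False]
now[0] = 0.5
print(admit("198.51.100.1", fake_clock))  # True: one token refilled
print(admit("203.0.113.9", fake_clock))   # False: denied outright
```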
2.3 Traffic Switching
For large applications, traffic switching is vital when a data center, rack, or server fails. Methods include:
DNS
HttpDNS
LVS/HaProxy
Nginx
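Whatever layer performs the switch, the core operation is shifting weight away from a failed target. A hypothetical weighted router, similar in spirit to weighted DNS or HttpDNS answers, can illustrate this:

```python
import random


class TrafficRouter:
    """Hypothetical weighted router across data centers."""

    def __init__(self, weights: dict) -> None:
        self.weights = dict(weights)  # data center -> relative weight

    def drain(self, dc: str) -> None:
        # Switch traffic away from a failed data center by zeroing its weight.
        self.weights[dc] = 0

    def pick(self, rng=random) -> str:
        live = {dc: w for dc, w in self.weights.items() if w > 0}
        if not live:
            raise RuntimeError("no healthy data center")
        return rng.choices(list(live), weights=list(live.values()))[0]


router = TrafficRouter({"dc-east": 70, "dc-west": 30})
router.drain("dc-east")
print({router.pick() for _ in range(100)})  # {'dc-west'}
```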
2.4 Rollback
Versioning enables auditability, traceability, and rollback. Errors can be recovered by rolling back code, deployment, data, or static resources, ensuring high availability in certain scenarios.
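Here is a minimal sketch of versioned releases with rollback. The `ReleaseHistory` class and artifact names are hypothetical, and the same pattern applies to configuration, data snapshots, or static resources.

```python
class ReleaseHistory:
    """Hypothetical versioned release log supporting rollback."""

    def __init__(self) -> None:
        self.versions = []  # (version, artifact) pairs, oldest first; kept for audit
        self.pos = -1       # index of the version currently live

    def deploy(self, version: str, artifact: str) -> None:
        self.versions.append((version, artifact))
        self.pos = len(self.versions) - 1

    def rollback(self) -> tuple:
        if self.pos <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.pos -= 1  # re-point to the previous version; history is not deleted
        return self.versions[self.pos]

    @property
    def current(self):
        return self.versions[self.pos] if self.pos >= 0 else None


history = ReleaseHistory()
history.deploy("v1", "app-1.0.tar.gz")
history.deploy("v2", "app-1.1.tar.gz")
print(history.rollback())  # ('v1', 'app-1.0.tar.gz')
print(history.current, len(history.versions))  # ('v1', 'app-1.0.tar.gz') 2
```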
3. Business Design Principles
3.1 Idempotent Design
An idempotent operation yields the same effect regardless of how many times it is executed with the same parameters.
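A common way to make a non-idempotent operation such as a balance deduction idempotent is a client-supplied request ID. The first result for each ID is recorded and replayed on repeats. The sketch below uses a hypothetical `PaymentService`. In production, the check and the write must happen atomically (for example, in one database transaction), which this single-threaded example does not address.

```python
import uuid


class PaymentService:
    """Hypothetical service made idempotent with a client-supplied request ID."""

    def __init__(self) -> None:
        self.balance = 100
        self.results = {}  # request_id -> result of the first execution

    def deduct(self, request_id: str, amount: int) -> int:
        if request_id in self.results:
            return self.results[request_id]  # replay: no second deduction, same response
        self.balance -= amount
        self.results[request_id] = self.balance
        return self.balance


svc = PaymentService()
rid = str(uuid.uuid4())
print(svc.deduct(rid, 30), svc.deduct(rid, 30))  # 70 70
```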
3.2 Anti‑Duplication Design
Prevent duplicate payments, duplicate deductions, etc.
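One way to prevent duplicate payments is to let the database enforce uniqueness. The sketch below uses an in-memory SQLite table with a hypothetical schema in which `order_id` is the primary key, so a second payment for the same order is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment ("
    " order_id TEXT PRIMARY KEY,"  # one payment per order, enforced by the database
    " amount INTEGER NOT NULL)"
)


def pay(order_id: str, amount: int) -> bool:
    """Return True if this call recorded the payment, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payment (order_id, amount) VALUES (?, ?)", (order_id, amount)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(pay("o-1", 50), pay("o-1", 50))  # True False
```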
3.3 Process Definition
Reuse workflow systems to provide customizable process services.
3.4 State and State Machine
Transaction order systems have forward states (awaiting payment, awaiting shipment, shipped, completed) and reverse states (cancellation, refund). State design should include traceability for user tracking and logging, enabling issue backtracking.
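Here is a minimal order state machine with a transition table and a history log for traceability. The state names mirror the text. Which reverse transitions are allowed (for example, no cancellation after shipment) is an assumption for illustration.

```python
from datetime import datetime, timezone

# Assumed transition table: forward states plus cancellation/refund as reverse states.
TRANSITIONS = {
    "AWAITING_PAYMENT": {"AWAITING_SHIPMENT", "CANCELLED"},
    "AWAITING_SHIPMENT": {"SHIPPED", "REFUNDED"},
    "SHIPPED": {"COMPLETED", "REFUNDED"},
    "COMPLETED": set(),
    "CANCELLED": set(),
    "REFUNDED": set(),
}


class Order:
    def __init__(self, order_id: str) -> None:
        self.order_id = order_id
        self.state = "AWAITING_PAYMENT"
        self.history = []  # (timestamp, from_state, to_state, operator) for backtracking

    def transition(self, new_state: str, operator: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append((stamp, self.state, new_state, operator))
        self.state = new_state


order = Order("o-1")
order.transition("AWAITING_SHIPMENT", "payment-callback")
order.transition("SHIPPED", "warehouse")
print(order.state, len(order.history))  # SHIPPED 2
```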
3.5 Backend Operation Feedback
Design backend systems with preview and feedback capabilities.
3.6 Backend Approval Flow
Important backend functions (e.g., price adjustments) should have approval workflows and log operations for traceability and audit.
3.7 Documentation and Comments
Early‑stage systems should maintain documentation libraries (architecture, design ideas, data dictionary, business processes, known issues) and code should include comments for special requirements.
3.8 Backup
Backup both code and personnel. Code should be stored in repositories with versioning; at least two developers should understand each system.
4. Summary
System design must not only implement business functionality but also ensure high concurrency, high availability, and high reliability. It should consider capacity planning, SLA definition, monitoring and alerting, and emergency plans such as disaster recovery, degradation, rate limiting, isolation, traffic switching, and rollback.
Key high‑concurrency tactics include caching, asynchronous processing, connection pools, thread pools, scaling, message queues, and distributed tasks. High‑availability tactics include load balancing, reverse proxy traffic splitting, rate limiting, degradation, isolation, timeout/retry settings, and rollback mechanisms.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.