Why Did Xi’an’s Health‑Code App Crash? A Deep Dive into the Failure
This article analyzes the outage of Xi’an’s “Yima Tong” health‑code system. It describes the symptoms, examines likely root causes (missing rate limiting, server overload, tightly coupled architecture, and differences between ISPs), and then offers short‑term, long‑term, system‑design, high‑availability, and testing recommendations to prevent future crashes.
Problem Description
1. The health‑code page is blank after the QR code is scanned.
2. Some users see a 502 Bad Gateway error.
3. Nucleic‑acid test reports are not displayed.
4. During recovery, users on the China Telecom network can open the health code while users on China Mobile cannot.
Root Cause Analysis
Main Issues
Missing rate limiting: Users refresh repeatedly when the page fails to load, which multiplies the load; that this could overwhelm the system suggests no rate limiting was in place.
Server overload: Peak traffic exceeds server capacity and causes the crash.
Architecture problem: Modules appear tightly coupled, and the system is possibly not micro‑service based.
Performance bottlenecks: Database or network bottlenecks limit throughput under load.
Scenario problems: Large data queries monopolize resources, and peak‑hour traffic spikes overwhelm the database.
Design flaws: No high‑concurrency or stress testing appears to have been done before release.
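The missing rate limiting noted above can be illustrated with a token bucket, one common way to cap per‑user request rates. The following is only a minimal sketch: the class names, the per‑user keying, and the numbers are assumptions made for illustration, not details of the Yima Tong system.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerUserLimiter:
    """Hypothetical wrapper: one bucket per user ID (not thread-safe as written)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, user_id):
        bucket = self.buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[user_id] = bucket
        return bucket.allow()
```

With a limiter like this in front of the backend, a user who refreshes faster than the refill rate is rejected at the edge instead of adding to backend load; in practice the same idea is usually applied at the gateway rather than in application code.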
Other Issues
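The backend servers behind nginx likely crashed under high concurrency, possibly because of a cache breakdown (a hot cache entry expires and the requests for it all fall through to the database).
The load balancer was overloaded; without dynamic DNS to spread traffic, a single machine’s network card was saturated.
Different ISPs resolve DNS along different paths, which caused inconsistent access (Telecom vs. Mobile).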
Disaster recovery and fault isolation were insufficient; service was not restored for more than 12 hours.
The hardware load balancer (F5) may have failed, and there was no rate limiting at the gateway level.
Solution Recommendations
Product Suggestions
Isolate business modules with high coupling into independent services.
System Suggestions
Short‑term
Optimize the page and show friendly waiting messages instead of a blank screen.
Debounce requests and cache results (e.g., cache nucleic‑acid results for 24 hours).
Separate the data needed for critical rendering from non‑critical data, and serve it through aggregated APIs.
Merge requests to reduce concurrent calls.
Compress transferred data to lower latency.
Decouple asynchronous requests so that slow calls do not block the page.
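The 24‑hour caching suggestion can be sketched as a small time‑to‑live cache in front of a slow lookup. This is only an illustration under stated assumptions: `TTLCache`, the `loader` callback, and the user IDs are hypothetical names, and the single lock is deliberately coarse to keep the sketch short.

```python
import threading
import time


class TTLCache:
    """Caches loader(key) results for ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader        # hypothetical slow backend lookup
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.entries = {}           # key -> (expires_at, value)
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            entry = self.entries.get(key)
            if entry is not None and entry[0] > self.clock():
                return entry[1]     # fresh hit: the backend is not touched
            # Miss or expired: load while holding the lock, so concurrent
            # callers wait for one load instead of stampeding the backend.
            value = self.loader(key)
            self.entries[key] = (self.clock() + self.ttl, value)
            return value
```

Holding one lock during the load serializes requests for all keys, which is acceptable for a sketch but not for production; a real deployment would more likely lock per key or use a shared cache such as Redis so that every application instance benefits.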
Long‑term
Business abstraction and module isolation for high cohesion, low coupling.
Simplify data models so that responses return quickly.
Interface segregation with single‑responsibility services.
Adopt micro‑frontend architecture for independent deployment.
Component library reuse to shrink project size.
Build a middle platform (a shared service layer) so that clients do not call backend systems directly.
Apply a diff algorithm in the presentation layer to avoid unnecessary re‑renders.
Implement crash alerts for rapid response.
System Design Suggestions
Architecture: Move to micro‑services, a service mesh, and cloud‑native elasticity (K8s), preferably on a private cloud.
Middleware: Use TiDB for distributed SQL, Redis Cluster for high‑availability caching.
Tiered Management: Prioritize critical services on better hardware, isolate them from less critical ones.
CDN Caching: Cache static resources to reduce backend load.
Security: Harden the data center to banking‑grade standards and close unused ports.
Alerting: Monitor availability, disk, CPU, memory with timely alerts.
Network Availability: Monitor multi‑region connectivity.
High‑Availability Design
Service and data redundancy (read/write separation, possibly with ClickHouse for read‑heavy queries).
Robust load balancing (LVS + nginx + dynamic DNS, or hardware LB like F5).
Hot data caching with consistency strategies.
Graceful degradation and throttling for non‑critical paths.
Asynchronous processing and fast‑fail mechanisms.
DNS‑level load balancing, with keepalived providing virtual‑IP failover between load‑balancer nodes.
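Graceful degradation and fast‑fail are often implemented with a circuit breaker. The sketch below shows the idea under stated assumptions: the class name, the threshold, the cool‑down period, and the fallback are hypothetical, and the breaker is not thread‑safe as written.

```python
import time


class CircuitBreaker:
    """Fails fast to a fallback after repeated backend errors."""

    def __init__(self, failure_threshold, reset_after, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after   # cool-down in seconds
        self.clock = clock               # injectable so timing can be tested
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: skip the backend entirely and degrade immediately.
                return fallback()
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

For a non‑critical path such as showing a test report, the fallback could return a cached or placeholder response, so that a struggling backend is not hit by every request while it recovers.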
Testing Suggestions
Add automated high‑concurrency stress tests before release.
Conduct regular disaster‑recovery drills.
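A pre‑release stress test can start as a small harness that fires concurrent calls and reports the error rate and tail latency. The following is a minimal sketch, assuming a hypothetical `target` callable that stands in for one request to the service; dedicated load‑testing tools would normally be used for real runs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(target, total_requests, concurrency):
    """Calls target(i) total_requests times (must be > 0) using a thread pool."""

    def one_call(i):
        start = time.perf_counter()
        try:
            target(i)
            ok = True
        except Exception:
            ok = False               # any exception counts as a failed request
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(total_requests)))

    latencies = sorted(duration for _, duration in results)
    errors = sum(1 for ok, _ in results if not ok)
    # Nearest-rank style index for the 95th percentile, clamped to the list.
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests": total_requests,
        "errors": errors,
        "error_rate": errors / total_requests,
        "p95_seconds": latencies[p95_index],
    }
```

A release gate could then compare `error_rate` and `p95_seconds` against agreed limits at the expected peak concurrency; the limits themselves would have to come from the service's own requirements.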
These analyses and recommendations aim to improve the stability and scalability of the “Xi’an Yima Tong” health‑code service.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
