How Health160 Scaled to Millions: Real-World Backend Performance Optimization Strategies
This article shares Health160's systematic approach to building a high‑performance, high‑availability medical service platform. It covers monitoring, metric design, flow control, idempotency, handling collections of unknown size, optimization case studies, architectural choices, NIO networking, middleware tuning, and caching techniques.
Health160 is a leading Chinese internet hospital platform with over a hundred million users and five hundred million service records. The team focuses on building a high‑performance, high‑availability, high‑concurrency medical service system that integrates with many hospital systems to provide real‑time appointment services.
Monitoring and Metrics
A robust monitoring system is essential for visualizing the full request chain and detecting performance issues early. Health160 uses SkyWalking APM with unique TraceIds to trace calls. Monitoring data helps identify hidden problems such as slow interfaces and rising P99 latency.
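As an illustration of how a rising P99 can be spotted from collected latency samples, here is a minimal Python sketch. It is not SkyWalking's API; the function names, the endpoint paths, and the 300 ms budget are assumptions made up for this example, and it uses the simple nearest-rank definition of a percentile.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of latency samples; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based nearest rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slow_endpoints(latencies_by_endpoint, p99_budget_ms):
    """Return {endpoint: p99} for every endpoint whose P99 exceeds the budget."""
    report = {}
    for endpoint, samples in latencies_by_endpoint.items():
        p99 = percentile(samples, 99)
        if p99 > p99_budget_ms:
            report[endpoint] = p99
    return report


if __name__ == "__main__":
    # Hypothetical samples in milliseconds, keyed by hypothetical endpoint.
    data = {
        "/appointments": [10] * 98 + [900, 950],
        "/doctors": [10] * 100,
    }
    print(slow_endpoints(data, p99_budget_ms=300))
```

Note that a single slow request out of a hundred does not move the nearest-rank P99; it takes a sustained tail to do so, which is why the metric is useful for catching slow interfaces that averages hide.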
Technical metrics are defined as admission standards for code deployment. They guide problem discovery, solution design, and pressure testing. All code must meet these standards before production.
Technical Thinking
1. Boundary Between Mid‑Platform and Business Interfaces
Interfaces called by other services become mid‑platform interfaces, which have stricter performance requirements than pure business interfaces. Misclassifying an interface can lead to performance bottlenecks.
2. Flow Control
Both inbound (gateway, Sentinel) and outbound flow control are needed. Outbound services such as MySQL, Redis, and external APIs must also be throttled or degraded when overloaded.
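One common way to throttle an outbound dependency and degrade when it is saturated is a token bucket. The sketch below is a generic illustration, not Sentinel's API and not necessarily what Health160 runs; `TokenBucket`, `call_with_limit`, and the fallback idea are assumptions for this example, and the class is not thread-safe as written.

```python
import time


class TokenBucket:
    """Allow up to `capacity` calls in a burst, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def call_with_limit(bucket, call, fallback):
    """Run the outbound call if a token is available, otherwise degrade."""
    if bucket.try_acquire():
        return call()
    return fallback()
```

A caller would wrap each MySQL, Redis, or external API request in `call_with_limit`, passing a fallback that returns cached or default data instead of piling more load onto a struggling dependency.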
3. Idempotency
Idempotent operations ensure that repeated calls with the same parameters produce the same result, preventing duplicate processing. Solutions include unique primary keys, optimistic locks, anti‑repeat tokens, and downstream sequence numbers.
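The unique-primary-key approach can be sketched as follows, using Python's built-in `sqlite3` as a stand-in for the real database. The `appointment` table, its columns, and the `request_id` token are hypothetical names chosen for this example.

```python
import sqlite3


def open_store():
    """Create an in-memory table keyed by the caller-supplied request ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE appointment ("
        "request_id TEXT PRIMARY KEY, patient TEXT, slot TEXT)"
    )
    return conn


def book(conn, request_id, patient, slot):
    """Insert once per request_id; return True if created, False if a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO appointment VALUES (?, ?, ?)",
                (request_id, patient, slot),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists, so this is a retry of the same request.
        return False
```

Because the database enforces uniqueness, a client retry after a timeout cannot create a second appointment, even if two retries arrive concurrently.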
4. Collections of Unknown Size
When fetching an unknown amount of data, batch retrieval and pagination with ID ranges should be used to avoid hidden performance issues.
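A minimal sketch of ID-range (keyset) pagination is shown here with Python's built-in `sqlite3`. The `service_record` table and its columns are hypothetical, and the code assumes positive, increasing integer IDs.

```python
import sqlite3


def iter_in_batches(conn, batch_size):
    """Yield rows in ID order, at most batch_size at a time."""
    last_id = 0  # assumes IDs are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM service_record "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the largest ID seen, instead of using a growing OFFSET.
        last_id = rows[-1][0]
```

Each query touches only one bounded slice of the primary-key index, so memory use and query time stay flat no matter how large the table turns out to be.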
Optimization Cases
Case 1: A Redis persistence setting (bgsave) caused frequent pauses. After tuning, P99 latency dropped to 215 ms, roughly a threefold improvement.
Case 2: Redundant A/B calls on the homepage were removed, improving call chain efficiency.
Case 3: The P95 latency of the video and science tabs under high load was improved by adjusting request parameters, doubling performance.
Case 4: Institution list performance was boosted by adding caching, achieving a 30 % speed increase.
System Architecture
The platform follows a layered architecture. On the front end, static resources are built with Vue and served through a CDN. An NIO networking model provides high concurrency, Kafka decouples services and smooths traffic spikes, and the middleware (Redis, MySQL, Elasticsearch) is subject to admission standards and flow‑control mechanisms.
NIO Network Model
Thread count (800‑1000) is a critical parameter. Setting it too high wastes memory and CPU on context switching, while setting it too low leaves requests queuing; either can degrade performance.
Application Side (Java)
Key practices include proper connection pool configuration, closing connections, handling reconnections, managing slow queries, and tuning JVM parameters (‑Xms, ‑Xmx, GC settings).
Middleware
Three focus areas: benchmark testing, caching, and slow‑query optimization. Elasticsearch is deployed with multiple coordinating and master nodes to ensure high availability and supports up to 20 k QPS.
Process Optimization
Complex systems require careful process refactoring with gray‑release strategies to avoid production impact. Prioritize path‑node bottlenecks, then process improvements, and finally caching.
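One simple form of gray release routes a fixed percentage of users to the refactored path, chosen deterministically so that a given user always sees the same behavior. The sketch below is an assumption about how such routing could look, not a description of Health160's release tooling; `in_gray_group` and the percentage parameter are hypothetical.

```python
import hashlib


def in_gray_group(user_id, rollout_percent):
    """Return True if this user falls inside the first rollout_percent buckets."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Map the user to a stable bucket in 0..99 from the first 4 hash bytes.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent
```

Raising the percentage only ever adds users to the gray group, so the rollout can widen step by step, and setting it to 0 acts as an immediate rollback.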
Caching Mechanisms
Redis caching reduces network and disk I/O, while in‑memory caching (Caffeine) provides fast access for single‑instance services. A CDC‑based approach updates Redis without code changes.
Conclusion
The article outlines a systematic framework for performance improvement at Health160, emphasizing continuous monitoring, metric‑driven development, thoughtful architecture, and disciplined optimization to maintain a high‑availability, high‑throughput medical platform.
160 Technical Team