How Health160 Scaled to Millions: Real-World Backend Performance Optimization Strategies

This article shares Health160's systematic approach to building a high‑performance, high‑availability medical service platform, covering monitoring, metric design, flow‑control, idempotency, unknown data handling, optimization case studies, architectural choices, NIO networking, middleware tuning, and caching techniques.

160 Technical Team

Dec 14, 2023

Health160 is a leading Chinese internet hospital platform with over a hundred million users and five hundred million service records. The team focuses on building a high‑performance, high‑availability, high‑concurrency medical service system that integrates with many hospital systems to provide real‑time appointment services.

Monitoring and Metrics

A robust monitoring system is essential for visualizing the full request chain and detecting performance issues early. Health160 uses SkyWalking APM with unique TraceIds to trace calls. Monitoring data helps identify hidden problems such as slow interfaces and rising P99 latency.
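To make "rising P99 latency" concrete, here is a minimal sketch of how a P99 value can be derived from raw latency samples with the nearest‑rank method. The class name LatencyPercentile is hypothetical and this is not how SkyWalking computes its percentiles internally; it only illustrates what the metric means.

```java
import java.util.Arrays;

// Hypothetical helper, not part of SkyWalking: nearest-rank percentile over raw samples.
final class LatencyPercentile {
    private LatencyPercentile() {}

    /** Returns the p-th percentile (0 < p <= 100) of the samples, nearest-rank method. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // 1-based rank: the smallest sample such that at least p% of samples are <= it
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

A single slow outlier barely moves the average but dominates the tail: for the samples 10, 10, 10, and 500 ms, the P99 is 500 ms.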

Technical metrics serve as admission standards for code deployment. They guide problem discovery, solution design, and load testing, and all code must meet them before it reaches production.

Technical Thinking

1. Boundary Between Mid‑Platform and Business Interfaces

Interfaces called by other services become mid‑platform interfaces, which have stricter performance requirements than pure business interfaces. Misclassifying an interface can lead to performance bottlenecks.

2. Flow Control

Flow control is needed in both directions. Inbound traffic is limited at the gateway and with Sentinel; outbound calls to dependencies such as MySQL, Redis, and external APIs must also be throttled or degraded when those dependencies are overloaded.
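As an illustration of outbound flow control, the sketch below caps the number of concurrent calls to one downstream dependency and degrades to a fallback when the cap is reached. It uses only the JDK; the class OutboundLimiter and its method names are hypothetical and do not reflect Sentinel's API or Health160's actual implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical outbound limiter: caps concurrent calls to one downstream dependency.
final class OutboundLimiter {
    private final Semaphore permits;

    OutboundLimiter(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise degrades to the fallback at once. */
    <T> T call(Supplier<T> downstreamCall, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // dependency is saturated: degrade instead of queueing
        }
        try {
            return downstreamCall.get();
        } finally {
            permits.release(); // always return the permit, even if the call throws
        }
    }
}
```

Failing fast with a fallback, rather than waiting for a permit, keeps request threads from piling up behind a slow dependency. Whether a cached or default response is an acceptable fallback depends on the interface.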

3. Idempotency

An idempotent operation produces the same result when called repeatedly with the same parameters, which prevents duplicate processing. Solutions include unique primary keys, optimistic locks, anti‑duplicate‑submission tokens, and downstream sequence numbers.
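The token approach can be sketched as follows, under the assumption that each request carries a unique token. An in‑memory map stands in for whatever shared store (a unique‑key table or a cache) a real deployment would use; IdempotentExecutor is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in for a shared token store or unique-key table.
final class IdempotentExecutor<R> {
    private final ConcurrentMap<String, R> resultsByToken = new ConcurrentHashMap<>();

    /**
     * Executes the action at most once per token; repeated calls with the same
     * token return the result of the first call. A null result is not recorded,
     * so the action should return a non-null value.
     */
    R execute(String token, Supplier<R> action) {
        return resultsByToken.computeIfAbsent(token, t -> action.get());
    }
}
```

A retry of the same appointment request with the same token returns the stored result instead of creating a second record. In a multi‑instance service the map would have to be replaced by a store shared across instances.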

4. Collections of Unknown Size

When the amount of data to fetch is not known in advance, retrieve it in batches and paginate by ID range, so that a single query can never return an unbounded result set and cause hidden performance problems.
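A minimal sketch of ID‑range pagination follows. A TreeMap stands in for a table with an indexed numeric primary key, and the class and method names are hypothetical; against a real database the fetch would typically be a query of the form WHERE id > lastId ORDER BY id LIMIT batchSize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical sketch: a TreeMap stands in for a table with an indexed numeric primary key.
final class IdRangeScanner {
    private final NavigableMap<Long, String> table = new TreeMap<>();

    void insert(long id, String row) {
        table.put(id, row);
    }

    /** Returns at most batchSize IDs strictly greater than lastId, in ascending order. */
    List<Long> fetchIdsAfter(long lastId, int batchSize) {
        List<Long> ids = new ArrayList<>();
        for (Long id : table.tailMap(lastId, false).keySet()) {
            if (ids.size() == batchSize) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }

    /** Walks the whole table in bounded batches; returns the number of non-empty batches. */
    int scanAll(int batchSize, Consumer<String> handler) {
        long lastId = Long.MIN_VALUE; // assumes real IDs are greater than Long.MIN_VALUE
        int batches = 0;
        while (true) {
            List<Long> ids = fetchIdsAfter(lastId, batchSize);
            if (ids.isEmpty()) {
                return batches;
            }
            batches++;
            for (Long id : ids) {
                handler.accept(table.get(id));
            }
            lastId = ids.get(ids.size() - 1); // resume after the last ID seen
        }
    }
}
```

Unlike offset‑based paging, each batch starts from the last ID seen, so the cost of a fetch does not grow with how far the scan has progressed, and gaps in the ID sequence are harmless.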

Optimization Cases

Case 1: A Redis persistence setting (bgsave) caused frequent pauses. After tuning, P99 latency dropped to 215 ms, a threefold improvement.

Case 2: Redundant A/B calls on the homepage were removed, improving call chain efficiency.

Case 3: The P95 latency of the video and science tabs under high load was optimized by adjusting request parameters, doubling performance.

Case 4: Institution list performance was boosted by adding caching, achieving a 30% speed increase.

System Architecture

The platform follows a layered architecture. On the front end, static resources are served via Vue and a CDN. An NIO networking model provides high concurrency, Kafka decouples traffic and smooths spikes, and the middleware (Redis, MySQL, Elasticsearch) is subject to admission standards and flow‑control mechanisms.

NIO Network Model

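Thread count (800‑1000) is a critical parameter. Setting it too high wastes memory and CPU time on context switching, while setting it too low leaves requests queueing; either can degrade performance.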

Application Side (Java)

Key practices include proper connection pool configuration, closing connections, handling reconnections, managing slow queries, and tuning JVM parameters (‑Xms, ‑Xmx, GC settings).
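The point about pool sizing and closing connections can be sketched with a bounded pool whose leases are returned through try‑with‑resources. This is a simplified illustration using integer slots in place of real connections; BoundedPool and Lease are hypothetical names, not the API of any particular connection pool library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded pool: integer slots stand in for real connections.
final class BoundedPool {
    /** A borrowed connection slot; closing it returns the slot to the pool. */
    final class Lease implements AutoCloseable {
        final int connectionId;
        private boolean released;

        private Lease(int connectionId) {
            this.connectionId = connectionId;
        }

        @Override
        public void close() {
            if (!released) { // closing twice must not return the slot twice
                released = true;
                idle.offer(connectionId);
            }
        }
    }

    private final BlockingQueue<Integer> idle;

    BoundedPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int id = 0; id < size; id++) {
            idle.add(id);
        }
    }

    /** Waits up to timeoutMs for a free slot, then fails instead of blocking forever. */
    Lease borrow(long timeoutMs) {
        try {
            Integer id = idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (id == null) {
                throw new IllegalStateException("pool exhausted after " + timeoutMs + " ms");
            }
            return new Lease(id);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

Callers would borrow inside a try‑with‑resources statement, for example try (BoundedPool.Lease lease = pool.borrow(50)) { ... }, so the slot is returned even when the body throws. A lease that is never closed is a leaked connection, and a bounded acquisition timeout turns that leak into a visible error rather than a silent hang.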

Middleware

There are three focus areas: benchmark testing, caching, and slow‑query optimization. Elasticsearch is deployed with multiple coordinating and master nodes for high availability, and the cluster supports up to 20k QPS.

Process Optimization

Complex systems require careful process refactoring, rolled out with gray‑release (canary) strategies to avoid production impact. Address bottlenecks at individual nodes on the request path first, then process improvements, and finally caching.
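One common way to implement a gray release is to route a fixed percentage of users to the refactored path, chosen deterministically from the user ID. The sketch below assumes numeric user IDs and a hypothetical GrayRelease class; the source does not describe how Health160 selects gray traffic.

```java
// Hypothetical gray-release gate: routes a stable percentage of users to the new code path.
final class GrayRelease {
    private final int percent; // 0..100, share of users on the refactored path

    GrayRelease(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 0..100");
        }
        this.percent = percent;
    }

    /** The same user always lands in the same bucket, so routing is stable across requests. */
    boolean useNewPath(long userId) {
        int bucket = Math.floorMod(Long.hashCode(userId), 100); // 0..99
        return bucket < percent;
    }
}
```

Raising the percentage step by step while watching the monitoring data lets a regression surface on a small share of traffic, and setting it back to 0 acts as an immediate rollback.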

Caching Mechanisms

Redis caching reduces network and disk I/O, while in‑memory caching (Caffeine) provides fast access for single‑instance services. A CDC‑based (change data capture) approach keeps Redis up to date without changes to application code.
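To show how the two cache levels relate on the read path, here is a sketch that layers a small in‑process cache in front of a shared one. This layering is an assumption for illustration; the source does not say the two are combined this way. A JDK access‑ordered LinkedHashMap stands in for Caffeine, a plain Map stands in for Redis, and TwoLevelCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical two-level read path. The LRU map stands in for Caffeine and the
// `remote` map for Redis; neither library's real API is used here.
final class TwoLevelCache<K, V> {
    private final Map<K, V> local;
    private final Map<K, V> remote;
    private final Function<K, V> loader;

    TwoLevelCache(int localCapacity, Map<K, V> remote, Function<K, V> loader) {
        this.remote = remote;
        this.loader = loader;
        // An access-ordered LinkedHashMap evicts the least recently used entry past capacity.
        this.local = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > localCapacity;
            }
        };
    }

    synchronized V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value; // level 1 hit: no network round trip
        }
        value = remote.get(key);
        if (value == null) { // level 2 miss: load from the source of record
            value = loader.apply(key);
            if (value == null) {
                return null;
            }
            remote.put(key, value);
        }
        local.put(key, value);
        return value;
    }
}
```

The sketch omits expiry and invalidation. A local copy can go stale when another instance updates the data, which is why in‑process caching suits single‑instance services or data that tolerates short staleness.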

Conclusion

The article outlines a systematic framework for performance improvement at Health160, emphasizing continuous monitoring, metric‑driven development, thoughtful architecture, and disciplined optimization to maintain a high‑availability, high‑throughput medical platform.
