Analyzing Latency and Slow Interface Detection in a Full‑Chain Monitoring System
This article explains how latency is used as a key indicator for application risk identification, defines slow interfaces, describes why percentile‑based thresholds are preferred over averages, and outlines the architecture, task workflow, and practical optimization strategies for a full‑chain monitoring system in a microservice environment.
Good Doctor Online has transitioned to a microservice architecture, leading to increased complexity in inter‑service calls, fault localization, and technical debt. To address these challenges, the infrastructure team built a full‑chain monitoring system covering link diagnosis, capacity monitoring, runtime health profiling, real‑time alerts, and risk assessment models.
Following Google SRE principles, four golden SLI metrics—latency, traffic, errors, and saturation—are used. This article focuses on the latency metric and its role in application risk identification.
Latency directly limits the peak QPS/TPS an application instance can sustain: faster responses allow higher concurrency per instance, lower machine-acquisition costs, and easier horizontal scaling.
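The latency-to-throughput relationship can be made concrete with Little's law (in-flight requests = throughput × latency). The worker-pool size and latencies below are hypothetical, chosen only to show how halving latency multiplies sustainable QPS:

```python
# Little's law: in-flight requests = throughput (QPS) * latency (seconds).
# With a fixed pool of blocking workers, lower latency means higher
# sustainable throughput from the same hardware.
def max_qps(workers: int, latency_s: float) -> float:
    """Peak QPS a pool of blocking workers can sustain at a given latency."""
    return workers / latency_s

# Hypothetical instance with 100 worker threads:
print(max_qps(100, 0.200))  # 200 ms per request -> about 500 QPS
print(max_qps(100, 0.050))  # 50 ms per request  -> about 2000 QPS
```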
Slow requests can stem from CDN back‑origin, rendering, network jitter, fine‑grained microservice calls, internal LAN traffic, hardware limitations, or middleware dependencies. From a microservice LAN perspective, the article lists several symptoms and hazards of slow interfaces, such as user churn, PHP‑FPM exhaustion, thread/process blockage, resource exhaustion, and system avalanche.
A "slow interface" is defined as an RPC call whose response time exceeds the 95th percentile of the application's overall request latency. Each such call increments a per-interface counter, and at the end of an analysis window the top-K interfaces by count are flagged as slow.
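A minimal in-memory sketch of this rule, assuming per-interface latency samples for one window are already collected (the nearest-rank percentile and the data layout are illustrative assumptions, not the production implementation):

```python
from collections import Counter

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    s = sorted(samples)
    idx = max(0, min(len(s), round(p / 100.0 * len(s))) - 1)
    return s[idx]

def flag_slow_interfaces(window, k=5, p=95):
    """window maps interface name -> latencies (ms) seen in one analysis
    window. A call counts as 'slow' when it exceeds the application-wide
    p95; the top-k interfaces by slow-call count are flagged."""
    all_samples = [x for v in window.values() for x in v]
    threshold = percentile(all_samples, p)
    counts = Counter({name: sum(1 for x in v if x > threshold)
                      for name, v in window.items()})
    return counts.most_common(k)
```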
Percentile‑based thresholds are preferred over simple averages because latency distributions are often heavily skewed, so a few slow outliers can distort the mean; SLOs are therefore expressed as percentile targets such as "90% of requests ≤ 80 ms".
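A small numeric example of why the mean misleads on skewed data (the sample values are hypothetical):

```python
# Hypothetical skewed latency sample: most requests are fast,
# but a small tail is very slow.
latencies = [20] * 90 + [30] * 5 + [2000] * 5  # ms

mean = sum(latencies) / len(latencies)
p90 = sorted(latencies)[int(0.90 * len(latencies)) - 1]

# The mean (119.5 ms) is dominated by the 5% tail, while the p90
# (20 ms) shows 90% of requests are fast: an SLO phrased as
# "90% of requests <= 80 ms" is comfortably met despite the
# alarming-looking average.
```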
Per‑application latency thresholds differ due to uneven machine distribution, varying traffic, and legacy code. The current rule ranks interfaces by p95 latency and call frequency, prioritizing the top five for optimization.
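One way to express that ranking rule in code. The source only says interfaces are ranked by p95 latency and call frequency; the multiplicative weighting and the interface names below are assumptions for illustration:

```python
def top_candidates(stats, k=5):
    """stats: list of (interface, p95_ms, calls_per_min) tuples.
    Rank by p95 latency weighted by call frequency, so a moderately
    slow but very hot interface outranks a rarely-hit outlier."""
    return sorted(stats, key=lambda s: s[1] * s[2], reverse=True)[:k]

ranked = top_candidates([
    ("order/list",    320, 1200),  # hot and slow -> top priority
    ("report/export", 900,    3),  # very slow but rarely called
    ("user/info",      45, 5000),  # hot but already fast
])
```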
The monitoring pipeline stores logs locally, collects them via Flume, pushes them to Kafka, and processes them in real time with a custom Snow system. Metrics are stored in Prometheus and trigger risk events that generate JIRA tasks for the responsible teams, forming a closed loop from detection to verification.
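Before metrics reach the store, the stream processor must fold raw access-log lines into per-interface samples for each analysis window. A stdlib-only sketch of that step, with an assumed JSON log schema (the field names `interface` and `cost_ms` are hypothetical):

```python
import json
from collections import defaultdict

def aggregate_window(log_lines):
    """Fold raw access-log lines (one JSON object per request, as
    shipped via Flume/Kafka) into per-interface latency lists for
    one analysis window. The log field names are assumptions."""
    window = defaultdict(list)
    for line in log_lines:
        rec = json.loads(line)
        window[rec["interface"]].append(float(rec["cost_ms"]))
    return dict(window)
```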
Task assignment targets the upstream service caller, ensuring owners are aware of downstream health. Tasks flow through creation, assignment, execution, verification, and closure, with priority based on benefit and effort.
Optimization guidelines include analyzing call graphs, consolidating fine‑grained calls, converting synchronous to asynchronous flows, reducing circular dependencies, limiting call depth, employing concurrency where possible, improving cache strategies, batching middleware operations, and adhering to proper dependency layering.
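As one illustration of the "employ concurrency where possible" guideline, independent downstream calls can be issued in parallel so the handler's latency approaches the slowest single call rather than the sum of all calls. The two fetch functions are stand-ins for real RPCs:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_profile(uid):
    """Stand-in for an RPC to a user service (hypothetical)."""
    time.sleep(0.05)
    return {"uid": uid}

def fetch_orders(uid):
    """Stand-in for an RPC to an order service (hypothetical)."""
    time.sleep(0.05)
    return []

def handler_concurrent(uid):
    # The two downstream calls are independent, so they overlap:
    # total latency is max(call latencies) instead of their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        profile = pool.submit(fetch_profile, uid)
        orders = pool.submit(fetch_orders, uid)
        return profile.result(), orders.result()
```

The same consolidation applies to batching: replacing N single-key cache or middleware calls with one batched call removes N-1 round trips from the critical path.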
Common pitfalls to avoid are inserting artificial sleep delays, offloading business logic to the front‑end, and over‑provisioning cache without addressing hit‑rate issues.
Effectiveness is measured by monitoring weekly risk reports, tracking p95 latency trends, and ensuring optimized tasks remain closed.
The article concludes with a practical analysis workflow: select an application, examine tasks, trace call chains, filter by risk, analyze timelines, identify optimization points, and evaluate feasibility. Future posts will cover topology extraction, metric definition, capacity assessment, and more.
HaoDF Tech Team
HaoDF Online tech practice and sharing—join us to discuss and help create quality healthcare through technology.