Why Traditional API Gateways Crash: CPU, Disk, and Network Bottlenecks Explained
The article examines how traditional synchronous and semi‑synchronous API gateways can fail under high load by analyzing CPU utilization, disk I/O, and network latency, and offers practical monitoring metrics and mitigation strategies to prevent cascading failures in large‑scale systems.
Traditional API gateways use synchronous or semi‑synchronous request handling. In the semi‑synchronous model, the request‑handling thread and the business‑processing thread are separated, but the business logic itself still makes blocking, synchronous calls. A fully asynchronous gateway, by contrast, processes the entire request chain asynchronously.
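As a rough illustration of the fully asynchronous style, the sketch below fans out to three downstream calls without blocking a thread on any of them. The backend names and the 0.1 s delays are hypothetical, and asyncio.sleep merely stands in for a real non‑blocking RPC client.

```python
import asyncio
import time


async def call_backend(name: str, delay: float) -> str:
    # Stand-in for a non-blocking RPC; asyncio.sleep simulates the network wait.
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def handle_request() -> list:
    # The three hypothetical backends are awaited concurrently, so the
    # total wait is roughly the slowest call rather than the sum of all calls.
    return await asyncio.gather(
        call_backend("user", 0.1),
        call_backend("order", 0.1),
        call_backend("stock", 0.1),
    )


def run_demo():
    # Returns the backend results and the wall-clock time the request took.
    start = time.perf_counter()
    results = asyncio.run(handle_request())
    return results, time.perf_counter() - start
```

Called sequentially with blocking I/O, the same three calls would take about 0.3 s and would hold a thread for the whole time; here the event loop is free to serve other requests while the calls are in flight.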
API gateways have two defining characteristics: massive traffic and numerous downstream dependencies. When a gateway calls many backend services via RPC, the stability of each service directly affects the gateway's overall reliability.
Because of these traits, monitoring internal factors such as CPU, disk, and network becomes crucial.
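CPU utilization is the percentage of CPU time a program consumes over a sampling interval, while CPU load is the average number of tasks that are running or waiting to run over a period (on Linux, the load average also counts tasks in uninterruptible sleep, such as those waiting on disk I/O). On Unix‑like systems, commands like uptime or top report load averages (e.g., "11:36 up 23 days, 2:31, 2 users, load averages: 1.74 1.58 1.60"), where the three numbers are the 1‑, 5‑, and 15‑minute averages. The two metrics are related but distinct: high utilization does not always mean high load, and a machine can show high load at low utilization when many tasks are blocked on I/O.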
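A load average is easier to interpret once it is divided by the number of CPU cores. The following is a minimal sketch that uses only the Python standard library; the helper names are illustrative, and os.getloadavg is available on Unix‑like systems only.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    # Normalise a load average by the core count; values near or above 1.0
    # suggest that runnable tasks are queueing for CPU time.
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def current_load_per_core() -> dict:
    # os.getloadavg returns the 1-, 5- and 15-minute load averages.
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    return {
        "1m": load_per_core(one, cores),
        "5m": load_per_core(five, cores),
        "15m": load_per_core(fifteen, cores),
    }
```

With the sample output above, a 1‑minute load of 1.74 on a hypothetical 4‑core host is about 0.44 per core, which leaves headroom; the same figure on a single core would mean tasks are waiting.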
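Disk health is measured by two figures: usage percentage (how full the disk is) and I/O load percentage (how busy it is). The iostat -x 1 10 command, which prints extended statistics every second for ten samples (installable on RHEL‑family systems via yum install sysstat), reports the %util metric; values approaching 100% indicate I/O saturation and a potential bottleneck on devices that serve requests serially. For devices that serve requests in parallel, such as RAID arrays and modern SSDs, %util alone does not reflect the performance limit.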
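The usage half of disk health can be checked from code as well as from the shell. A minimal sketch using the standard library's shutil.disk_usage follows; the function names and the default path are assumptions made for illustration.

```python
import shutil


def usage_percent(used: int, total: int) -> float:
    # Share of the filesystem that is occupied, as a percentage.
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def disk_usage_percent(path: str = "/") -> float:
    # shutil.disk_usage reports total, used and free bytes for the
    # filesystem that contains `path`.
    usage = shutil.disk_usage(path)
    return usage_percent(usage.used, usage.total)
```

A monitoring job could call disk_usage_percent on the log partition and alert well before the disk fills; the alert threshold itself is a site‑specific choice.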
Network latency often dominates RPC call time in microservice architectures. When network performance degrades between the gateway and a downstream service, response times rise and each in‑flight request holds its thread for longer. Thread pools then grow to keep up, eventually exhausting CPU resources and triggering an avalanche effect.
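The link between latency and thread consumption can be estimated with Little's Law: the average number of requests in flight equals the arrival rate multiplied by the average latency. The sketch below applies it with purely illustrative numbers (2,000 requests per second and a pool of 200 threads are assumptions, not figures from the article).

```python
def in_flight_requests(rate_per_sec: int, latency_ms: int) -> float:
    # Little's Law: average concurrency = arrival rate * average latency.
    return rate_per_sec * latency_ms / 1000


def pool_exhausted(rate_per_sec: int, latency_ms: int, pool_size: int) -> bool:
    # With one blocked thread per in-flight call, the pool is exhausted
    # once the required concurrency exceeds the number of threads.
    return in_flight_requests(rate_per_sec, latency_ms) > pool_size


# Illustrative figures only.
healthy = in_flight_requests(2000, 50)    # 100.0 threads busy
degraded = in_flight_requests(2000, 500)  # 1000.0 threads busy
```

Under these assumptions, a tenfold rise in downstream latency needs ten times as many blocked threads at the same traffic level, which is how a single slow dependency can exhaust a synchronous gateway's pool.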
When any of these resources (CPU, disk, or network) becomes constrained, the gateway may experience three typical failure modes: being "dragged down" by a slow downstream service, being "killed" by excessive error‑log writes that fill the disk, or being "blocked" by network faults that slow down RPC calls.
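The "killed by logs" failure mode can be bounded by capping how much disk the log files may occupy. One possible approach, sketched here with Python's standard RotatingFileHandler, is shown below; the logger name, size limit, and backup count are hypothetical defaults rather than recommendations from the article.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_bounded_logger(path, max_bytes=10_000_000, backups=3,
                         level=logging.WARNING, name="gateway"):
    # Total disk use is capped at roughly (backups + 1) * max_bytes,
    # because the handler rotates files and deletes the oldest backup.
    logger = logging.getLogger(name)
    logger.setLevel(level)    # records below `level` are dropped
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Even if a failing downstream service triggers an error on every request, the log files stay within the configured budget instead of filling the disk.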
Understanding these failure patterns allows teams to proactively implement safeguards, such as proper logging levels, resource limits, and asynchronous gateway designs, to maintain high availability even under heavy load.
CPU utilization: real‑time percentage of CPU used by the process.
CPU load: average number of runnable tasks over 1, 5, and 15 minutes.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
