
Why Did My Java Service’s Response Time Spike? A Deep Dive into QPS, GC, and CPU Load

An internal Java‑based HTTP service suddenly suffered high latency and timeouts, prompting a systematic investigation that uncovered excessive QPS, frequent ParNew GCs, CPU load spikes, and large response payloads, leading to concrete performance and design improvements.

Programmer DD

Aug 9, 2020


Background Introduction

One afternoon around 4 pm, a partner notified us that an online HTTP service we maintain had started timing out on a large share of requests (the partner's timeout is set to 300 ms). We enabled sampling on the Eagle Eye platform and found the service handling about 120 QPS with an average response time of 2–3 seconds and occasional spikes of 5–6 seconds. The normal response time is around 60 ms.

[Figure: response-time trend]

Problem Resolution

The service is an internal operations platform deployed in only two Docker containers. It was expected to handle single-digit QPS and had not been released recently. Each request performs roughly 40 database queries and returns a multi-level tree structure of about 50 KB. Because QPS had risen to about 120, we asked the caller to add caching. That brought QPS down to under 80 and later to around 40, at which point the average response time returned to 60 ms.

Problem Diagnosis

Since the core operation is database access, we first checked for slow SQL. Monitoring showed an average DB response time (RT) under 0.3 ms per query, so the roughly 40 queries per request cost about 12 ms in total. Slow queries were ruled out.
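
The arithmetic behind that conclusion can be written out as a quick check. This is only a sketch: it assumes the 40 queries run one after another (the article does not say), and the class and method names are made up for illustration.

```java
// Back-of-the-envelope check using the figures quoted in the article.
// Assumption: the queries in one request execute sequentially.
final class DbCostEstimate {

    /** Total database time per request if the queries run one after another. */
    static double totalDbMillis(int queriesPerRequest, double avgQueryMillis) {
        return queriesPerRequest * avgQueryMillis;
    }

    /** Fraction of the observed response time that the database accounts for. */
    static double dbShare(double dbMillis, double observedMillis) {
        return dbMillis / observedMillis;
    }

    public static void main(String[] args) {
        double dbMillis = totalDbMillis(40, 0.3);   // about 12 ms
        double share = dbShare(dbMillis, 2000.0);   // against a 2 s response
        System.out.printf("DB time per request: %.1f ms (%.1f%% of 2000 ms)%n",
                dbMillis, share * 100);
    }
}
```

Even against the low end of the observed 2–3 second responses, the database accounts for well under 1% of the time, which is why the investigation moved on.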

We also examined the connection pool (default size 10). The total number of connections never exceeded 7, so pool exhaustion was not the cause.

With the database layer cleared, we inspected the service call chain. Many local calls took several hundred milliseconds, while the actual DB calls completed in about a millisecond or less:

Local call time: 267ms
Client sends request: 0ms
Server processes request: 0ms
Client receives response: 1ms
Total time: 1ms

CPU load on the host (4 cores, 8 GB of memory) hovered around 4, which is abnormal for a two-container deployment.

GC logs revealed frequent Allocation Failure events and more than 100 ParNew collections per minute, indicating severe young‑generation pressure.

2020-03-25T16:16:18.390+0800: 1294233.934: [GC (Allocation Failure) 2020-03-25T16:16:18.391+0800: 1294233.935: [ParNew: 1770060K->25950K(1922432K), 0.0317141 secs] 2105763K->361653K(4019584K), 0.0323010 secs] [Times: user=0.12 sys=0.00, real=0.04 secs]
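
A log line like this can be picked apart programmatically. The sketch below is a hypothetical helper, not part of any JDK tool, and its regular expression only targets the ParNew line format shown above. It extracts the young-generation and whole-heap occupancy before and after the collection and derives how much was promoted to the old generation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for one ParNew line of a Java 8 style GC log.
// The class and method names are hypothetical.
final class ParNewLogLine {
    private static final Pattern PATTERN = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), [\\d.]+ secs\\] "
          + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    final long youngBeforeKb;
    final long youngAfterKb;
    final long youngCapacityKb;
    final long heapBeforeKb;
    final long heapAfterKb;
    final double pauseSeconds;

    private ParNewLogLine(Matcher m) {
        youngBeforeKb = Long.parseLong(m.group(1));
        youngAfterKb = Long.parseLong(m.group(2));
        youngCapacityKb = Long.parseLong(m.group(3));
        heapBeforeKb = Long.parseLong(m.group(4));
        heapAfterKb = Long.parseLong(m.group(5));
        // group(6) is the total heap capacity, which this sketch does not use
        pauseSeconds = Double.parseDouble(m.group(7));
    }

    static ParNewLogLine parse(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a ParNew log line: " + line);
        }
        return new ParNewLogLine(m);
    }

    /** Space freed in the young generation by this collection. */
    long youngReclaimedKb() {
        return youngBeforeKb - youngAfterKb;
    }

    /** Young-generation shrinkage not matched by whole-heap shrinkage was promoted. */
    long promotedKb() {
        return youngReclaimedKb() - (heapBeforeKb - heapAfterKb);
    }

    public static void main(String[] args) {
        ParNewLogLine gc = parse(
                "2020-03-25T16:16:18.390+0800: 1294233.934: [GC (Allocation Failure) "
              + "2020-03-25T16:16:18.391+0800: 1294233.935: [ParNew: 1770060K->25950K(1922432K), "
              + "0.0317141 secs] 2105763K->361653K(4019584K), 0.0323010 secs] "
              + "[Times: user=0.12 sys=0.00, real=0.04 secs]");
        System.out.println("young reclaimed (K): " + gc.youngReclaimedKb());
        System.out.println("promoted to old (K): " + gc.promotedKb());
        System.out.println("pause (s): " + gc.pauseSeconds);
    }
}
```

For the sampled line, the young generation and the whole heap shrink by the same 1744110K, so essentially nothing was promoted: the collected objects were short-lived garbage.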

Each GC pause lasted only about 0.04 s, but at more than 100 collections per minute the cumulative cost in CPU time and paused application threads became noticeable.
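
To put rough numbers on that, the sketch below multiplies the figures quoted above. It is an estimate under stated assumptions, not a measurement: it takes exactly 100 collections per minute, a 0.04 s pause for each, and assumes every collection reclaims about as much as the sampled log line (1744110K). The class and method names are hypothetical.

```java
// Rough GC pressure estimate from the figures quoted in the article.
// Assumptions: 100 collections per minute, 0.04 s per pause, and each
// collection reclaiming about 1744110K as in the sampled log line.
final class GcPressureEstimate {

    /** Fraction of wall-clock time spent in stop-the-world pauses. */
    static double pausedFraction(int collectionsPerMinute, double pauseSeconds) {
        return collectionsPerMinute * pauseSeconds / 60.0;
    }

    /** Approximate allocation rate in MiB/s, given what each young collection reclaims. */
    static double allocationMiBPerSecond(int collectionsPerMinute, long reclaimedKbPerCollection) {
        return collectionsPerMinute * (reclaimedKbPerCollection / 1024.0) / 60.0;
    }

    public static void main(String[] args) {
        double paused = pausedFraction(100, 0.04);
        double allocRate = allocationMiBPerSecond(100, 1744110L);
        System.out.printf("Paused fraction: %.1f%%%n", paused * 100);
        System.out.printf("Allocation rate: about %.0f MiB/s%n", allocRate);
    }
}
```

Under these assumptions the application is stopped for roughly 4 seconds out of every minute, and the implied allocation rate is in the gigabytes-per-second range, which is far more than a service designed for single-digit QPS should need.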

The JVM heap configuration allocated 2 GB to the young generation (Eden ≈ 1.7 GB):

Heap Configuration:
   MinHeapFreeRatio          = 40
   MaxHeapFreeRatio          = 70
   MaxHeapSize               = 4294967296 (4096.0MB)
   NewSize                   = 2147483648 (2048.0MB)
   MaxNewSize                = 2147483648 (2048.0MB)
   OldSize                   = 2147483648 (2048.0MB)
   NewRatio                  = 2
   SurvivorRatio             = 10
   MetaspaceSize             = 268435456 (256.0MB)
   CompressedClassSpaceSize  = 1073741824 (1024.0MB)
   MaxMetaspaceSize          = 536870912 (512.0MB)
   G1HeapRegionSize          = 0 (0.0MB)
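
The Eden figure follows from NewSize and SurvivorRatio. In HotSpot, SurvivorRatio is the ratio of Eden to one survivor space, and the young generation holds Eden plus two survivor spaces. The sketch below applies that relationship to the values above. The class name is hypothetical, and the real JVM aligns these sizes, so its numbers differ slightly.

```java
// Young-generation layout derived from NewSize and SurvivorRatio.
// HotSpot: young = Eden + 2 * survivor, and SurvivorRatio = Eden / survivor.
// The JVM also aligns region sizes, so actual values differ by a small amount.
final class YoungGenLayout {

    /** Size of one survivor space in bytes. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Size of Eden in bytes: whatever the two survivor spaces leave over. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 2147483648L;   // NewSize from the heap configuration
        int ratio = 10;             // SurvivorRatio from the heap configuration
        long eden = edenBytes(young, ratio);
        long survivor = survivorBytes(young, ratio);
        double mb = 1024.0 * 1024.0;
        System.out.printf("Eden: %.1f MB, each survivor: %.1f MB%n", eden / mb, survivor / mb);
        // Usable young capacity is Eden plus one survivor space; compare with
        // the 1922432K capacity that ParNew reports in the GC log.
        System.out.printf("Eden + one survivor: %d K%n", (eden + survivor) / 1024);
    }
}
```

Eden comes out at roughly 1707 MB, matching the "≈ 1.7 GB" above, and Eden plus one survivor space lands within a few dozen kilobytes of the 1922432K young-generation capacity printed in the GC log.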

Using jmap, we saw that the young generation was filled with temporary objects from Tomcat's buffer package, org.apache.tomcat.util.buf (ByteChunk, CharChunk, MessageBytes, etc.), generated while constructing the 50 KB response.

In summary, each large response had to be copied from user space to kernel space, and building it allocated a large number of temporary objects. Those allocations filled the young generation quickly and triggered frequent ParNew collections, each of which stops the world. The pauses blocked request threads, increased latency, and ultimately degraded the service. Visual monitoring tools (Eagle Eye, IDB) were essential for pinpointing the issue.

Key Takeaways

Design APIs to avoid overly large response bodies; split large endpoints into smaller ones.

Implement server‑side caching for endpoints that perform many database queries instead of relying solely on client‑side caching.

Conduct performance testing to understand system limits and prevent single‑point bottlenecks.

Isolate internal traffic from external traffic to avoid cache‑stampede problems.

Document performance requirements and verify them regularly, not just verbally.
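
As one way to act on the server-side caching takeaway, here is a minimal time-to-live cache sketch. It is not the author's implementation: TtlCache and its methods are hypothetical names, the injected clock exists only to make the behavior testable, and a production service would more likely use an established caching library with size limits and eviction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Minimal TTL cache sketch (hypothetical, for illustration only).
// Entries are reloaded once they are older than ttlMillis.
final class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached value, calling the loader only when the entry is
     * missing or expired. compute() runs atomically per key, so concurrent
     * requests for the same key trigger a single load. The loader must not
     * access this cache itself.
     */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, current) -> {
            if (current != null && now < current.expiresAtMillis) {
                return current;
            }
            return new Entry<>(loader.apply(k), now + ttlMillis);
        });
        return entry.value;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        TtlCache<String, String> cache = new TtlCache<>(5_000L, System::currentTimeMillis);
        // Stand-in for the expensive work of running ~40 queries and building the tree.
        Function<String, String> loadTree = id -> "tree-" + id + "-v" + loads.incrementAndGet();

        System.out.println(cache.get("root", loadTree));
        System.out.println(cache.get("root", loadTree)); // served from the cache
        System.out.println("loader calls: " + loads.get());
    }
}
```

With a cache like this in front of the tree-building endpoint, repeated requests within the TTL skip both the database queries and most of the temporary object allocation, regardless of whether callers cache on their side.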


