
Why Did My Java Service’s Response Time Spike? Deep Dive into QPS, GC, and Load

A high‑traffic Java backend service suddenly suffered seconds‑long response times, prompting a systematic investigation that traced the issue from unexpected QPS spikes through database checks, local call delays, CPU load, and frequent ParNew GC, ultimately revealing oversized responses and memory pressure as the root cause.

Programmer DD

Sep 23, 2020


One afternoon, a partner team reported that an internal HTTP service had started timing out (the client timeout was 300 ms). Monitoring showed QPS around 120 and an average response time (RT) of 2–3 seconds, with spikes up to 5–6 seconds, against a normal RT of about 60 ms.

QPS Situation

RT Situation

Problem Solving

The service is an internal operations platform deployed on only two Docker containers and was expected to handle single‑digit QPS. Each request aggregates about 40 database queries and returns a ~50 KB JSON tree. No recent releases had been made, but the calling client had no caching in place, so every call reached the service and QPS rose to 120. After the client team was asked to add caching, QPS dropped below 80 and RT returned to ~60 ms; QPS eventually stabilized around 40.

Problem Identification

Because the core operation involves many DB queries, the first step was to check for slow SQL. Database monitoring showed an average DB RT below 0.3 ms per query, so the roughly 40 queries in a request add up to at most about 12 ms (40 × 0.3 ms). That is far below the observed 2–3 second RT, so slow queries were not the cause.

Next, the DB connection pool was examined. The pool was left at its default size of 10, and the number of connections in use never exceeded 7, ruling out pool exhaustion.

Thus the database layer was excluded.

Further analysis of the service call chain revealed many local calls taking several hundred milliseconds while the actual DB calls were < 1 ms.

Local call duration: 267 ms
Client sent request: 0 ms
Server processed request: 0 ms
Client received response: 1 ms
Total duration: 1 ms

The lengthy local execution time could not be explained by the code, so CPU load was inspected.

Load hovered around 4 on a 4‑core host, which was abnormal. GC logs showed many "Allocation Failure" events and ParNew GC occurring over 100 times per minute.

2020-03-25T16:16:18.390+0800: 1294233.934: [GC (Allocation Failure) 2020-03-25T16:16:18.391+0800: 1294233.935: [ParNew: 1770060K->25950K(1922432K), 0.0317141 secs] 2105763K->361653K(4019584K), 0.0323010 secs] [Times: user=0.12 sys=0.00, real=0.04 secs]
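To make the log line above easier to read, here is a minimal sketch that pulls the young-generation figures out of it. It assumes the log format shown above (it looks like JDK 8 output with detailed GC logging enabled); the class `ParNewLogLine` and its method names are hypothetical helpers, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the young-generation figures from one ParNew log line. */
final class ParNewLogLine {
    // Matches e.g. "[ParNew: 1770060K->25950K(1922432K), 0.0317141 secs]"
    private static final Pattern PAR_NEW = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final long beforeKb;    // young-gen occupancy before the collection
    final long afterKb;     // young-gen occupancy after the collection
    final long capacityKb;  // young-gen capacity as reported in the log
    final double pauseSecs; // duration of the ParNew phase

    private ParNewLogLine(long beforeKb, long afterKb, long capacityKb, double pauseSecs) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.pauseSecs = pauseSecs;
    }

    static ParNewLogLine parse(String line) {
        Matcher m = PAR_NEW.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no ParNew entry in: " + line);
        }
        return new ParNewLogLine(
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Double.parseDouble(m.group(4)));
    }

    /** Memory freed from the young generation by this collection. */
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    /** How full the young generation was when the collection started (0.0 to 1.0). */
    double occupancyBeforeGc() {
        return (double) beforeKb / capacityKb;
    }

    public static void main(String[] args) {
        String line = "2020-03-25T16:16:18.390+0800: 1294233.934: [GC (Allocation Failure) "
                + "2020-03-25T16:16:18.391+0800: 1294233.935: [ParNew: 1770060K->25950K(1922432K), "
                + "0.0317141 secs] 2105763K->361653K(4019584K), 0.0323010 secs] "
                + "[Times: user=0.12 sys=0.00, real=0.04 secs]";
        ParNewLogLine gc = parse(line);
        System.out.printf("reclaimed %d KB, young gen %.0f%% full before GC, pause %.4f s%n",
                gc.reclaimedKb(), gc.occupancyBeforeGc() * 100, gc.pauseSecs);
    }
}
```

For this line the collection reclaimed 1,744,110 KB (about 1.66 GB) and left only 25,950 KB alive, which fits a workload dominated by short-lived temporary objects.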

Each GC pause lasted about 0.04 s, but the high frequency consumed noticeable CPU time.
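As a rough illustration of how short pauses add up, the sketch below computes the share of wall-clock time spent paused from the figures quoted here (about 100 collections per minute at about 0.04 s each), and then prints the load average and cumulative GC counters that the JVM exposes through `java.lang.management`. The class name `GcPressureProbe` and the method `pauseShare` are hypothetical; the original investigation read GC logs and host monitoring rather than this code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical probe: estimates GC pause share and prints JVM-visible load and GC counters. */
final class GcPressureProbe {

    /** Fraction of wall-clock time spent in GC pauses (0.0 to 1.0). */
    static double pauseShare(double collectionsPerMinute, double pauseSecs) {
        return collectionsPerMinute * pauseSecs / 60.0;
    }

    public static void main(String[] args) {
        // Figures from the article: ~100 ParNew collections per minute, ~0.04 s each.
        System.out.printf("estimated pause share: %.1f%%%n", pauseShare(100, 0.04) * 100);

        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // getSystemLoadAverage() returns a negative value on platforms where it is unavailable.
        System.out.printf("load average=%.2f, cores=%d%n",
                os.getSystemLoadAverage(), os.getAvailableProcessors());

        // Counters are cumulative since JVM start; sample twice and subtract to get a rate.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, total time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

With these inputs the estimate comes to about 4 seconds of pause per minute, or roughly 6.7% of wall-clock time during which no request thread can make progress.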

The JVM heap configuration showed a 2 GB young generation, with Eden at roughly 1.7 GB.

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 2147483648 (2048.0MB)
   MaxNewSize               = 2147483648 (2048.0MB)
   OldSize                  = 2147483648 (2048.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 10
   MetaspaceSize            = 268435456 (256.0MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 536870912 (512.0MB)
   G1HeapRegionSize         = 0 (0.0MB)
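The Eden figure follows from `NewSize` and `SurvivorRatio` in the output above: `SurvivorRatio` is the ratio of Eden to one survivor space, and there are two survivor spaces. The sketch below works through that arithmetic. The class `YoungGenLayout` is a hypothetical helper, and the results are approximate because the JVM aligns region sizes internally.

```java
/** Hypothetical helper: approximate young-generation layout from NewSize and SurvivorRatio. */
final class YoungGenLayout {
    private static final double MB = 1024.0 * 1024.0;

    /** Eden size in bytes: the young generation is split into survivorRatio + 2 equal parts. */
    static double edenBytes(long newSizeBytes, int survivorRatio) {
        return (double) newSizeBytes * survivorRatio / (survivorRatio + 2);
    }

    /** Size in bytes of ONE of the two survivor spaces. */
    static double survivorBytes(long newSizeBytes, int survivorRatio) {
        return (double) newSizeBytes / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long newSize = 2147483648L; // NewSize from the heap configuration above
        int survivorRatio = 10;     // SurvivorRatio from the heap configuration above

        double eden = edenBytes(newSize, survivorRatio);
        double survivor = survivorBytes(newSize, survivorRatio);

        System.out.printf("Eden ~ %.1f MB%n", eden / MB);
        System.out.printf("Each survivor ~ %.1f MB%n", survivor / MB);
        // Only one survivor space holds objects at a time, so usable capacity is Eden + one survivor.
        System.out.printf("Usable young gen ~ %.1f MB%n", (eden + survivor) / MB);
    }
}
```

Eden comes out at about 1706.7 MB, and Eden plus one survivor at about 1877.3 MB, which is close to the 1922432K (about 1877.4 MB) ParNew capacity reported in the GC log.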

jmap inspection revealed many temporary objects from org.apache.tomcat.util.buf (e.g., ByteChunk, CharChunk, MessageBytes) created while constructing the large response.

Conclusion: The oversized 50 KB response has to be copied from user space to kernel space, which generates many temporary objects along the way. Under high concurrency these objects fill the young generation quickly and trigger frequent ParNew collections. ParNew is sometimes assumed not to stop the world, but it does: each collection pauses all application threads, which blocks user requests, consumes CPU, and inflates response latency.

Experience Summary

API design should avoid excessively large response bodies; split large endpoints into smaller ones.

Implement server‑side caching for expensive query paths rather than relying solely on client‑side caching.

Know your system’s performance limits through load testing; otherwise a single hanging endpoint can jeopardize overall availability.

Isolate internal traffic from external traffic to prevent cache‑stampede and related issues.

Document performance expectations and verify them regularly.
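The server-side caching recommendation above could be sketched as a small time-to-live cache placed in front of the expensive query path. This is a minimal illustration under stated assumptions, not the service's actual fix (the article only mentions caching added on the client side): the class `TtlCache`, its TTL value, and the loader function are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical in-memory cache whose entries expire after a fixed time-to-live. */
final class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so tests can control time

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, or calls the loader if the entry is missing or expired. */
    V get(K key, Function<K, V> loader) {
        final long now = clock.getAsLong();
        // compute() is atomic per key, so concurrent callers for the same key trigger one load.
        // The loader must not modify this cache while it runs.
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && old.expiresAtMillis > now) {
                return old; // still fresh
            }
            return new Entry<V>(loader.apply(k), now + ttlMillis);
        });
        return entry.value;
    }

    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(5_000, System::currentTimeMillis);
        // Stand-in for the ~40 database queries that build the JSON tree.
        Function<String, String> buildTree = id -> "{\"id\":\"" + id + "\"}";
        System.out.println(cache.get("tree-1", buildTree)); // loads
        System.out.println(cache.get("tree-1", buildTree)); // served from cache
    }
}
```

Because repeated requests within the TTL reuse one result, the query fan-out and the large response object are built once per interval instead of once per request. Expired entries are only replaced when requested again, so a production cache would also need eviction or a size bound.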


