Root Cause Analysis of High Latency in a Java HTTP Service: QPS Surge, GC Overhead, and Memory Pressure

The article details a real‑world investigation of a Java HTTP service that experienced a sudden QPS increase and response‑time spikes, tracing the issue through database queries, local method latency, CPU load, frequent ParNew GCs, and large response payloads, and presents concrete remediation steps.

Architecture Digest

Incident Overview

In the early afternoon a partner reported that an internal HTTP service began timing out (partner timeout 300 ms). Monitoring showed the service’s QPS rose to about 120, with average response times (RT) of 2–3 seconds and occasional spikes up to 5–6 seconds (normal RT ≈ 60 ms).

[Figure: QPS during the incident]

[Figure: RT during the incident]

Problem Resolution

The service is an internal operations platform deployed in two Docker containers and expected to handle only single-digit QPS. Each request performs roughly 40 database queries and returns a hierarchical JSON payload of about 50 KB. No caching was in place, so as an immediate fix the partner was asked to cache responses on its side. That brought QPS below 80 and restored RT to about 60 ms; QPS later fell to around 40 after caching was enforced further.

Root Cause Investigation

Because the service relies heavily on the database, the first step was to check for slow SQL. DB monitoring showed an average query RT below 0.3 ms. At roughly 40 queries per request, that is about 12 ms of database time per request, which rules out slow queries.

Next, the DB connection pool (default size 10) was examined; the maximum concurrent connections never exceeded 7, so pool exhaustion was not the cause.

Attention then turned to the service’s own code paths, using trace data. Many local method calls took several hundred milliseconds, while the actual DB calls were sub-millisecond, as in this trace excerpt:

Local call duration: 267ms
Client sends request: 0ms
Server processes request: 0ms
Client receives response: 1ms
Total duration: 1ms

CPU load hovered around 4 on a 4‑core host, which is unusually high for a service that should be idle most of the time. GC logs revealed frequent ParNew collections triggered by allocation failures, with over 100 collections per minute.

2020-03-25T16:16:18.390+0800: 1294233.934: [GC (Allocation Failure) 2020-03-25T16:16:18.391+0800: 1294233.935: [ParNew: 1770060K->25950K(1922432K), 0.0317141 secs] 2105763K->361653K(4019584K), 0.0323010 secs] [Times: user=0.12 sys=0.00, real=0.04 secs]
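To read a line like this programmatically, the young-generation figures can be pulled out with a regular expression. The sketch below is only an illustration: the class name `ParNewLogLine` is hypothetical, and the pattern assumes the JDK 8 `-XX:+PrintGCDetails` layout seen in the line above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the young-generation figures from one ParNew log line (hypothetical helper). */
final class ParNewLogLine {
    // Matches "[ParNew: <before>K-><after>K(<capacity>K), <seconds> secs]".
    private static final Pattern PAR_NEW = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final long beforeKb;    // young-generation occupancy before the collection
    final long afterKb;     // occupancy after the collection (survivors)
    final long capacityKb;  // usable young-generation capacity
    final double pauseSecs; // duration of the ParNew phase

    private ParNewLogLine(long beforeKb, long afterKb, long capacityKb, double pauseSecs) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.pauseSecs = pauseSecs;
    }

    static ParNewLogLine parse(String line) {
        Matcher m = PAR_NEW.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a ParNew line: " + line);
        }
        return new ParNewLogLine(
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Double.parseDouble(m.group(4)));
    }

    /** Memory freed from the young generation by this collection, in KB. */
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        String line = "2020-03-25T16:16:18.390+0800: 1294233.934: [GC (Allocation Failure) "
                + "2020-03-25T16:16:18.391+0800: 1294233.935: "
                + "[ParNew: 1770060K->25950K(1922432K), 0.0317141 secs] "
                + "2105763K->361653K(4019584K), 0.0323010 secs] "
                + "[Times: user=0.12 sys=0.00, real=0.04 secs]";
        ParNewLogLine gc = parse(line);
        System.out.printf("reclaimed=%dK of %dK, pause=%.4fs%n",
                gc.reclaimedKb(), gc.beforeKb, gc.pauseSecs);
    }
}
```

For the logged collection, 1744110K of the 1770060K in the young generation was reclaimed, roughly 98.5%, which is what one would expect if nearly everything allocated was short-lived.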

JVM heap configuration allocated 2 GB to the young generation (≈ 1.7 GB Eden). Heap dumps showed many temporary objects from org.apache.tomcat.util.buf (e.g., ByteChunk, CharChunk, MessageBytes) created while assembling the large response.

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 2147483648 (2048.0MB)
   MaxNewSize               = 2147483648 (2048.0MB)
   OldSize                  = 2147483648 (2048.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 10
   MetaspaceSize            = 268435456 (256.0MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 536870912 (512.0MB)
   G1HeapRegionSize         = 0 (0.0MB)
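
The Eden figure quoted above follows from NewSize and SurvivorRatio: the young generation is divided into SurvivorRatio parts of Eden plus two survivor spaces of one part each. A minimal sketch of that arithmetic, with a hypothetical class name and ignoring the alignment the JVM applies to each space:

```java
/** Approximate young-generation layout from NewSize and SurvivorRatio (hypothetical helper). */
final class YoungGenLayout {
    /** One survivor space: 1 part out of (SurvivorRatio + 2). */
    static long survivorBytes(long newSizeBytes, int survivorRatio) {
        return newSizeBytes / (survivorRatio + 2);
    }

    /** Eden: SurvivorRatio parts out of (SurvivorRatio + 2). */
    static long edenBytes(long newSizeBytes, int survivorRatio) {
        return survivorBytes(newSizeBytes, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        long newSize = 2147483648L; // NewSize from the heap configuration above
        int survivorRatio = 10;     // SurvivorRatio from the heap configuration above
        double mb = 1024.0 * 1024.0;
        System.out.printf("Eden     ~ %.1f MB%n", edenBytes(newSize, survivorRatio) / mb);
        System.out.printf("Survivor ~ %.1f MB each%n", survivorBytes(newSize, survivorRatio) / mb);
    }
}
```

This gives an Eden of about 1706.7 MB and survivor spaces of about 170.7 MB each. Eden plus one survivor space comes to roughly 1877 MB, close to the 1922432K capacity that ParNew reports in the GC log; the small difference is consistent with the JVM rounding space sizes.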

Each response of about 50 KB had to be copied from user space to kernel space before being sent over the network, and the temporary objects created while assembling it filled the young generation quickly, triggering frequent collections. ParNew is sometimes assumed not to stop the world, but it is a stop-the-world collector: application threads are paused for the duration of every young-generation collection. Those pauses block request threads and cost them CPU time slices, which ultimately shows up as higher latency.
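
As a rough sanity check on how much pressure this implies, the figures from the GC log can be turned into a pause budget and an allocation rate. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes the single logged collection (1744110K reclaimed, 0.0323010 s total pause) is representative and uses 100 collections per minute as a lower bound. The class and method names are hypothetical.

```java
/** Back-of-the-envelope GC pressure estimate from one representative log line (hypothetical helper). */
final class GcPressureEstimate {
    /** Total stop-the-world time per minute if every collection pauses for pauseSecs. */
    static double pauseSecondsPerMinute(int collectionsPerMinute, double pauseSecs) {
        return collectionsPerMinute * pauseSecs;
    }

    /**
     * Approximate allocation rate in KB/s, assuming the application allocates
     * about as much between collections as each collection reclaims.
     */
    static double allocationKbPerSecond(long reclaimedKbPerCollection, int collectionsPerMinute) {
        return reclaimedKbPerCollection * collectionsPerMinute / 60.0;
    }

    public static void main(String[] args) {
        int collectionsPerMinute = 100; // "over 100 per minute", used here as a lower bound
        double pauseSecs = 0.0323010;   // total pause from the logged collection
        long reclaimedKb = 1744110L;    // 1770060K - 25950K from the logged collection

        double pause = pauseSecondsPerMinute(collectionsPerMinute, pauseSecs);
        double allocKbPerSec = allocationKbPerSecond(reclaimedKb, collectionsPerMinute);
        System.out.printf("pause per minute ~ %.2f s%n", pause);
        System.out.printf("allocation rate  ~ %.2f GiB/s%n", allocKbPerSec / (1024.0 * 1024.0));
    }
}
```

Under these assumptions the service spends about 3.2 seconds of every minute paused in young-generation collections and allocates on the order of 2.8 GiB per second, far more than the size of the responses themselves would suggest.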

Key Takeaways

Design APIs to avoid overly large response bodies; split large endpoints into smaller ones.

Implement server‑side caching for endpoints that perform many DB queries; relying on the caller alone is risky.

Continuously profile and load‑test your services to understand performance ceilings before incidents occur.

Isolate internal and external traffic to prevent cache‑stampede and other contention issues.

Document performance expectations and verify them regularly, rather than relying on verbal agreements.
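
To illustrate the server-side caching takeaway, here is a minimal sketch of a time-based cache built only on the JDK. It is not the fix the original team deployed; `TtlCache`, the injected clock, and the `"menu-tree"` key are all hypothetical, and a production service would more likely use an established caching library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Minimal time-to-live cache (hypothetical sketch, not the original service's code). */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached value, or runs the loader if the entry is missing or expired.
     * compute() is atomic per key, so concurrent callers asking for the same key
     * wait for one load instead of each hitting the database.
     * The loader must not modify this cache.
     */
    V get(K key, Function<K, V> loader) {
        final long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAtMillis > now)
                        ? old
                        : new Entry<V>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }

    public static void main(String[] args) {
        AtomicInteger dbCalls = new AtomicInteger();
        TtlCache<String, String> cache = new TtlCache<>(5_000L, System::currentTimeMillis);
        // Stand-in for the ~40 queries that build the hierarchical JSON response.
        Function<String, String> loader = key -> "payload#" + dbCalls.incrementAndGet();

        cache.get("menu-tree", loader);
        cache.get("menu-tree", loader); // served from the cache within the TTL
        System.out.println("loader invocations: " + dbCalls.get());
    }
}
```

Caching the assembled response also avoids rebuilding the same object graph on every request, though serializing and writing the body still allocates. The TTL is a trade-off: a longer value offloads more work but serves staler data.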
