How a GC, Thread Pool, and Slow SQL Combo Crippled a Java Service – Deep Postmortem & Fixes

A real‑world production incident where GC pauses, thread‑pool exhaustion, and slow SQL combined to drop QPS from 3000 to 1400 and inflate response times from 200 ms to over 2 s, with detailed analysis, diagnostic criteria, and step‑by‑step optimizations that restored performance.

Full-Stack DevOps & Kubernetes

Incident Background

Service architecture: Spring Boot with embedded Tomcat

JVM heap: 8 GB

Server: 16 CPU / 32 GB RAM

Deployment: single instance

Symptom Summary

CPU usage: 35%–45%

Load average: 5–7

Memory: sufficient

QPS: 3000 → 1400

Response time (RT): 200 ms → 2 s+

Error rate: essentially zero

System-level metrics looked normal, yet the service was clearly degraded: throughput had more than halved and response time had grown roughly tenfold.

1️⃣ GC Dimension – Was STW stealing time?

Key metrics to watch (not just heap size)

Minor GC count – abnormal frequency?

GC pause duration – any pause > 200 ms?

Total STW time – does it coincide with QPS drop?

Old generation usage – steady increase?
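The counters in this list can also be sampled from inside the JVM. The sketch below uses the standard `GarbageCollectorMXBean` API; the class name `GcCounters` and the 10-second sampling interval are illustrative assumptions, not part of the original incident tooling. The reported time is cumulative per collector and, for concurrent collectors, may include phases that are not stop-the-world, so treat the average as a hint and confirm individual pauses in the GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads the cumulative GC counters exposed by the running JVM (hypothetical helper). */
final class GcCounters {

    /** Immutable sample of one collector's cumulative count and time. */
    static final class Sample {
        final long count;      // collections so far (-1 if undefined for this collector)
        final long timeMillis; // accumulated collection time in ms (-1 if undefined)

        Sample(long count, long timeMillis) {
            this.count = count;
            this.timeMillis = timeMillis;
        }
    }

    /** Takes one sample per collector, keyed by collector name. */
    static Map<String, Sample> snapshot() {
        Map<String, Sample> result = new LinkedHashMap<String, Sample>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), new Sample(gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return result;
    }

    /** Average time per collection between two samples, or 0 if no collection ran. */
    static double averageMillis(Sample before, Sample after) {
        long collections = after.count - before.count;
        long elapsed = after.timeMillis - before.timeMillis;
        return collections <= 0 ? 0.0 : (double) elapsed / collections;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Sample> before = snapshot();
        Thread.sleep(10_000); // sampling interval (assumption); align it with your dashboards
        Map<String, Sample> after = snapshot();
        for (Map.Entry<String, Sample> e : after.entrySet()) {
            Sample b = before.get(e.getKey());
            if (b == null) {
                continue;
            }
            System.out.printf("%s: +%d collections, avg %.1f ms%n", e.getKey(),
                    e.getValue().count - b.count, averageMillis(b, e.getValue()));
        }
    }
}
```

A sudden jump in the per-interval collection count, or an average well above 200 ms, is the kind of signal worth lining up against the QPS graph.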

Observed signals in this incident

Minor GC count spiked dramatically

GC pauses of 1–3 seconds

QPS sharply fell during GC peaks

[GC pause (Allocation Failure) (young) 2.94s]

Diagnosis criteria

STW pause > 1 s and frequent

QPS tightly correlated with GC pause

CPU not high but RT noticeably elongated

GC Optimizations (directly applicable)

Original JVM options (problematic version):

-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC

Optimized JVM options:

-Xms12g
-Xmx12g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200

Code‑level tweaks:

Reduce temporary object creation

Avoid allocating large objects on hot paths
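As one small illustration of reducing temporary objects on a hot path, the sketch below contrasts a boxed accumulator with a primitive one. It is a generic example, not code from the affected service; `AllocationDemo` is a hypothetical name. The JIT can sometimes remove such allocations through escape analysis, so measure before and after rather than assuming a gain.

```java
import java.util.Arrays;

/** Contrasts an allocation-prone loop with an allocation-free one (illustrative only). */
final class AllocationDemo {

    /** Boxed accumulator: each iteration unboxes, adds, and may allocate a new Long. */
    static long sumBoxed(long[] values) {
        Long total = 0L;
        for (long v : values) {
            total += v; // result is boxed again on every iteration
        }
        return total;
    }

    /** Primitive accumulator: no per-iteration objects for the young generation to collect. */
    static long sumPrimitive(long[] values) {
        long total = 0L;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        Arrays.fill(data, 1000L);
        // Both variants compute the same sum; only the garbage they create differs.
        System.out.println(sumBoxed(data) == sumPrimitive(data));
    }
}
```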

Result after GC tuning

STW pause reduced from 1–3 s to < 200 ms

QPS recovered from 1400 to > 2700

2️⃣ Thread‑Pool Dimension – Are slow requests exhausting threads?

Metrics to monitor

Active thread count – constantly near max?

Queue length – any queuing observed?

Request wait time – does it grow with concurrency?

Observed signals

Tomcat currentThreadsBusy stayed close to the configured maximum for long stretches

RT increased linearly with concurrency

New requests clearly queued

A jstack thread dump showed many threads blocked in DB calls:

java.sql.PreparedStatement.execute
Thread blocked by DB

Diagnosis criteria

CPU low while RT high

Thread pool saturated

RT grows with load

jstack shows many threads waiting on I/O/DB
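The last criterion can be approximated in-process when running jstack is inconvenient. The sketch below counts live threads whose stack contains a frame from a given class-name prefix, using `Thread.getAllStackTraces()`. `StackProbe` and `FakeDao` are hypothetical names; in a real service the prefix would be the JDBC driver's or DAO layer's package, which the article does not specify.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Counts live threads that are currently inside code from a given class-name prefix. */
final class StackProbe {

    static int countThreadsInside(String classNamePrefix) {
        int matches = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().startsWith(classNamePrefix)) {
                    matches++;
                    break; // count each thread at most once
                }
            }
        }
        return matches;
    }

    /** Hypothetical stand-in for a slow DB call: waits until released. */
    static final class FakeDao {
        static void slowQuery(CountDownLatch entered, CountDownLatch release) {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Parks the given number of workers inside FakeDao and reports how many the probe sees. */
    static int demo(int workers) {
        final CountDownLatch entered = new CountDownLatch(workers);
        final CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> FakeDao.slowQuery(entered, release), "worker-" + i);
            t.setDaemon(true);
            t.start();
        }
        try {
            entered.await(); // every worker is now inside slowQuery
            return countThreadsInside(FakeDao.class.getName());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } finally {
            release.countDown(); // let the workers finish
        }
    }

    public static void main(String[] args) {
        System.out.println("threads inside FakeDao: " + demo(5));
    }
}
```

If most of the request threads show up under the data-access prefix while CPU stays low, the pool is being held by waiting, not by computation.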

Thread‑pool optimization steps

SQL optimization (key). The slow query filtered on user_id:

SELECT * FROM `order` WHERE user_id = ?

An index with user_id as its leading column was added:

CREATE INDEX idx_user_id_status ON `order`(user_id, status);

Thread‑pool isolation configuration:

# Core interface thread pool
core-pool-size: 150
# Non‑core interface thread pool
async-pool-size: 50

SQL timeout set to 1 s

Fast‑fail for slow requests
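The isolation and fast-fail steps can be sketched with plain `java.util.concurrent` classes. This is a minimal illustration under assumptions, not the service's actual wiring: the pool sizes 150 and 50 come from the configuration above, while the queue capacities, the class name `IsolatedPools`, and the helper methods are hypothetical. A bounded queue combined with `AbortPolicy` makes an overloaded pool reject new work immediately instead of letting requests pile up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Two separate pools so that slow non-core work cannot starve core requests. */
final class IsolatedPools {

    /** Fixed-size pool with a bounded queue that rejects instead of queuing without limit. */
    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // throws RejectedExecutionException when full
    }

    /** Submits blocking jobs and returns how many were rejected immediately (fast-fail). */
    static int countRejections(ThreadPoolExecutor pool, int tasks) {
        final CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulates a request stuck on a slow dependency
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // unblock everything that was accepted
        }
        return rejected;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor core = newPool(150, 100);  // 150 from the article; queue size assumed
        ThreadPoolExecutor nonCore = newPool(50, 20); // 50 from the article; queue size assumed
        // Overloading the non-core pool only rejects non-core work ...
        System.out.println("non-core rejected: " + countRejections(nonCore, 100));
        // ... while the core pool keeps accepting requests.
        core.execute(() -> System.out.println("core request served"));
        core.shutdown();
        nonCore.shutdown();
    }
}
```

For the 1 s SQL timeout, standard JDBC offers `Statement.setQueryTimeout(int seconds)`; whether the author used that call or a pool-level or framework-level setting is not stated in the article.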

Result after thread‑pool tuning

Thread pool changed from constantly full to stable

RT improved from 1–2 s to ~200 ms

3️⃣ DB Dimension – Are hidden slow SQLs the bottleneck?

Metrics to watch

Number of slow SQLs – sudden increase?

Query latency – > 1 s?

Active connection count – steady rise?

Observed DB behavior

Slow‑query log snippets:

Query_time: 4.3s
Query_time: 5.1s

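Only a few distinct slow statements, but they sit on high-concurrency paths

Because those paths are invoked frequently, each slow execution holds a request thread for seconds and many threads end up blocked at once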
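A rough capacity calculation shows why a handful of slow statements hurts so much. By Little's law, a pool of N threads can sustain at most N divided by the mean time each request holds a thread. In the sketch below only the 4.3 s figure comes from the slow-query log; the pool size, the fast-path time, and the 2% slow share are made-up inputs chosen for illustration, and `PoolCapacity` is a hypothetical name.

```java
/** Back-of-the-envelope throughput bound for a fixed thread pool (illustrative numbers). */
final class PoolCapacity {

    /** Mean time a request holds a thread when a fraction of requests takes the slow path. */
    static double meanServiceSeconds(double fastSeconds, double slowSeconds, double slowFraction) {
        return (1.0 - slowFraction) * fastSeconds + slowFraction * slowSeconds;
    }

    /** Little's law upper bound: throughput <= threads / mean service time. */
    static double maxThroughput(int threads, double meanServiceSeconds) {
        return threads / meanServiceSeconds;
    }

    public static void main(String[] args) {
        int threads = 200;          // hypothetical pool size
        double fast = 0.05;         // hypothetical fast-path service time, in seconds
        double slow = 4.3;          // slow-query time taken from the log above
        double slowFraction = 0.02; // hypothetical share of requests hitting the slow query

        System.out.printf("all fast: %.0f req/s%n", maxThroughput(threads, fast));
        System.out.printf("2%% slow: %.0f req/s%n",
                maxThroughput(threads, meanServiceSeconds(fast, slow, slowFraction)));
    }
}
```

With these assumed inputs the bound drops from about 4000 to about 1480 requests per second, even though only one request in fifty is slow and CPU usage barely changes.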

Diagnosis criteria

QPS drops while CPU stays normal

Threads waiting on DB

When these signs appear, suspect the database before application logic.

4️⃣ Integrated Troubleshooting Flow

QPS ↓
↓
Is GC causing STW pauses?
↓
Is the thread pool exhausted?
↓
Is the DB presenting slow SQL?

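Never reverse the order. A stop-the-world pause freezes every application thread, so it has to be ruled out before thread-pool or DB signals can be trusted; thread-pool saturation, in turn, is usually the symptom that points at a slow dependency such as the DB.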
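The flow can be written down as a small decision function that checks the three signals in the stated order and returns the first one that fires. This is a sketch, not a monitoring tool: the 1 s thresholds echo the diagnosis criteria in this article, while the 90% busy ratio, the 200-thread pool, and the names `Triage` and `firstSuspect` are assumptions.

```java
/** Encodes the GC, then thread pool, then DB triage order (illustrative thresholds). */
final class Triage {

    enum Suspect { GC_PAUSE, THREAD_POOL, SLOW_SQL, UNKNOWN }

    /** Returns the first layer whose signal fires, checked in the article's order. */
    static Suspect firstSuspect(double maxGcPauseSeconds,
                                int busyThreads, int maxThreads,
                                double slowestQuerySeconds) {
        if (maxGcPauseSeconds > 1.0) {
            return Suspect.GC_PAUSE;    // STW pause above 1 s
        }
        if (busyThreads >= 0.9 * maxThreads) {
            return Suspect.THREAD_POOL; // pool near its maximum (90% is an assumption)
        }
        if (slowestQuerySeconds > 1.0) {
            return Suspect.SLOW_SQL;    // query latency above 1 s
        }
        return Suspect.UNKNOWN;
    }

    public static void main(String[] args) {
        // Incident-like inputs: 2.94 s pause and 5.1 s query from the logs; thread counts assumed.
        System.out.println(firstSuspect(2.94, 198, 200, 5.1));
        // Once the GC pause is fixed, the next layer becomes visible.
        System.out.println(firstSuspect(0.15, 198, 200, 5.1));
    }
}
```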

5️⃣ Why CPU can be misleading

CPU only answers: “Is anyone using me for computation?”

It cannot tell you whether a thread is paused by GC

It cannot tell you whether a thread is waiting on the DB

It cannot tell you whether a thread is blocked on a lock
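The gap between elapsed time and CPU time is easy to demonstrate with the standard `ThreadMXBean` API. In the sketch below a sleep stands in for a DB or lock wait; `CpuVsWall` and `measureSleep` are hypothetical names, and CPU-time measurement may be unsupported or disabled on some JVMs, in which case the method reports -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Shows that a waiting thread consumes wall-clock time but almost no CPU time. */
final class CpuVsWall {

    /** Returns {wallNanos, cpuNanos} for the current thread; cpuNanos is -1 if unavailable. */
    static long[] measureSleep(long sleepMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean available = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();
        long cpuStart = available ? threads.getCurrentThreadCpuTime() : -1L;
        long wallStart = System.nanoTime();
        try {
            Thread.sleep(sleepMillis); // stands in for a DB wait or a lock wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long wall = System.nanoTime() - wallStart;
        long cpu = available ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new long[] {wall, cpu};
    }

    public static void main(String[] args) {
        long[] result = measureSleep(200);
        System.out.printf("wall: %d ms, cpu: %d us%n",
                result[0] / 1_000_000, result[1] < 0 ? -1 : result[1] / 1_000);
    }
}
```

The request in this sketch takes about 200 ms from the caller's point of view while the thread burns only a tiny amount of CPU, which is the same shape as the incident: high RT, modest CPU.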

6️⃣ Takeaway

As a rule of thumb, about 80% of Java performance problems are not about raw compute power but about pauses, waiting, and blocking.