Why Did Our Backend Freeze? A Deep Dive into Connection‑Pool Exhaustion and Slow SQL

A detailed post‑mortem of three successive outages shows how hidden bugs, frequent Full GCs, a saturated connection pool, and an unindexed slow SQL query crippled a Spring Boot backend, and walks through the step‑by‑step troubleshooting, temporary fixes, and lasting improvements.


First Investigation

Problem Identification

1. Logged into the website to confirm the outage. Front‑end resources responded quickly, but back‑end requests remained pending.

2. Opened the container platform to check service status; average response time was about 21 seconds.

3. QPS, memory, and CPU looked normal, so the issue was not load‑related.

4. The monitoring platform showed a per‑minute average response time of 16.2 seconds.

5. JVM monitoring revealed a Full GC every five minutes, each one pausing the application.

6. Thread‑pool monitoring showed all worker threads busy and many requests queued.

7. Database‑connection‑pool monitoring indicated the pool was full. (A JMX‑based sketch for reproducing checks 5-7 follows this list.)
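
For reference, checks 5-7 can be reproduced from inside the JVM with the standard management beans. This is a minimal sketch (the class name is illustrative), not the monitoring platform used in the incident:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class JvmQuickCheck {
    public static void main(String[] args) {
        // GC counts and accumulated pause time; a steadily climbing count
        // on the old-generation collector points at frequent Full GCs.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }

        // Thread states; a pile-up in WAITING/TIMED_WAITING while the
        // connection pool is full usually means threads are parked
        // waiting for a database connection.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> byState = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null) {
                byState.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        byState.forEach((state, n) -> System.out.println(state + ": " + n));
    }
}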

Temporary Fix

Increasing the HikariCP maximum pool size to 20 in application.yml restored service quickly:

spring:
  datasource:
    hikari:
      maximum-pool-size: 20
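
For context, HikariCP's default maximum pool size is 10, so this change doubles the ceiling. The sketch below is illustrative rather than taken from the incident (the JDBC URL and credentials are placeholders); it sets the same limit programmatically and reads the pool counters that make saturation visible:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

public class PoolCheck {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/app"); // placeholder URL
        config.setUsername("app");                            // placeholder credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(20); // same limit as the application.yml fix

        try (HikariDataSource dataSource = new HikariDataSource(config)) {
            // The MXBean exposes the counters behind connection-pool dashboards:
            // active + idle = total, and "waiting" is the queue of starved threads.
            HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
            System.out.printf("active=%d idle=%d total=%d waiting=%d%n",
                    pool.getActiveConnections(),
                    pool.getIdleConnections(),
                    pool.getTotalConnections(),
                    pool.getThreadsAwaitingConnection());
        }
    }
}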

Second Investigation

After the first fix, the service stalled again. The connection pool was saturated once more, and a jstack thread dump showed many threads in TIMED_WAITING, mirroring the first incident.

The temporary solution was to redeploy the service, which reset the connection pool.

Third Investigation

A senior engineer suggested checking for slow SQL. A query executed more than 7,000 times, averaging 1.4 seconds per execution, was identified.
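
On MySQL, one way to surface a query like this is the slow query log. A hypothetical sketch, assuming log_output=TABLE so entries land in the mysql.slow_log table (the connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SlowQueryCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/app", "app", "secret"); // placeholders
             Statement stmt = conn.createStatement()) {
            // Log any statement slower than one second (requires privileges).
            stmt.execute("SET GLOBAL slow_query_log = 'ON'");
            stmt.execute("SET GLOBAL long_query_time = 1");

            // List the worst offenders; a 1.4 s login poll would appear here.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT sql_text, query_time FROM mysql.slow_log " +
                    "ORDER BY query_time DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString("query_time")
                            + "  " + rs.getString("sql_text"));
                }
            }
        }
    }
}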

The query lacked an index on the scene column, causing a full‑table scan during each WeChat QR‑code login poll.

Adding an index on scene immediately restored normal response times and reduced connection‑pool usage.

Explain plans confirmed the query now used the index.
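
Putting the fix and the verification together in one sketch: the table name login_ticket, the query shape, and the connection details are assumptions, while the scene column, the index, and the EXPLAIN check come from the incident:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SceneIndexFix {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/app", "app", "secret"); // placeholders
             Statement stmt = conn.createStatement()) {
            // The lasting fix: index the column every QR-code login poll filters on.
            stmt.execute("CREATE INDEX idx_scene ON login_ticket (scene)");

            // Verify: the access type should change from ALL (full-table scan)
            // to ref, with idx_scene shown in the key column.
            try (ResultSet rs = stmt.executeQuery(
                    "EXPLAIN SELECT * FROM login_ticket WHERE scene = 'abc123'")) {
                while (rs.next()) {
                    System.out.println("type=" + rs.getString("type")
                            + " key=" + rs.getString("key"));
                }
            }
        }
    }
}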

Key Takeaways

When a service stalls, restore availability first (e.g., by adding a new instance), then carry out the deep investigation.

Increasing the DB connection pool can be a quick fix; HikariCP's default maximum of 10 connections may be too conservative for a busy service.

Temporary fixes do not solve root causes; continuous monitoring is essential.

Understanding how to locate thread‑pool saturation and slow SQL is critical for efficient troubleshooting.

Ultimately, the incident exposed gaps in the team's experience and underscored the importance of indexing frequently queried columns, sizing the connection pool properly, and testing performance proactively.
