Why Does My Service’s CPU Spike After Restart? Deep Dive into Thread Bottlenecks and JIT Compilation

This article analyzes the CPU spikes that occur within minutes after a Java service restart, explains how excessive Runnable threads, frequent context switches, and JIT compiler activity cause the overload, and presents step-by-step diagnostics and mitigation strategies: grey-scale (gradual) traffic release, cache pre-warming, request-parameter cleanup, and JVM warm-up.

vivo Internet Technology

Background

After each deployment or restart, our service repeatedly generated alerts for several minutes: CPU usage soared to near 100% and many HTTP and Dubbo requests timed out.

Problem Symptoms

Dubbo interface timeouts (hundreds of failed requests)

HTTP interface latency spikes (P95 from tens of ms to seconds)

CPU usage near 100% during the first few minutes of traffic

Runnable and Blocked thread counts surge dramatically

Old‑generation heap memory grows quickly after traffic is restored

Initial Root Cause

Monitoring showed that once traffic was restored, the service did not have enough threads to handle the load, triggering massive thread creation and frequent context switches that drove the CPU to saturation.
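One way to observe this kind of surge from inside the process (a hypothetical diagnostic sketch, not the monitoring stack used in the article) is the standard ThreadMXBean API, which can count threads by state:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class ThreadStateCounter {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(false, false): skip monitor/synchronizer details,
        // we only need each thread's state
        long runnable = Arrays.stream(threads.dumpAllThreads(false, false))
                .filter(ti -> ti.getThreadState() == Thread.State.RUNNABLE)
                .count();
        System.out.println("live=" + threads.getThreadCount()
                + " runnable=" + runnable);
    }
}
```

Sampling this periodically (or exporting it as a metric) makes the post-restart Runnable spike visible without an external profiler.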

The main culprit was the large number of Runnable threads and the heavy use of org.springframework.beans.PropertyMatches.calculateStringDistance, which Spring invokes to compute "did you mean" suggestions when it throws NotWritablePropertyException for request parameters that have no matching property.

Preliminary Solutions

4.1 Gradual Traffic Release (Grey‑scale)

We delayed full traffic and increased it step‑by‑step (1%, 5%, 44%, then full). CPU spikes disappeared, thread counts stabilized, and timeouts vanished.

4.2 Cache Pre‑warming

We tried loading hot data into Redis before admitting traffic, but the impact was negligible, because many other caches (including JVM-internal ones) also need warming.
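The pre-warming idea can be sketched as follows (a minimal, hypothetical illustration using an in-memory map; the article's setup pre-loaded Redis instead, and the key names and loadFromDb helper are invented for the example):

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CacheWarmer {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private volatile boolean ready = false;

    // Stand-in for whatever backing store holds the hot data
    String loadFromDb(String key) {
        return "value-of-" + key;
    }

    // Populate the cache for known hot keys BEFORE the service reports ready,
    // so the first real requests hit the cache instead of the database.
    public void warmUp(List<String> hotKeys) {
        for (String key : hotKeys) {
            cache.put(key, loadFromDb(key));
        }
        ready = true; // only now expose the instance to traffic
    }

    public boolean isReady() { return ready; }
    public String get(String key) { return cache.get(key); }
}
```

The key design point is the ready flag: the instance should not register with the load balancer until warm-up finishes, otherwise the first requests still pay the cold-cache cost.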

Detailed Analysis

5.1 Thread Stack & Flame Graph

We captured thread stacks and CPU flame graphs after a restart. Key observations:

Runnable threads rose to 462 (normally ~70), most being catalina‑exec Tomcat threads.

~200 threads were blocked in calculateStringDistance.

Flame graph showed calculateStringDistance consuming ~64% of CPU during the spike.

The source of calculateStringDistance uses a nested double loop implementing the Levenshtein edit-distance algorithm, an O(n * m) computation per call that becomes CPU-intensive under high concurrency.

// Simplified view of the Levenshtein-style distance calculation:
// d[i][j] holds the edit distance between the first i chars of s1 and the
// first j chars of s2, so each call fills an
// (s1.length()+1) x (s2.length()+1) table -- O(n*m) time and space.
private static int calculateStringDistance(String s1, String s2) {
  if (s1.isEmpty()) return s2.length();
  if (s2.isEmpty()) return s1.length();
  int[][] d = new int[s1.length() + 1][s2.length() + 1];
  // Base cases: transforming to/from the empty prefix costs its length
  for (int i = 0; i <= s1.length(); i++) d[i][0] = i;
  for (int j = 0; j <= s2.length(); j++) d[0][j] = j;
  for (int i = 1; i <= s1.length(); i++) {
    char c1 = s1.charAt(i - 1);
    for (int j = 1; j <= s2.length(); j++) {
      int cost = (c1 == s2.charAt(j - 1)) ? 0 : 1; // substitution cost
      // Cheapest of: delete from s1, insert into s1, or substitute
      d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                         d[i - 1][j - 1] + cost);
    }
  }
  return d[s1.length()][s2.length()];
}

The method is triggered during Spring's data binding when a request contains a parameter with no corresponding setter: Spring throws the exception and runs this costly similarity calculation against candidate property names to build its error message.
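As a standalone illustration (a hypothetical driver, not from the article), the distance computation can be exercised directly. Spring runs a comparable calculation per candidate property for every failed bind, which is why hundreds of concurrent requests carrying unknown parameters saturate the CPU:

```java
public class DistanceDemo {
    // Same Levenshtein computation as shown above, made package-visible
    static int calculateStringDistance(String s1, String s2) {
        if (s1.isEmpty()) return s2.length();
        if (s2.isEmpty()) return s1.length();
        int[][] d = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) d[i][0] = i;
        for (int j = 0; j <= s2.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s1.length(); i++) {
            char c1 = s1.charAt(i - 1);
            for (int j = 1; j <= s2.length(); j++) {
                int cost = (c1 == s2.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[s1.length()][s2.length()];
    }

    public static void main(String[] args) {
        // A parameter name that almost matches a property:
        // one insertion ('_') plus one substitution ('N' -> 'n') = distance 2
        System.out.println(calculateStringDistance("userName", "user_name")); // prints 2
    }
}
```

Each such call allocates and fills a fresh int table, so the cost grows with both the parameter-name length and the number of bean properties compared.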

5.2 JIT Compilation Overhead

Even after fixing the parameter issue, small CPU bumps remained. Flame graphs revealed that the HotSpot C2 JIT compiler consumed a noticeable share of CPU during the “compilation phase” when many methods became hot.

Dashboard snapshots showed three CompilerThread instances (C1 and C2) dominating CPU at that moment.
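JIT activity can also be observed in-process (a hypothetical complementary check, not the flame-graph tooling above) via the standard CompilationMXBean, which reports cumulative time spent in the compiler threads:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitStats {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        System.out.println("JIT compiler: " + jit.getName());
        if (jit.isCompilationTimeMonitoringSupported()) {
            // Total elapsed time (ms) this JVM has spent JIT-compiling so far;
            // sampling it around startup exposes the compilation burst
            System.out.println("total compilation time: "
                    + jit.getTotalCompilationTime() + " ms");
        }
    }
}
```

Running the JVM with the standard HotSpot flag -XX:+PrintCompilation additionally logs each method as it is compiled, which makes the post-restart burst visible in plain logs.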

Solutions Implemented

Gradual traffic release: proved effective in eliminating the spikes.

Request-parameter cleanup: ensure every public parameter has a setter, or filter unknown parameters out in a servlet filter.

JVM warm-up (pre-heat): after the service reports ready (check.do), invoke high-traffic HTTP endpoints a configurable number of times before exposing the service to real traffic, allowing the C2 compiler to finish its work early.

Results

CPU peak reduced from ~97% to ~61% with only minor short spikes.

Runnable thread surge shortened from 6 minutes to ~40 seconds, peak count dropped from ~600 to ~280.

HTTP/Dubbo P95 latency fell from >50 s to <4 s; P99 also improved dramatically.

Conclusion

The root cause of the post‑restart CPU explosion was a combination of excessive thread creation and costly Spring data‑binding for undefined parameters, amplified by JIT compilation activity. By gradually releasing traffic, cleaning up request parameters, and performing JVM warm‑up, the service achieved stable performance and eliminated frequent alerts.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Java performance, Spring MVC, Arthas, cpu-profiling, Thread analysis, JIT Compilation