
Diagnosing High CPU Usage in Java Applications with Arthas

Using the open‑source Arthas tool, the author traced a Java server’s 99 % CPU usage to two runaway threads, inspected their stack traces, discovered a cyclic bucket in a HashBiMap caused by unsynchronized cache updates, and resolved the issue by adding a synchronized keyword to the cache‑sync method.

Youzan Coder

May 28, 2020

Recently an online application server showed CPU usage constantly at 99% despite low traffic. The author logged into the server to investigate the cause.

Traditional commands such as top -Hp and jstack can roughly locate the problem area but often provide insufficient information.
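
When going the top -Hp and jstack route, a common stumbling block is that top prints native thread IDs in decimal, while classic HotSpot thread dumps print them as hexadecimal nid values. The helper below is a minimal sketch of that conversion. The class name NidLookup and the sample ID 6699 are made up for illustration, and newer JDKs may format nid differently.

```java
class NidLookup {
    /** Converts a decimal native thread ID (as shown by top -Hp) to the hex form used after "nid=". */
    static String toNid(long nativeThreadId) {
        return "0x" + Long.toHexString(nativeThreadId);
    }

    public static void main(String[] args) {
        long tidFromTop = 6699L; // hypothetical value read from top -Hp
        // Search the jstack output for this string to find the matching thread.
        System.out.println("nid=" + toNid(tidFromTop)); // prints nid=0x1a2b
    }
}
```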

The author recommends using Arthas, an open‑source Java diagnostic tool from Alibaba, which can quickly pinpoint online issues. Installation instructions are available at https://alibaba.github.io/arthas.

To find the threads consuming the most CPU, the author runs the Arthas thread command, which lists all threads sorted by CPU usage:

[arthas@384]$ thread
Threads Total: 112, NEW: 0, RUNNABLE: 26, BLOCKED: 0, WAITING: 31, TIMED_WAITING: 55, TERMINATED: 0
ID  NAME    STATE    %CPU  TIME
108 h..ec-0 RUNNABLE  51   4011:48
100 h..ec-2 RUNNABLE  48   4011:51
...

The output shows that two threads (IDs 108 and 100) consume almost all of the CPU, and each has accumulated more than 4,000 minutes of CPU time (the TIME column).
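
For comparison, a similar ranking can be produced from inside the JVM with the standard java.lang.management API. The sketch below is not how Arthas is implemented; it only ranks live threads by accumulated CPU time, which corresponds to the TIME column rather than the sampled %CPU column. The class name ThreadCpuReport and its Row type are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ThreadCpuReport {
    /** One report row: thread ID, name, state, and accumulated CPU time in milliseconds. */
    static final class Row {
        final long id;
        final String name;
        final Thread.State state;
        final long cpuMillis;

        Row(long id, String name, Thread.State state, long cpuMillis) {
            this.id = id;
            this.name = name;
            this.state = state;
            this.cpuMillis = cpuMillis;
        }
    }

    /** Returns up to `limit` live threads, ordered by accumulated CPU time, highest first. */
    static List<Row> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return rows; // CPU time measurement is unavailable on this JVM
        }
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has already died
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has already died
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            rows.add(new Row(id, info.getThreadName(), info.getThreadState(), cpuNanos / 1_000_000));
        }
        rows.sort(Comparator.comparingLong((Row r) -> r.cpuMillis).reversed());
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        for (Row r : topThreads(5)) {
            System.out.printf("%-5d %-30s %-14s %d ms%n", r.id, r.name, r.state, r.cpuMillis);
        }
    }
}
```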

To see what these threads are doing, the author passes a thread ID to the thread command, which prints that thread's stack trace:

[arthas@384]$ thread 108
"http-nio-7001-exec-10" Id=108 cpuUsage=51% RUNNABLE
    at c.g.c.c.HashBiMap.seekByKey(HashBiMap.java)
    at c.g.c.c.HashBiMap.put(HashBiMap.java:270)
    at c.g.c.c.HashBiMap.forcePut(HashBiMap.java:263)
    at c.y.r.j.o.OaInfoManager.syncUserCache(OaInfoManager.java:159)

From the stack trace, there are two possible explanations for the high CPU usage:

1. The seekByKey method is being called repeatedly in a loop by its caller.

2. The seekByKey method itself contains a loop that never terminates.

To check the first hypothesis, the author uses the tt command, which records invocations of a given method. The first step is to record calls to the Spring MVC handler method:

[arthas@384]$ tt -t org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter invokeHandlerMethod -n 10
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 622 ms.

 INDEX  COST(ms)   OBJECT     CLASS                         METHOD
------------------------------------------------------------------------------------
 1000    481.203383 0x481eb705 RequestMappingHandlerAdapter    invokeHandlerMethod
 1001    3.432024   0x481eb705 RequestMappingHandlerAdapter    invokeHandlerMethod
...

For each call, tt records the parameters, the return value, any thrown exception, and the target object. The -i option selects a recorded call by its INDEX, and -w evaluates an OGNL expression against that call. This lets the author retrieve the Spring context, from which the bean holding the problematic cache can be reached:

[arthas@384]$ tt -i 1000 -w 'target.getApplicationContext()'
@AnnotationConfigServletWebServerApplicationContext[ ... ]

Next, the author decompiles the HashBiMap.seekByKey method to understand its logic:

private BiEntry<K, V> seekByKey(@Nullable Object key, int keyHash) {
    for (BiEntry<K, V> entry = hashTableKToV[keyHash & mask];
         entry != null;
         entry = entry.nextInKToVBucket) {
        if (keyHash == entry.keyHash && Objects.equal(key, entry.key)) {
            return entry;
        }
    }
    return null;
}

The method iterates over a bucket (a linked list) and returns the entry matching the key. If the linked list contains a cycle, the loop will never terminate.

To detect a possible cycle, the author writes an OGNL script that follows the nextInKToVBucket chain and reports a cycle if the chain has not ended after about 50 steps (a healthy bucket is unlikely to hold more than 50 nodes). The script is:

#loopCnt=0,
#foundCycle=:[ #this == null ? false :
    #loopCnt > 50 ? true : (
        #loopCnt = #loopCnt + 1,
        #foundCycle(#this.nextInKToVBucket)
    )]
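
For readers unfamiliar with OGNL lambdas, the same bounded traversal can be sketched in Java. This is an illustration only: the Node class is a hypothetical stand-in for Guava's BiEntry, with next playing the role of nextInKToVBucket, and the check is a heuristic that would also flag a legitimate chain longer than the limit.

```java
class BucketCycleCheck {
    /** Hypothetical stand-in for a bucket entry; `next` mirrors nextInKToVBucket. */
    static final class Node {
        final String key;
        Node next;

        Node(String key) {
            this.key = key;
        }
    }

    /** Returns true if the chain starting at `start` is still going after more than `limit` steps. */
    static boolean looksCyclic(Node start, int limit) {
        int steps = 0;
        for (Node n = start; n != null; n = n.next) {
            if (steps > limit) {
                return true; // chain is too long for a normal bucket; assume a cycle
            }
            steps++;
        }
        return false; // reached the end of the chain, so no cycle
    }

    public static void main(String[] args) {
        Node first = new Node("first");
        Node second = new Node("second");
        first.next = second;
        System.out.println(looksCyclic(first, 50)); // prints false
        second.next = first; // close the loop: first -> second -> first
        System.out.println(looksCyclic(first, 50)); // prints true
    }
}
```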

The full command that combines the OGNL script with tt is:

tt -i 1000 -w 'target.getApplicationContext().getBean("oaInfoManager").userCache.entrySet().{delegate}.{^ #loopCnt = 0,#foundCycle = :[ #this == null ? false : #loopCnt > 50 ? true : (#loopCnt = #loopCnt + 1, #foundCycle(#this.nextInKToVBucket))], #foundCycle(#this)}.get(0)' -x 2

The command performs three steps:

1. Retrieves the HashBiMap object (the userCache field) from the Spring bean oaInfoManager.

2. Iterates over its entries and selects the first entry whose bucket chain contains a cycle.

3. Uses -x 2 to print the result expanded to a depth of 2, which is enough to make the cycle visible.

The execution result shows a cycle:

@BiEntry[
    key=@String[张三],
    value=@Long[1111],
    nextInKToVBucket=@BiEntry[
        key=@String[李四],
        value=@Long[2222],
        nextInKToVBucket=@BiEntry[张三=1111]
    ]
]

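This confirms that the bucket forms a loop (张三 → 李四 → 张三), so seekByKey never reaches the end of the chain. The root cause is a concurrency problem: HashBiMap is not thread-safe, and syncUserCache calls forcePut from multiple request threads without synchronization, which left the bucket's linked list in an inconsistent state.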

Fixing the problem is straightforward: add the synchronized keyword to the syncUserCache method to serialize access.
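
The shape of that fix can be sketched as follows. This is not the author's code: the class OaInfoManagerSketch, the method signature, and the runConcurrently helper are assumptions, and a plain HashMap (which is also not thread-safe) stands in for Guava's HashBiMap so the example runs without extra dependencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OaInfoManagerSketch {
    // Stand-in for the HashBiMap cache; like HashBiMap, HashMap is not thread-safe.
    private final Map<String, Long> userCache = new HashMap<>();

    /** synchronized ensures only one thread mutates the cache at a time. */
    synchronized void syncUserCache(Map<String, Long> latest) {
        for (Map.Entry<String, Long> e : latest.entrySet()) {
            userCache.put(e.getKey(), e.getValue());
        }
    }

    synchronized int size() {
        return userCache.size();
    }

    /** Starts several threads that each sync a batch of distinct keys; returns the final cache size. */
    static int runConcurrently(int threads, int keysPerThread) {
        OaInfoManagerSketch manager = new OaInfoManagerSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int offset = t * keysPerThread;
            Thread worker = new Thread(() -> {
                Map<String, Long> batch = new HashMap<>();
                for (int i = 0; i < keysPerThread; i++) {
                    batch.put("user-" + (offset + i), (long) (offset + i));
                }
                manager.syncUserCache(batch);
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return manager.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000)); // prints 4000
    }
}
```

Any code that reads the cache from other threads would need to take the same lock; a method-level synchronized only protects callers that go through synchronized methods of the same object.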

In conclusion, while a simple jstack could have solved the issue, this case demonstrates the powerful capabilities of Arthas for deep runtime inspection, helping developers and operators quickly resolve complex performance problems.
