Debugging a 100 ms Latency Bug in a Spring Boot Channel System Using Arthas
This article documents the step‑by‑step investigation of an unexpected ~100 ms response delay in a Spring Boot channel service. Network checks, curl tests, Arthas trace/stack commands, and source analysis revealed a bug in embedded Tomcat triggered by swagger‑ui resources; upgrading Tomcat resolved the issue.
Background
The author maintains a simple Spring Boot channel system that only performs message conversion and parameter validation. After code optimisation the overall response time was still unsatisfactory, with an extra ~100 ms observed between the recorded processing time (e.g., 150 ms) and the client‑side latency (≈250 ms).
Investigation Process
Code Analysis
The project is a standard Spring Boot web application using the embedded Tomcat server. No custom filters or interceptors were found, so the issue was not caused by business logic.
Call‑flow Analysis
The request flow is Nginx → Channel System. Ping tests from the client host to the Nginx server and from Nginx to the channel server showed sub‑millisecond latency, ruling out network problems.
Directly invoking the service via curl on the loopback address, bypassing the Nginx reverse proxy, revealed a first‑call latency of about 73 ms and a second‑call latency of only 3 ms, suggesting a caching effect.
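The `@curl-time.txt` passed to curl's `-w` flag is a write‑out template; the original file's contents are not shown in the article, so the following is only a plausible reconstruction built from curl's documented `-w` variables:

```text
time_namelookup: %{time_namelookup}s\n
time_connect: %{time_connect}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total: %{time_total}s\n
```

Each `%{...}` variable is expanded by curl after the transfer completes, which is how the `time_total` figures below were captured.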
[jboss@VM_10_91_centos tmp]$ curl -w "@curl-time.txt" http://127.0.0.1:7744/send
... time_total: 0.073s
[jboss@VM_10_91_centos tmp]$ curl -w "@curl-time.txt" http://127.0.0.1:7744/send
... time_total: 0.003s

Using Arthas for Diagnosis
Arthas, Alibaba’s open‑source Java diagnostic tool, was employed. The trace command on Spring MVC entry points showed only ~18 ms spent inside Spring MVC, leaving ~97 ms unaccounted for.
time_total: 0.115s (client side)
Spring MVC trace: 18 ms

Stack inspection of org.springframework.web.servlet.DispatcherServlet exposed a long call chain ending in Tomcat's Http11Processor.service. Tracing this class highlighted a 129 ms hotspot.
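The Arthas session behind these numbers would look roughly like the following sketch (the `trace` and `stack` commands are standard Arthas; the exact method patterns used by the author are an assumption):

```shell
# Attach Arthas to the target JVM (java -jar arthas-boot.jar), then at the arthas prompt:
trace org.springframework.web.servlet.DispatcherServlet doDispatch   # times the Spring MVC path (~18 ms here)
stack org.springframework.web.servlet.DispatcherServlet doDispatch   # prints the call chain down from Tomcat
trace org.apache.coyote.http11.Http11Processor service               # times Tomcat's request handling (~129 ms hotspot)
```

`trace` reports per‑node timings for one captured invocation, which is what exposed the gap between Spring MVC's 18 ms and the 115 ms observed by the client.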
Pinpointing the Hotspot
Further tracing of org.apache.catalina.webresources.TomcatJarInputStream.getNextJarEntry() revealed 31 invocations consuming a total of 74 ms.
+---[min=0.004452ms,max=34.479307ms,total=74.206249ms,count=31] org.apache.catalina.webresources.TomcatJarInputStream:getNextJarEntry() #117

The method loads META‑INF entries from JAR files. Using Arthas's watch command on createZipEntry showed that the resource being loaded repeatedly was the swagger‑ui JAR. Removing the Swagger dependencies eliminated the extra latency.
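The watch step can be sketched as follows (class and method names are from the article; the OGNL expression and depth flag are assumptions about how the entry name was surfaced):

```shell
# At the arthas prompt: print the entry name argument each time
# TomcatJarInputStream creates a zip entry for a JAR resource.
watch org.apache.catalina.webresources.TomcatJarInputStream createZipEntry 'params[0]' -x 2
```

In the author's run, the printed entry names pointed at the swagger‑ui JAR, identifying Swagger's static resources as the content being rescanned.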
Root Cause Analysis
The problem is a bug in embedded Tomcat 8.5.31 (bundled with Spring Boot 2.0.2.RELEASE). On every request, org.apache.catalina.mapper.Mapper#internalMapWrapper checks for static resources, which repeatedly opens and scans JAR resources such as the swagger‑ui JAR. The resource cache (org.apache.catalina.webresources.Cache) has a default TTL of 5 seconds, which explains why back‑to‑back requests are fast while requests spaced more than 5 seconds apart pay the full delay again.
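The TTL effect can be illustrated with a small shell sketch. This is a toy model of the cache window, not Tomcat's actual code; `ttl=5` mirrors the 5‑second default described above:

```shell
# Toy model of a TTL cache: a lookup is cheap only if a load happened
# within the last $ttl seconds; otherwise the slow JAR rescan repeats.
ttl=5
last_load=0          # epoch seconds of the last slow load (0 = never loaded)
lookup() {
  now=$(date +%s)
  if [ $((now - last_load)) -ge "$ttl" ]; then
    echo "cache miss: rescanning JAR resources (slow, ~100 ms)"
    last_load=$now
  else
    echo "cache hit (fast, ~3 ms)"
  fi
}
lookup   # first request: miss (slow path)
lookup   # immediate retry: hit (fast path)
```

Run twice in quick succession the second call hits the cache, matching the curl observations (73 ms then 3 ms); wait longer than the TTL and the slow path repeats.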
Solution
Upgrade the embedded Tomcat version to 8.5.40 or later, or upgrade Spring Boot to a version that includes the fixed Tomcat release (e.g., Spring Boot 2.1.0+).
For Maven projects that inherit spring-boot-starter-parent , the Tomcat version can be overridden:
<properties>
<tomcat.version>8.5.40</tomcat.version>
</properties>

Alternatively, update the Spring Boot parent version to 2.1.0.RELEASE or newer.
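That parent bump would look like the following in the project's pom.xml (version taken from the recommendation above):

```xml
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.0.RELEASE</version>
</parent>
```

Newer Spring Boot releases manage a Tomcat version that already contains the fix, so no explicit tomcat.version override is needed.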
Conclusion
By combining network checks, curl benchmarking, and Arthas tracing/stack commands, the hidden 100 ms latency was traced to a Tomcat‑embed bug triggered by Swagger‑ui resources. Upgrading Tomcat (or Spring Boot) resolves the issue and restores expected response times.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn