How to Diagnose Slow OpenAPI Responses with TProfiler: A Step‑by‑Step Guide
This article walks through diagnosing intermittent OpenAPI latency by reproducing the issue, visualizing the request flow, using Alibaba's TProfiler agent without code changes, configuring and running profiling commands, analyzing the generated logs, and summarizing actionable performance improvements.
Background
Recent incidents showed an OpenAPI whose response time varied from tens of milliseconds to several seconds. The cause was not a business-logic bug, which made it hard to locate directly.
Attempted Solution
The test environment was consistently fast, so the issue could not be reproduced there; we therefore decided to use a non‑intrusive profiling agent. Our first suspects, Nginx and the network, were ruled out: the logs confirmed that the latency originated inside the application.
We mapped the typical layered architecture:
Client → Nginx
Nginx forwards to backend web service
Web service calls backend Service via RPC
Logging Approach
Adding logs to every method would be invasive and require redeployment, so we looked for a solution that required no code changes.
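For contrast, the invasive approach we rejected would look roughly like the sketch below: every method of interest has to be wrapped by hand, and the service redeployed after each change. The names ManualTiming, timed and selectDB are hypothetical and only illustrate the pattern.

```java
import java.util.function.Supplier;

public class ManualTiming {

    // Wraps a call, measures its wall-clock duration and logs it.
    // Every method we wanted to observe would need a wrapper like this.
    public static <T> T timed(String name, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(name + " took " + elapsedMs + " ms");
        }
    }

    // Hypothetical business method standing in for a slow DB query.
    public static String selectDB() {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row";
    }

    public static void main(String[] args) {
        String row = timed("selectDB", ManualTiming::selectDB);
        System.out.println("result: " + row);
    }
}
```

Multiplied across every layer in the request path, this is exactly the code churn and redeployment we wanted to avoid.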
Tool Analysis
We chose Alibaba's open‑source TProfiler agent, which can be attached via the -javaagent JVM argument. It records method execution times with minimal performance impact and no code intrusion.
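The -javaagent mechanism is what makes this non‑intrusive: the JVM calls the agent's premain method before the application's main, and the agent can register a ClassFileTransformer that sees (and may rewrite) each class's bytecode as it loads. TProfiler relies on this hook to inject its timing calls. The skeleton below is only a minimal sketch of the hook, not TProfiler's code; the class name SketchAgent and the package prefix are hypothetical, and a real profiler would rewrite the bytecode (for example with a library such as ASM) instead of returning null.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;

public class SketchAgent {

    // Hypothetical package prefixes to profile (JVM internal names use '/').
    static final List<String> INCLUDE_PREFIXES = List.of("com/example/service/");

    // Decides whether a loaded class should be instrumented.
    public static boolean shouldInstrument(String internalClassName) {
        if (internalClassName == null) {
            return false;
        }
        for (String prefix : INCLUDE_PREFIXES) {
            if (internalClassName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Called by the JVM before main() when started with -javaagent:agent.jar
    // (the jar manifest must declare Premain-Class: SketchAgent).
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                if (!shouldInstrument(className)) {
                    return null; // null means "leave this class unchanged"
                }
                // A real profiler would rewrite classfileBuffer here to insert
                // start/end timing calls around each method body.
                System.out.println("would instrument " + className);
                return null;
            }
        });
    }
}
```

Because the application code is never touched, the same build can run with or without the agent simply by adding or removing the JVM argument.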
Tool Usage
Clone our fork of the source (the original project is no longer maintained, so we fixed some bugs in it) and build the jar:
git clone https://github.com/crossoverJie/TProfiler
cd TProfiler
mvn assembly:assembly
The resulting tprofiler-1.0.1.jar is placed in TProfiler/pkg/TProfiler/lib/.
Start the application with the agent and its configuration file by adding these JVM arguments (adjust the paths and jar name to match where you copied the build output):
-javaagent:/TProfiler/lib/tprofiler-1.0.jar
-Dprofile.properties=/TProfiler/profile.properties
We also created a simple HTTP endpoint using Cicada that calls two time‑consuming methods, then accessed it repeatedly to generate profiling data.
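Cicada is the author's own HTTP framework; to keep the sketch dependency‑free, the version below swaps in the JDK's built‑in com.sun.net.httpserver.HttpServer instead. The two slow methods, their sleep times, port 8080 and the /demo path are all hypothetical stand‑ins for the endpoint described above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

public class DemoEndpoint {

    // Hypothetical slow method: simulates a DB query with variable latency.
    public static void selectDB() {
        sleep(ThreadLocalRandom.current().nextInt(10, 200));
    }

    // Hypothetical slow method: simulates a downstream RPC call.
    public static void callRpc() {
        sleep(50);
    }

    // Request handler body: calls both slow methods, then answers.
    public static String handle() {
        selectDB();
        callRpc();
        return "ok";
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/demo", exchange -> {
            byte[] body = handle().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start(); // request http://localhost:8080/demo repeatedly to produce samples
    }
}
```

The random sleep in selectDB deliberately mimics the intermittent latency from the incident, so the profiler output has some spread to analyze.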
While the application runs, TProfiler writes detailed method timing information to tprofiler.log. To flush the collected method data into tmethod.log, run:
java -cp /TProfiler/tprofiler.jar com.taobao.profile.client.TProfilerClient 127.0.0.1:50000 flushmethod
After flushing, tmethod.log contains method IDs, line numbers, and timestamps. By correlating these method IDs with tprofiler.log, we can compute average execution times:
java -cp /TProfiler/tprofiler.jar com.taobao.profile.analysis.ProfilerLogAnalysis tprofiler.log tmethod.log topmethod.log topobject.log
The resulting topmethod.log shows request count, average latency, and total time for each method.
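ProfilerLogAnalysis does this correlation for you, but the idea is simple enough to sketch: group timing samples by method ID, compute count, total and average, then attach the method name from the ID table. The two input formats here are simplifying assumptions (one "id name" line per method, one "id durationMs" line per sample), not TProfiler's exact log layout, and TopMethods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TopMethods {

    // Aggregated statistics for one method, similar in spirit to topmethod.log.
    public record Stat(String method, long count, long totalMs) {
        public double avgMs() {
            return (double) totalMs / count;
        }
    }

    // methodLines: assumed "id name" lines; sampleLines: assumed "id durationMs" lines.
    public static List<Stat> aggregate(List<String> methodLines, List<String> sampleLines) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : methodLines) {
            String[] parts = line.trim().split("\\s+", 2);
            names.put(Integer.parseInt(parts[0]), parts[1]);
        }
        Map<Integer, long[]> acc = new HashMap<>(); // id -> {count, totalMs}
        for (String line : sampleLines) {
            String[] parts = line.trim().split("\\s+");
            long[] a = acc.computeIfAbsent(Integer.parseInt(parts[0]), k -> new long[2]);
            a[0]++;
            a[1] += Long.parseLong(parts[1]);
        }
        List<Stat> result = new ArrayList<>();
        for (Map.Entry<Integer, long[]> e : acc.entrySet()) {
            String name = names.getOrDefault(e.getKey(), "unknown#" + e.getKey());
            result.add(new Stat(name, e.getValue()[0], e.getValue()[1]));
        }
        // Most expensive methods (by total time) first.
        result.sort((x, y) -> Long.compare(y.totalMs(), x.totalMs()));
        return result;
    }

    public static void main(String[] args) {
        List<Stat> stats = aggregate(
                List.of("1 callRpc", "2 selectDB"),
                List.of("2 30", "1 50", "2 1200", "1 50"));
        for (Stat s : stats) {
            System.out.printf(Locale.ROOT, "%s count=%d avg=%.1fms total=%dms%n",
                    s.method(), s.count(), s.avgMs(), s.totalMs());
        }
    }
}
```

Sorting by total time rather than average surfaces methods that are individually cheap but called very often, which an average alone would hide.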
Method‑Level Details
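To view all samples for a specific method (e.g., selectDB), first find its ID in tmethod.log (e.g., ID 2), then search for that ID in tprofiler.log:
grep 2 tprofiler.log
This yields a list of individual execution times for that method. Note that a bare grep 2 also matches any line containing the digit 2 elsewhere (IDs such as 12 or 20, or a 2 inside a timing value), so check the matches against the method‑ID column, or at least use grep -w to match whole words only.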
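Because the latency is intermittent, the average alone can mask the problem; the spread of individual samples is more telling. The sketch below filters samples by an exact method‑ID match (avoiding the substring problem of a plain grep) and reports nearest‑rank percentiles. The "id durationMs" sample format and the class name MethodSamples are assumptions, not TProfiler's actual log layout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MethodSamples {

    // Keeps only samples whose first field equals the method ID exactly,
    // so ID 2 does not also match 12, 20 or a duration of 2 ms.
    public static List<Long> samplesFor(int methodId, List<String> sampleLines) {
        List<Long> out = new ArrayList<>();
        String id = Integer.toString(methodId);
        for (String line : sampleLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].equals(id)) {
                out.add(Long.parseLong(parts[1]));
            }
        }
        return out;
    }

    // Nearest-rank percentile (p in (0, 100]) of a non-empty list.
    public static long percentile(List<Long> values, double p) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        List<String> log = List.of("2 30", "12 5", "2 25", "20 2", "2 1800", "2 40");
        List<Long> selectDb = samplesFor(2, log);
        System.out.println("samples: " + selectDb); // prints "samples: [30, 25, 1800, 40]"
        System.out.println("p50=" + percentile(selectDb, 50) + "ms, max="
                + percentile(selectDb, 100) + "ms"); // prints "p50=30ms, max=1800ms"
    }
}
```

A p50 of tens of milliseconds next to a maximum in the seconds is the signature of the intermittent behavior described in the background.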
Summary
Latency spikes were traced to database‑related methods; a hot‑cold data split and sharding are planned.
Until sharding is in place, some database writes have been made asynchronous to reduce response time.
Consider adopting a distributed tracing system such as Pinpoint for future investigations.
Tools like TProfiler are valuable for quick performance diagnostics when APM solutions are not yet in place.
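The asynchronous‑write change from the summary can be sketched as follows: the request thread hands the write to a background executor and returns without waiting for the database. This is a minimal sketch assuming a hypothetical saveToDb method, an in‑memory list standing in for the table, and a fixed two‑thread pool; a production version would also need retries, back‑pressure and orderly shutdown. The trade‑off is that the caller no longer learns whether the write succeeded.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncWrite {

    // Hypothetical stand-in for the database table.
    public static final List<String> DB = new CopyOnWriteArrayList<>();

    // Dedicated pool so slow writes cannot exhaust request-handling threads.
    public static final ExecutorService WRITER = Executors.newFixedThreadPool(2);

    // Hypothetical slow write.
    static void saveToDb(String record) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        DB.add(record);
    }

    // Request path: schedule the write and return immediately.
    public static CompletableFuture<Void> handleRequest(String record) {
        return CompletableFuture
                .runAsync(() -> saveToDb(record), WRITER)
                .exceptionally(ex -> {
                    System.err.println("async write failed: " + ex); // log, don't fail the request
                    return null;
                });
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        handleRequest("order-1");
        System.out.println("responded after " + (System.nanoTime() - start) / 1_000_000 + " ms");
        WRITER.shutdown();                          // stop accepting new writes
        WRITER.awaitTermination(5, TimeUnit.SECONDS); // drain pending writes
        System.out.println("persisted: " + DB);
    }
}
```

The response time now excludes the 100 ms write, at the cost of the write's outcome being reported only in logs.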
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
