
Vtrace Trace Enhancement: Implementing Continuous Profiling for Method-Level Monitoring

Vtrace’s new continuous profiling feature fills previous Trace blind spots by sampling method‑level execution via JMX, delivering detailed intra‑service insights comparable to leading APM tools while keeping CPU overhead under five percent even at high transaction rates.

vivo Internet Technology

This article describes how Vtrace, vivo's application monitoring product, evolved to close blind spots in its Trace data. The team found that the existing Trace captured only inter-service and service-to-component spans, so method-level execution details inside a service were missing.

The article presents a comprehensive analysis including:

1. Background: Vtrace's Trace data only connected services and components but couldn't monitor internal method execution times, creating monitoring blind spots. A test endpoint with four private methods (doSleep, synchronizedBlockBySelectMysql, readFileAndToJson, sendKafka) demonstrated this limitation.

2. Competitive Analysis: Comparison of three APM products - SkyWalking, DataDog, and Dynatrace - revealed that Vtrace lagged behind in Trace observation capabilities. Dynatrace showed the most comprehensive method-level visibility, identifying all four methods and their underlying operations (Waiting, Locking, Disk I/O, Network I/O).

3. Technical Design: The team explored three CPU profiling approaches: JMX, JFR, and JVMTI AsyncGetCallTrace. They chose JMX for rapid implementation despite performance concerns, with plans to migrate to AsyncGetCallTrace later. The JMX-based design samples thread stack traces every 50 ms while a Trace is executing.

4. Stress Testing Results: CPU overhead stayed below 5% at 1000 TPS on 4 cores and below 4% at 100 TPS on 2 cores. The main bottleneck was calls to the native method ThreadImpl.getThreadInfo1.

5. Implementation Assessment: Because 91.2% of services run below 100 TPS and 75% run below 20 TPS, the team concluded that JMX-based profiling is viable for most services at an acceptable resource overhead.
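
To make the JMX approach in item 3 concrete, here is a minimal sketch of interval-based stack sampling using the standard `ThreadMXBean.getThreadInfo(long, int)` call. It is not Vtrace's implementation: the class `StackSampler`, the helpers `profile` and `inclusiveMillis`, and the `MAX_DEPTH` value are hypothetical names and choices made for illustration. Only the JMX mechanism and the 50 ms interval come from the summary above. The inclusive-time estimate (each sample credits one interval to every method on the stack) is likewise an assumption about how samples might be turned into method-level timings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sampler: periodically captures one thread's stack through JMX. */
final class StackSampler implements AutoCloseable {
    private static final int MAX_DEPTH = 64; // assumed cap on captured frames

    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    private final List<StackTraceElement[]> samples = new ArrayList<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stack-sampler");
                t.setDaemon(true); // never keep the JVM alive just for sampling
                return t;
            });

    StackSampler(long targetThreadId, long intervalMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            // getThreadInfo returns null once the target thread is no longer alive.
            ThreadInfo info = threads.getThreadInfo(targetThreadId, MAX_DEPTH);
            if (info != null) {
                synchronized (samples) {
                    samples.add(info.getStackTrace());
                }
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns a snapshot copy of the stacks collected so far. */
    List<StackTraceElement[]> samples() {
        synchronized (samples) {
            return new ArrayList<>(samples);
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    /** Runs {@code work} on the calling thread while sampling that thread. */
    static List<StackTraceElement[]> profile(Runnable work, long intervalMillis) {
        long self = Thread.currentThread().getId();
        try (StackSampler sampler = new StackSampler(self, intervalMillis)) {
            work.run();
            return sampler.samples();
        }
    }

    /** Rough inclusive time: each sample credits one interval to every method on the stack. */
    static Map<String, Long> inclusiveMillis(List<StackTraceElement[]> samples, long intervalMillis) {
        Map<String, Long> totals = new HashMap<>();
        for (StackTraceElement[] stack : samples) {
            Set<String> seen = new HashSet<>(); // count a recursive method once per sample
            for (StackTraceElement frame : stack) {
                String method = frame.getClassName() + "." + frame.getMethodName();
                if (seen.add(method)) {
                    totals.merge(method, intervalMillis, Long::sum);
                }
            }
        }
        return totals;
    }
}
```

A caller would wrap the traced request handling in `StackSampler.profile(handler, 50)` and pass the result to `inclusiveMillis`. Methods that finish in less than one interval can be missed entirely, which is an inherent limit of sampling. The per-sample cost of `getThreadInfo` is presumably where the `ThreadImpl.getThreadInfo1` overhead noted in item 4 comes from.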

Tags: performance optimization, APM, application monitoring, continuous profiling, JavaAgent, JMX, Trace Monitoring, Vtrace