Analyzing TraceId Loss in Spring @Async and Distributed Tracing Solutions
The article investigates a missing TraceId in a Spring @Async call, analyzes the underlying design of MTrace and Google Dapper, examines ThreadLocal propagation mechanisms, identifies SimpleAsyncTaskExecutor as the root cause, and presents a custom thread‑pool solution while comparing alternative distributed tracing systems.
Problem Background and Reproduction
While investigating a production alarm, the team found a log line "2022-08-02 19:26:34.952 DXMsgRemoteService" with no TraceId, so the call chain could not be followed past that point. The issue was reproduced with a minimal @Async example.
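    import javax.annotation.Resource;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.scheduling.annotation.EnableAsync;
    import org.springframework.test.context.junit4.SpringRunner;
    // Tracer is MTrace's client class; its package is not given in the source, so the import is omitted.

    @SpringBootTest
    @RunWith(SpringRunner.class)
    @EnableAsync
    public class DemoServiceTest {
        @Resource
        private DemoService demoService;

        @Test
        public void testTestAsy() {
            // Start a trace on the main (test) thread.
            Tracer.serverRecv("test");
            System.out.println("------We got main thread: " + Thread.currentThread().getName() + " - " + Thread.currentThread().getId() + " Trace Id: " + Tracer.id() + "----------");
            // The call is handed to another thread by the @Async proxy.
            demoService.testAsy();
        }
    }

    import org.springframework.scheduling.annotation.Async;
    import org.springframework.stereotype.Component;

    @Component
    public class DemoService {
        @Async
        public void testAsy() {
            System.out.println("======Async====");
            System.out.println("------We got asy thread: " + Thread.currentThread().getName() + " - " + Thread.currentThread().getId() + " Trace Id: " + Tracer.id() + "----------");
        }
    }
Running the code prints a valid TraceId in the main thread but null in the async thread, confirming that the TraceId is lost when execution moves to another thread.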
Deep Analysis
MTrace vs Google Dapper
MTrace is Meituan's internal tracing system built on Google Dapper's design. Dapper records a span (name, id, parent id, trace id) and a set of annotations (Client Send, Server Recv, etc.) for each RPC. The trace id is a 64‑bit global identifier stored in the span.
On top of this model, MTrace generates its TraceId and SpanId values from UUIDs using XOR, compresses trace data in batches, and covers RPC, HTTP, MySQL, cache and MQ calls.
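To make the span model concrete, here is a minimal sketch of a Dapper-style span. The class `SpanSketch` and its methods are hypothetical names, and folding a random UUID into 64 bits by XOR-ing its two halves is only an assumption about what "UUID-xor" means; the source does not show MTrace's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical, simplified Dapper-style span; not MTrace's actual class. */
final class SpanSketch {
    final String name;
    final long traceId;   // shared by every span in one trace
    final long spanId;    // identifies this span
    final Long parentId;  // null for the root span
    final List<String> annotations = new ArrayList<>();

    private SpanSketch(String name, long traceId, long spanId, Long parentId) {
        this.name = name;
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
    }

    /** Assumption: fold a random UUID into 64 bits by XOR-ing its two halves. */
    static long newId() {
        UUID uuid = UUID.randomUUID();
        return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    }

    /** Starts a new trace: fresh trace id, fresh span id, no parent. */
    static SpanSketch root(String name) {
        return new SpanSketch(name, newId(), newId(), null);
    }

    /** Creates a child span that keeps the trace id and points back to this span. */
    SpanSketch child(String childName) {
        return new SpanSketch(childName, traceId, newId(), spanId);
    }

    /** Records a timestamped event such as "Client Send" or "Server Recv". */
    void annotate(String event) {
        annotations.add(event + "@" + System.currentTimeMillis());
    }

    public static void main(String[] args) {
        SpanSketch root = SpanSketch.root("frontend.request");
        root.annotate("Server Recv");
        SpanSketch rpc = root.child("backend.call");
        rpc.annotate("Client Send");
        System.out.println("same trace: " + (rpc.traceId == root.traceId));
        System.out.println("annotations: " + rpc.annotations);
    }
}
```

The point of the sketch is that the trace id is the only value that must travel unchanged across every hop, including thread hand-offs inside one process.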
@Async asynchronous tracing
Spring creates an AsyncAnnotationBeanPostProcessor that builds an AsyncExecutionInterceptor. The interceptor obtains an Executor (default SimpleAsyncTaskExecutor) and wraps the target method in a Callable submitted to that executor.
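    public Object invoke(final MethodInvocation invocation) throws Throwable {
        // Resolve the method the user actually declared (handles proxies and bridge methods).
        Class<?> targetClass = (invocation.getThis() != null ? AopUtils.getTargetClass(invocation.getThis()) : null);
        Method specificMethod = ClassUtils.getMostSpecificMethod(invocation.getMethod(), targetClass);
        final Method userDeclaredMethod = BridgeMethodResolver.findBridgedMethod(specificMethod);
        // Pick the executor named in @Async, or the default one.
        AsyncTaskExecutor executor = determineAsyncExecutor(userDeclaredMethod);
        if (executor == null) {
            throw new IllegalStateException("No executor specified ...");
        }
        // Wrap the target method call so that it runs on the executor's thread.
        Callable<Object> task = () -> {
            try {
                Object result = invocation.proceed();
                if (result instanceof Future) {
                    return ((Future<?>) result).get();
                }
            } catch (ExecutionException ex) {
                handleError(ex.getCause(), userDeclaredMethod, invocation.getArguments());
            } catch (Throwable ex) {
                handleError(ex, userDeclaredMethod, invocation.getArguments());
            }
            return null;
        };
        return doSubmit(task, executor, invocation.getMethod().getReturnType());
    }
If @Async names no executor, Spring looks for a unique TaskExecutor bean and then for an Executor bean named "taskExecutor"; if neither exists, it falls back to SimpleAsyncTaskExecutor, which creates a brand-new thread for each task.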
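The interception pattern can be illustrated without Spring. The sketch below is a plain-Java stand-in, not Spring's implementation: it uses a JDK dynamic proxy in place of Spring AOP, and the lambda `task -> new Thread(task).start()` plays the role of a thread-per-task executor. All names (`AsyncProxySketch`, `Greeter`, `asyncProxy`) are hypothetical, and only void methods are handled.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: a proxy that hands every call to an Executor. */
final class AsyncProxySketch {
    interface Greeter {
        void greet();
    }

    /** Wraps target so that each call runs on the executor instead of inline. */
    static <T> T asyncProxy(Class<T> type, T target, Executor executor) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (self, method, args) -> {
                    executor.execute(() -> {
                        try {
                            method.invoke(target, args);
                        } catch (ReflectiveOperationException e) {
                            throw new IllegalStateException(e);
                        }
                    });
                    return null; // this sketch supports void methods only
                });
        return type.cast(proxy);
    }

    /** Returns true if the proxied call ran on a thread other than the caller's. */
    static boolean runsOnAnotherThread() {
        Thread caller = Thread.currentThread();
        AtomicReference<Thread> worker = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        Greeter target = () -> {
            worker.set(Thread.currentThread());
            done.countDown();
        };
        // Thread-per-task executor: a new Thread for every submitted task.
        Executor perTaskThread = task -> new Thread(task).start();
        asyncProxy(Greeter.class, target, perTaskThread).greet();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.get() != caller;
    }

    public static void main(String[] args) {
        System.out.println("ran on another thread: " + runsOnAnotherThread());
    }
}
```

Because the work always lands on a freshly created thread, anything the caller kept in a plain ThreadLocal is not visible to the target method.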
Reason for TraceId loss
Trace information is stored in a ThreadLocal. A ThreadLocal value belongs to the thread that set it, so a newly created thread does not see the parent thread's value unless something copies it across. The article compares three propagation mechanisms:
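InheritableThreadLocal: copies the parent's values only at the moment a thread is created, so it does not help when a thread pool reuses existing threads.
TransmittableThreadLocal (Alibaba): captures the values on the submitting thread, replays them in the worker thread before the task runs, and restores the worker's previous values afterwards. It is a third-party library rather than a JDK API, tasks or executors must be wrapped (or a Java agent used) for it to take effect, and the capture and copy on every submission adds some overhead.
TransmissibleThreadLocal: MTrace's own implementation of the same idea, with its own capture/replay/restore logic.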
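The limits of the first two JDK options can be seen in a small experiment. This is a minimal sketch with hypothetical names (`ThreadLocalPropagationDemo`, `PLAIN`, `INHERITABLE`): the caller sets both thread-locals, submits a task to a single-thread pool, changes the values, and submits again.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo: what a pooled worker sees in each kind of thread-local. */
final class ThreadLocalPropagationDemo {
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final InheritableThreadLocal<String> INHERITABLE = new InheritableThreadLocal<>();

    /** Reads both thread-locals on the pool's worker thread as "plain/inheritable". */
    static String readOnWorker(ExecutorService pool) {
        try {
            return pool.submit(() -> PLAIN.get() + "/" + INHERITABLE.get()).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            PLAIN.set("trace-1");
            INHERITABLE.set("trace-1");
            // The single worker thread is created during this first submission,
            // so it inherits the InheritableThreadLocal value "trace-1".
            String first = readOnWorker(pool);

            // A "new request" on the caller thread; the worker is reused, not recreated.
            PLAIN.set("trace-2");
            INHERITABLE.set("trace-2");
            String second = readOnWorker(pool);
            return new String[] {first, second};
        } finally {
            pool.shutdown();
            PLAIN.remove();
            INHERITABLE.remove();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("first task:  " + seen[0]); // prints "first task:  null/trace-1"
        System.out.println("second task: " + seen[1]); // prints "second task: null/trace-1"
    }
}
```

The plain ThreadLocal is never visible on the worker, and the InheritableThreadLocal is stuck at the value it had when the worker was created, which is why a stale trace id is as much a risk as a missing one.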
MTrace implements a custom TransmissibleThreadLocal and instruments ThreadPoolExecutor, ScheduledThreadPoolExecutor and ForkJoinTask via a javaagent. SimpleAsyncTaskExecutor is none of these: it implements Spring's AsyncTaskExecutor (which extends java.util.concurrent.Executor) and starts a plain new thread for each task. It is therefore not instrumented, and the TraceId remains null in the async thread.
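The capture/replay/restore idea can be sketched by hand with a wrapper around each task. This is an illustration only: `TraceContextSketch` and `wrap` are hypothetical names, and neither MTrace's javaagent nor Alibaba's library is shown here; both do the equivalent wrapping for the caller.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of capture/replay/restore around a pooled task. */
final class TraceContextSketch {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String id) { TRACE_ID.set(id); }
    static String get() { return TRACE_ID.get(); }

    private static void setOrRemove(String value) {
        if (value == null) {
            TRACE_ID.remove();
        } else {
            TRACE_ID.set(value);
        }
    }

    /** Must be called on the submitting thread, at submission time. */
    static Runnable wrap(Runnable task) {
        final String captured = TRACE_ID.get();      // capture (caller thread)
        return () -> {
            String previous = TRACE_ID.get();        // worker's own value, if any
            setOrRemove(captured);                   // replay (worker thread)
            try {
                task.run();
            } finally {
                setOrRemove(previous);               // restore (worker thread)
            }
        };
    }

    static String[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        String[] seen = new String[3];
        try {
            set("trace-1");
            pool.submit(wrap(() -> seen[0] = get())).get();
            set("trace-2");
            pool.submit(wrap(() -> seen[1] = get())).get();
            // An unwrapped task sees nothing: the worker was restored after each run.
            Runnable unwrapped = () -> seen[2] = get();
            pool.submit(unwrapped).get();
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            TRACE_ID.remove();
        }
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println(seen[0] + ", " + seen[1] + ", " + seen[2]); // prints "trace-1, trace-2, null"
    }
}
```

Unlike InheritableThreadLocal, the value is taken at submission time, so a reused worker sees the current request's trace id; and because the worker is restored afterwards, nothing leaks into the next task. An executor that no tool wraps or instruments behaves like the unwrapped third task.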
Solution
Define a real thread pool (e.g., ThreadPoolTaskExecutor) and reference it by bean name in the @Async annotation. ThreadPoolTaskExecutor delegates to a java.util.concurrent.ThreadPoolExecutor, which MTrace's javaagent instruments, so the TraceId is propagated correctly.
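    import java.util.concurrent.Executor;
    import java.util.concurrent.ThreadPoolExecutor;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    @Configuration
    public class ThreadPoolConfig {
        @Bean("taskExecutor")
        public Executor taskExecutor() {
            ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
            taskExecutor.setCorePoolSize(10);
            taskExecutor.setMaxPoolSize(50);
            taskExecutor.setQueueCapacity(200);
            taskExecutor.setKeepAliveSeconds(60);
            taskExecutor.setThreadNamePrefix("myExecutor--");
            // Let queued tasks finish on shutdown, waiting at most 60 seconds.
            taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
            taskExecutor.setAwaitTerminationSeconds(60);
            // When pool and queue are full, run the task on the caller's thread.
            taskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
            taskExecutor.initialize();
            return taskExecutor;
        }
    }

    // In DemoService: select the pool by bean name.
    @Async("taskExecutor")
    public void testAsy() { ... }
Running the same test now prints the same TraceId in the main thread and the async thread, showing that the loss is fixed.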
Other Distributed Tracing Systems – Comparison
Zipkin, SkyWalking and EagleEye all follow Dapper’s span model and use ThreadLocal for context. Zipkin offers an InheritableThreadLocal fallback; SkyWalking uses a ContextSnapshot captured and replayed around tasks; EagleEye relies on javaagent bytecode enhancement similar to MTrace but is internal to Alibaba.
Conclusion
The article demonstrates a systematic approach: identify the alarm, reproduce the missing TraceId, analyze @Async's default executor, compare ThreadLocal propagation strategies, discover that SimpleAsyncTaskExecutor bypasses MTrace's instrumentation, and finally apply a custom ThreadPoolTaskExecutor to recover the TraceId, while also reviewing alternative tracing solutions.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
