Thread Profiling: Design and Implementation of Client‑Server Performance Analysis

Thread profiling uses threshold‑triggered tasks on business threads to capture stack snapshots. A dedicated profiler thread sends the snapshots over high‑performance gRPC to a server, which queues them in Kafka, enriches them, stores them in ClickHouse, and correlates them with OpenTelemetry traces. The resulting data and metrics let developers quickly pinpoint latency bottlenecks and improve system stability.

DeWu Technology

Thread profiling is a powerful technique for identifying high‑latency issues by collecting and analyzing runtime thread stacks.

The core idea is to create a threshold‑triggered detection task for work running on business threads. When execution time exceeds the threshold, a dedicated profiling thread captures the business thread's stack and asynchronously sends it to a profiling server for analysis.
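The trigger mechanism can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the article's implementation: the class name `ThresholdProfiler`, the method `runWithProfiling`, the sampling interval, and the queue capacity of 1024 are all hypothetical. A single scheduled thread stands in for the dedicated profiling thread, and it starts sampling the business thread's stack only once the threshold has elapsed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ThresholdProfiler {
    // One daemon thread stands in for the dedicated profiling thread (assumption).
    private final ScheduledExecutorService profiler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "profiler");
            t.setDaemon(true);
            return t;
        });

    // Bounded so that large snapshots cannot pile up without limit; 1024 is arbitrary.
    private final BlockingQueue<StackTraceElement[]> snapshots =
        new LinkedBlockingQueue<>(1024);

    /**
     * Runs work on the calling (business) thread. If it is still running after
     * thresholdMs, the profiler thread samples its stack every intervalMs
     * until the work finishes.
     */
    public void runWithProfiling(Runnable work, long thresholdMs, long intervalMs) {
        Thread business = Thread.currentThread();
        ScheduledFuture<?> sampling = profiler.scheduleAtFixedRate(
            () -> snapshots.offer(business.getStackTrace()), // dropped if the queue is full
            thresholdMs, intervalMs, TimeUnit.MILLISECONDS);
        try {
            work.run();
        } finally {
            sampling.cancel(false); // stop sampling once the work is done
        }
    }

    public BlockingQueue<StackTraceElement[]> snapshots() {
        return snapshots;
    }
}
```

Work that finishes before the threshold costs only one scheduled task that is cancelled before it runs, which is what keeps the overhead on healthy requests low. A separate exporter would drain `snapshots()` and send the data on asynchronously.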

Implementation Overview

Client Design – a profiling task passes through four stages: creation, scheduling on a time wheel (default tick of 100 ms), execution, and export. Scheduled tasks are queued and executed by a thread pool, and the resulting stack snapshots are pushed to a diagnostic queue. A single snapshot can exceed 200 KB, so the queue length is configurable.
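To make the time‑wheel scheduling concrete, here is a minimal hashed‑wheel sketch. It is an illustration only: the class `TimeWheel`, its slot count, and the manual `tick()` method are assumptions, and the article does not describe the client's actual data structure. In a real client a timer thread would call `tick()` once per tick interval (100 ms by default, per the text above); the sketch is also not thread‑safe.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TimeWheel {
    private static final class Entry {
        final Runnable task;
        int rounds; // full wheel revolutions still to wait

        Entry(Runnable task, int rounds) {
            this.task = task;
            this.rounds = rounds;
        }
    }

    private final List<List<Entry>> slots = new ArrayList<>();
    private final long tickMs;
    private int cursor = 0;

    public TimeWheel(int slotCount, long tickMs) {
        this.tickMs = tickMs;
        for (int i = 0; i < slotCount; i++) {
            slots.add(new ArrayList<>());
        }
    }

    /** Schedules task after delayMs, rounded up to whole ticks (at least one). */
    public void schedule(Runnable task, long delayMs) {
        long ticks = Math.max(1, (delayMs + tickMs - 1) / tickMs);
        int slot = (int) ((cursor + ticks) % slots.size());
        int rounds = (int) ((ticks - 1) / slots.size());
        slots.get(slot).add(new Entry(task, rounds));
    }

    /** Advances the wheel by one tick and runs every task that is now due. */
    public void tick() {
        cursor = (cursor + 1) % slots.size();
        List<Runnable> due = new ArrayList<>();
        Iterator<Entry> it = slots.get(cursor).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.rounds > 0) {
                e.rounds--; // not due yet: wait another revolution
            } else {
                it.remove();
                due.add(e.task);
            }
        }
        // Run after iteration so a task may safely schedule new work.
        for (Runnable r : due) {
            r.run();
        }
    }
}
```

With a 100 ms tick, a 500 ms threshold becomes five ticks, so the detection task fires on the fifth call to `tick()`. Scheduling and firing are constant‑time per task, which is the usual reason to prefer a wheel over a sorted delay queue when many short‑lived timers are created and cancelled.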

Server Design – the server receives data via high‑performance gRPC, enqueues it into Kafka, parses and enriches it, and then persists it to a store such as ClickHouse. It also supports OpenTelemetry trace correlation.

Data Processing

Snapshots are pre‑aggregated, parent‑child relationships between stack frames are inferred, and self‑time is calculated according to defined rules. A sample of the aggregated output:

[
    {
        "data": "YXQgc3VuLm5pby5jaC5Vd...",
        "thread_name": "XNIO-1 I/O-1",
        "thread_state": "RUNNABLE",
        "trigger_millisecond": 500,
        "self_millisecond": 38,
        "source_snapshot_count": 153
    },
    {
        "data": "YXQgaW8udW5kZXJ0b3cuc2Vy...",
        "thread_name": "XNIO-1 task-1",
        "thread_state": "RUNNABLE",
        "trigger_millisecond": 500,
        "self_millisecond": 0,
        "source_snapshot_count": 140
    }
]
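The article does not spell out its aggregation rules, so the following sketch uses the conventional sampling‑profiler definition instead: a frame's total count is the number of snapshots in which it appears, and its self count is the number of snapshots in which it is the top frame. The class `SnapshotAggregator`, the frame names in the usage note, and the fixed sampling interval are assumptions made for illustration. Stacks are ordered top frame first, as `Thread.getStackTrace()` returns them.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class SnapshotAggregator {
    /**
     * Returns per-frame counts: index 0 is the number of snapshots containing
     * the frame, index 1 is the number of snapshots where it is the top frame.
     */
    public static Map<String, int[]> aggregate(List<List<String>> snapshots) {
        Map<String, int[]> counts = new LinkedHashMap<>();
        for (List<String> stack : snapshots) {
            if (stack.isEmpty()) {
                continue;
            }
            // De-duplicate so a recursive frame is counted once per snapshot.
            for (String frame : new LinkedHashSet<>(stack)) {
                counts.computeIfAbsent(frame, k -> new int[2])[0]++;
            }
            counts.get(stack.get(0))[1]++; // the top frame was executing
        }
        return counts;
    }

    /** Estimates self-time as self samples multiplied by the sampling interval. */
    public static long estimatedSelfMillis(int selfSamples, long intervalMs) {
        return selfSamples * intervalMs;
    }
}
```

For three snapshots `[read, handle, run]`, `[read, handle, run]`, and `[handle, run]`, the frame `read` gets a self count of 2, `handle` a self count of 1, and `run` a self count of 0 even though it appears in all three. A frame with many source snapshots and little or no self‑time, as in the second record above, is consistent with a frame that mostly waits on its callees, though the article's own rules may differ.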

Monitoring Metrics – task queue size, task release latency, number of active profiling tasks, stack export latency, data queue size, ingestion rate, aggregation latency, export byte size and rate.

By integrating with OpenTelemetry, thread profiling can associate each snapshot with its trace ID, span ID, and interface name, providing comprehensive observability.
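One way to picture the correlation is a snapshot record that carries the trace context active when it was taken, grouped by trace ID on the server side. The sketch below is hypothetical: `TraceCorrelation`, `TaggedSnapshot`, and the field names are not from the article. It avoids any OpenTelemetry dependency so that it runs on its own; in a real agent the IDs would typically be read from the current span context. The ID checks follow the OpenTelemetry and W3C Trace Context format, where a trace ID is 32 lowercase hex characters, a span ID is 16, and all‑zero values are invalid.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TraceCorrelation {
    /** A snapshot tagged with the trace context that was active when it was taken. */
    public static final class TaggedSnapshot {
        public final String traceId;
        public final String spanId;
        public final String interfaceName;
        public final String threadName;

        public TaggedSnapshot(String traceId, String spanId,
                              String interfaceName, String threadName) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.interfaceName = interfaceName;
            this.threadName = threadName;
        }
    }

    /** True if id is lowercase hex of the given length and not all zeros. */
    public static boolean isValidId(String id, int length) {
        return id != null
            && id.length() == length
            && id.matches("[0-9a-f]+")
            && !id.matches("0+");
    }

    /** Groups snapshots by trace ID, skipping any without a valid trace context. */
    public static Map<String, List<TaggedSnapshot>> groupByTrace(List<TaggedSnapshot> snapshots) {
        Map<String, List<TaggedSnapshot>> byTrace = new LinkedHashMap<>();
        for (TaggedSnapshot s : snapshots) {
            if (!isValidId(s.traceId, 32) || !isValidId(s.spanId, 16)) {
                continue; // no usable trace context for this snapshot
            }
            byTrace.computeIfAbsent(s.traceId, k -> new ArrayList<>()).add(s);
        }
        return byTrace;
    }
}
```

Grouping by trace ID lets a slow request found in the tracing view be opened directly alongside the stack snapshots captured while it ran.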

Overall, the approach helps developers quickly locate performance bottlenecks, improve application quality, and maintain system stability.

Written by

DeWu Technology
