How to Trace WebSocket Connections End‑to‑End with OpenTelemetry and LoongSuite

This article explains the fundamentals of the WebSocket protocol and its growing role in AI scenarios, then provides step‑by‑step guidance on implementing full‑link observability using the OpenTelemetry API and LoongSuite probes, including code samples for Java, Go, and Python.

Alibaba Cloud Observability

WebSocket protocol basics

WebSocket is a TCP‑based full‑duplex protocol defined in RFC 6455. A client initiates an HTTP GET request with Upgrade: websocket and Connection: Upgrade headers. The server replies with status 101 Switching Protocols and a Sec-WebSocket-Accept header derived from the client’s Sec-WebSocket-Key. After the handshake the TCP connection is upgraded to a persistent WebSocket channel that can carry text, binary, or control frames. The URI scheme is ws:// (plain) or wss:// (TLS), default ports 80 and 443.
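The server's accept value can be reproduced with nothing but the JDK, which is handy when debugging failed handshakes; a minimal sketch (the class and method names here are my own, not from any library):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class HandshakeAccept {
    // RFC 6455 fixes this GUID; it is appended to the client's key before hashing.
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
    public static String acceptFor(String secWebSocketKey) throws Exception {
        byte[] sha1 = MessageDigest.getInstance("SHA-1")
                .digest((secWebSocketKey + GUID).getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(sha1);
    }
}
```

Feeding it the example key from RFC 6455 section 1.3 (`dGhlIHNhbXBsZSBub25jZQ==`) reproduces the accept value shown in the RFC.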

Frames consist of a FIN bit, opcode, mask flag, payload length and optional masking key. Three opcode categories exist:

Text frame: UTF‑8 payload.

Binary frame: binary payload.

Control frame: ping, pong, close, etc.

Closing the connection is performed by exchanging a close control frame; after the peer acknowledges, the underlying TCP socket is terminated.
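The frame fields listed above map directly onto the first two header bytes; a sketch that decodes them for short frames (payload length ≤ 125 only; extended lengths and the masking key are omitted, and the class name is an assumption):

```java
public class FrameHeader {
    public final boolean fin;        // FIN bit: final fragment of a message
    public final int opcode;         // 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    public final boolean masked;     // client-to-server frames must be masked
    public final int payloadLength;  // 0-125 here; 126/127 signal extended lengths

    public FrameHeader(byte b0, byte b1) {
        fin = (b0 & 0x80) != 0;      // top bit of byte 0
        opcode = b0 & 0x0F;          // low 4 bits of byte 0
        masked = (b1 & 0x80) != 0;   // top bit of byte 1
        payloadLength = b1 & 0x7F;   // low 7 bits of byte 1
    }
}
```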

Observability challenges for WebSocket

Traditional HTTP tracing (e.g., W3C Trace‑Context) injects context via HTTP headers. After the handshake, WebSocket frames carry only opaque payloads with no header fields, so there is no standard place to embed tracing headers. This creates three major problems:

Context injection difficulty: there is no built‑in mechanism to propagate trace IDs across data frames.

Span boundary ambiguity: a single connection may represent many logical operations (one long‑lived session, multiple request‑response pairs, or per‑frame processing).

Reverse propagation: for server‑initiated messages, the server must act as the caller and inject its context into the outbound frame.

Additionally, asynchronous processing (thread pools, internal queues, external stores such as Redis) can break the propagation of the in‑process context, leading to “broken traces”.
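For reference, the traceparent value that W3C Trace‑Context propagates is just a dash‑separated string, which is what makes it easy to carry through an application‑level field once HTTP headers are gone; a minimal sketch of composing and splitting one (helper names are mine):

```java
public class Traceparent {
    // Layout per W3C Trace Context: version-traceId-spanId-flags
    public static String build(String traceId, String spanId, boolean sampled) {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    // Pull the 32-hex-char trace ID back out of a traceparent value.
    public static String traceIdOf(String header) {
        return header.split("-")[1];
    }
}
```

The IDs in the test below are the example values from the W3C Trace Context specification.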

OpenTelemetry API and LoongSuite probe

OpenTelemetry provides a language‑agnostic API for creating spans, managing context, and exporting telemetry. The LoongSuite probe is a non‑intrusive agent built on the OpenTelemetry API that automatically instruments popular libraries (e.g., Tomcat, LangChain, OpenAI SDK). It shares the same API implementation as the application, allowing custom spans created via the OpenTelemetry SDK to interoperate with the probe‑generated spans.

Key steps to integrate OpenTelemetry:

<dependency>
  <groupId>io.opentelemetry</groupId>
  <artifactId>opentelemetry-api</artifactId>
</dependency>

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Tracer;

Tracer tracer = GlobalOpenTelemetry.getTracer("websocket-demo", "1.0.0");

Similar commands exist for Go (go get go.opentelemetry.io/otel) and Python (pip install opentelemetry-api).

Tracing models for WebSocket

Connection‑level trace: treat the whole WebSocket lifecycle as a single trace. All messages are child spans of the connection span. Suitable for short‑lived connections (seconds to minutes).

Session‑ID linked traces: each logical request/response pair creates its own trace, and the WebSocket session ID is stored as a span attribute (websocket.session.id). Traces can be correlated later via this attribute. This model works for long‑lived connections that are reused for many independent interactions.
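Correlating session‑ID linked traces afterwards is essentially a grouping problem over the websocket.session.id attribute; a stdlib sketch of such an index (illustrative only, not a LoongSuite or ARMS API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SessionTraceIndex {
    private final Map<String, List<String>> traceIdsBySession = new HashMap<>();

    // Called once per finished trace, with the websocket.session.id attribute value.
    public void record(String sessionId, String traceId) {
        traceIdsBySession.computeIfAbsent(sessionId, k -> new ArrayList<>()).add(traceId);
    }

    // All traces that ran over one long-lived connection.
    public List<String> tracesFor(String sessionId) {
        return traceIdsBySession.getOrDefault(sessionId, List.of());
    }
}
```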

Injecting trace context during the handshake

Before the client sends the handshake request, inject the current OpenTelemetry context into the HTTP headers using the global propagator:

Map<String, List<String>> headers = new HashMap<>();
Context current = Context.current();
GlobalOpenTelemetry.getPropagators()
    .getTextMapPropagator()
    .inject(current, headers, (carrier, key, value) -> carrier.put(key, List.of(value)));
// Configure the WebSocket client to add these headers to the handshake request

On the server side, a custom ServerEndpointConfig.Configurator extracts the headers back into a Context and stores it in the endpoint’s user properties. When @OnOpen is invoked, the stored context becomes the parent for any subsequent spans.

public static class TraceContextConfigurator extends ServerEndpointConfig.Configurator {
    private static final TextMapGetter<Map<String, List<String>>> getter =
        new TextMapGetter<>() {
            public Iterable<String> keys(Map<String, List<String>> carrier) { return carrier.keySet(); }
            public String get(Map<String, List<String>> carrier, String key) {
                List<String> v = carrier.get(key);
                return (v != null && !v.isEmpty()) ? v.get(0) : null;
            }
        };
    @Override
    public void modifyHandshake(ServerEndpointConfig sec, HandshakeRequest request, HandshakeResponse response) {
        Context ctx = GlobalOpenTelemetry.getPropagators()
            .getTextMapPropagator()
            .extract(Context.current(), request.getHeaders(), getter);
        sec.getUserProperties().put("traceContext", ctx);
    }
}

Span creation patterns

Typical client‑side code (Java native WebSocket) creates a connection‑level span, makes it current, and then creates child spans for each sent message:

Span connectionSpan = tracer.spanBuilder("websocket.connection")
    .setAttribute("websocket.endpoint", "/native/ws")
    .setAttribute("websocket.destination", "ws://localhost:18081")
    .startSpan();
try (Scope scope = connectionSpan.makeCurrent()) {
    // WebSocket client setup …
    // For each user input:
    Span sendSpan = tracer.spanBuilder("Client send message").startSpan();
    try (Scope s = sendSpan.makeCurrent()) {
        // inject context into custom header map and send
    } finally {
        sendSpan.end();
    }
} finally {
    connectionSpan.end();
}
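The "inject context into custom header map and send" step needs somewhere to put those headers: data frames have none, so a common pattern is an application‑level envelope around the payload. A hand‑rolled sketch (the field layout is an assumption; a real application would use a JSON library rather than string concatenation):

```java
public class TracedEnvelope {
    // Wrap a text payload together with its trace context in a tiny JSON envelope.
    public static String wrap(String traceparent, String payload) {
        return "{\"traceparent\":\"" + traceparent + "\",\"payload\":\"" + payload + "\"}";
    }

    // Naive extraction of the traceparent field (assumes the fixed layout above).
    public static String traceparentOf(String envelope) {
        int start = envelope.indexOf(":\"") + 2;
        return envelope.substring(start, envelope.indexOf('"', start));
    }
}
```

On the receiving side, the extracted traceparent value would be handed to the propagator's extract() call, exactly as in the handshake example.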

Server‑side @OnMessage extracts the incoming context, creates a server span with setParent(extractedContext), and processes the payload:

Object ctxObj = config.getUserProperties().get("traceContext");
Context parent = (ctxObj instanceof Context) ? (Context) ctxObj : Context.current();
Span serverSpan = tracer.spanBuilder("Server handle message")
    .setParent(parent)
    .startSpan();
try (Scope s = serverSpan.makeCurrent()) {
    // business logic
} catch (Exception e) {
    serverSpan.recordException(e);
} finally {
    serverSpan.end();
}

Handling asynchronous processing

LoongSuite automatically propagates the active context when a Runnable or Callable is submitted to a thread pool. For custom queues, the application must manually capture the context at enqueue time and restore it when dequeuing:

// Enqueue (producer side)
Span msgSpan = tracer.spanBuilder("Process message").startSpan();
// Attach the span to the captured context explicitly; Context.current() alone
// would not contain msgSpan, because it was never made current.
Context ctx = Context.current().with(msgSpan);
message.setTracingContext(ctx);
queue.offer(message);

// Dequeue (consumer side, typically a separate method)
Message msg = queue.take();
Context ctx = msg.getTracingContext();
try (Scope s = ctx.makeCurrent()) {
    // process the message under the restored context
} finally {
    Span.fromContext(ctx).end(); // ends the span started at enqueue time
}

Metrics for streaming WebSocket traffic

A utility class can record key streaming metrics: time‑to‑first‑chunk, time‑to‑last‑chunk, average inter‑chunk interval, and total chunk count. The class stores timestamps for the first and last chunks and updates an atomic counter and interval accumulator on each recordChunk() call.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class WebSocketPerformanceMeasure {
    private final long startTime = System.currentTimeMillis();
    private volatile long firstChunkTime = -1L;
    private volatile long lastChunkTime = -1L;
    private final AtomicInteger chunkCounts = new AtomicInteger(0);
    private final AtomicLong totalInterval = new AtomicLong(0);
    public static WebSocketPerformanceMeasure create() { return new WebSocketPerformanceMeasure(); }
    public long recordChunk() {
        long now = System.currentTimeMillis();
        if (firstChunkTime < 0) firstChunkTime = now;      // time of first chunk
        else totalInterval.addAndGet(now - lastChunkTime); // accumulate inter-chunk gap
        lastChunkTime = now;
        chunkCounts.incrementAndGet();
        return now;
    }
    public long getTimeToFirstChunk() { return firstChunkTime < 0 ? -1 : firstChunkTime - startTime; }
    public long getAverageInterval() { int n = chunkCounts.get(); return n > 1 ? totalInterval.get() / (n - 1) : 0; }
    public int getChunkCount() { return chunkCounts.get(); }
}

These metrics can be attached as span attributes or exported to a monitoring system, enabling dashboards that show latency spikes in streaming responses.

AI voice‑assistant use case

The approach is demonstrated end‑to‑end with a full‑stack AI voice assistant built on WebSocket:

Device sends audio chunks to the server via a WebSocket connection.

Server performs ASR (automatic speech recognition), forwards the transcript to an LLM for intent detection, and streams generated text back.

Generated text is fed to a TTS service, and the audio response is streamed back to the device.

LoongSuite is attached to the Java service via the AliyunJavaAgent (version ≥ 4.6.0). The agent is started with:

export JAVA_AGENT_OPTIONS="-javaagent:/path/to/AliyunJavaAgent/aliyun-java-agent.jar \
    -Darms.licenseKey=${YOUR_LICENSE_KEY} \
    -Darms.appName=websocket-demo \
    -Daliyun.javaagent.regionId=cn-hangzhou \
    -Darms.workspace=${YOUR_WORKSPACE}"
./start.sh

After deployment, ARMS dashboards display a waterfall view where each span represents a logical step (handshake, client send, server ASR, LLM inference, TTS, etc.). The WebSocket span also shows the streaming metrics (first‑chunk latency, average chunk interval), helping operators pinpoint performance bottlenecks in real‑time AI pipelines.

Key take‑aways

WebSocket’s lack of per‑frame headers requires explicit context injection during the HTTP handshake and optional custom header fields for subsequent messages.

Choose a span granularity that matches the business scenario: connection‑level for short sessions, session‑ID linked traces for long‑lived connections.

When messages trigger asynchronous processing, capture and restore the OpenTelemetry Context manually or rely on LoongSuite’s automatic propagation for thread‑pool tasks.

Expose streaming‑specific metrics (first‑chunk latency, average interval) as span attributes to monitor real‑time AI workloads.


Tags: CloudNative, OpenTelemetry, WebSocket, Tracing, LoongSuite