Root Cause Analysis of Missing Trace Data in Go Services Using Prometheus Metrics and GZIP Compression
The missing trace data in two Go services was caused by the GoFrame tracing middleware recording the gzip‑compressed /metrics response body as a UTF‑8 string, which the OpenTelemetry exporter rejected as invalid UTF‑8; disabling Prometheus compression or decompressing the body before logging resolves the issue.
Background: In a cloud‑native environment, developers often face issues such as page timeouts, difficulty identifying slow services, and locating error logs across multiple pods. Traditional monitoring cannot meet these needs, so observability based on metrics, tracing, and logging is introduced.
The three pillars of observability are: metrics (quantitative counters, gauges, histograms, etc.), tracing (the lifecycle of a request across a distributed system), and logging (fine‑grained event records).
Problem description (Section 02): After enabling trace and log functions, only the tcf‑app‑backend‑pro service reported trace data correctly. The other two services (tcf‑server‑demo and tcf‑app‑backend‑test) showed no trace data. Logs revealed repeated warnings from trace.otlpExporter indicating a failure in the ExportSpans method.
Initial log analysis pointed to a warning in the vendored Go code where "grpc: error while marshaling" occurred during message encoding. The problematic msg contained a mixture of readable strings and raw hexadecimal bytes, suggesting an unexpected binary payload.
Further investigation identified the attribute http.response.body in the GoFrame server trace middleware. The value of this attribute was the raw response body of the /metrics endpoint, which appeared as garbled data.
Source code analysis (Snippet 1) showed the middleware adding the response body to the trace event:
const (
    tracingEventHttpResponse        = "http.response"
    tracingEventHttpResponseHeaders = "http.response.headers"
    tracingEventHttpResponseBody    = "http.response.body"

    tracingMiddlewareHandled gctx.StrKey = `MiddlewareServerTracingHandled`
)

// internalMiddlewareServerTracing enables tracing.
func internalMiddlewareServerTracing(r *Request) {
    // ...
    r.Middleware.Next()

    // Error logging.
    if err := r.GetError(); err != nil {
        span.SetStatus(codes.Error, fmt.Sprintf(`%+v`, err))
    }

    // Response content logging: the raw response buffer is recorded
    // verbatim, with no check on its Content-Encoding.
    var resBodyContent = gstr.StrLimit(r.Response.BufferString(), gtrace.MaxContentLogSize(), "...")
    span.AddEvent(tracingEventHttpResponse, trace.WithAttributes(
        attribute.String(tracingEventHttpResponseHeaders, gconv.String(httputil.HeaderToMap(r.Response.Header()))),
        attribute.String(tracingEventHttpResponseBody, resBodyContent),
    ))
}

Investigation of the Prometheus client_golang handler revealed that when the request's Accept-Encoding header contains gzip and compression is not disabled, the response body is compressed before being sent:
if !opts.DisableCompression && gzipAccepted(req.Header) {
    header.Set(contentEncodingHeader, "gzip")
    gz := gzipPool.Get().(*gzip.Writer)
    defer gzipPool.Put(gz)

    gz.Reset(w)
    defer gz.Close()

    w = gz
}
enc := expfmt.NewEncoder(w, contentType)

The compressed response body is then stored in the http.response.body attribute, but gzip output is binary data, not valid UTF‑8. When the trace data is later marshaled into protobuf for gRPC transmission, the OpenTelemetry exporter rejects the non‑UTF‑8 string attribute, producing the "invalid UTF‑8" error seen in the logs.
Two remediation approaches were proposed:
1. Submit an issue/PR to the GoFrame project to make the trace middleware detect Content‑Encoding: gzip and decompress the body before logging.
2. Disable compression for the /metrics endpoint by setting DisableCompression: true in promhttp.HandlerOpts.
Implementation of the second approach (Snippet 2) is straightforward:
s.BindHandler(config.Uri, func(r *ghttp.Request) {
    promhttp.InstrumentMetricHandler(
        config.Registerer,
        promhttp.HandlerFor(config.Gatherer, promhttp.HandlerOpts{DisableCompression: true}),
    ).ServeHTTP(r.Response.Writer, r.Request)
})

After redeploying with compression disabled, trace data from all services appeared correctly in the SLS console.
Additional verification using a custom gRPC demo (Snippets 3 and 4) reproduced the issue: the server returned a gzip‑compressed string in the protobuf Message field, the client received it, and the OpenTelemetry exporter again failed with "invalid UTF‑8". This matches the proto3 specification, which requires string fields to contain valid UTF‑8 encoded text; binary payloads belong in bytes fields.
// server.go (excerpt)
func GZIPEn(str string) []byte {
	var b bytes.Buffer
	gz := gzip.NewWriter(&b)
	if _, err := gz.Write([]byte(str)); err != nil {
		panic(err)
	}
	if err := gz.Flush(); err != nil {
		panic(err)
	}
	if err := gz.Close(); err != nil {
		panic(err)
	}
	return b.Bytes()
}

func (h *greeterServer) SayHello(ctx context.Context, r *pb.HelloRequest) (*pb.HelloReply, error) {
	v := fmt.Sprintf("hello, %s", r.GetName())
	// Deliberately place compressed (non-UTF-8) bytes in a string field.
	vv := GZIPEn(v)
	return &pb.HelloReply{Message: string(vv)}, nil
}

// client.go (excerpt)
for {
	hello, err := NewGreeterClient(c).SayHello(ctx, &pb.HelloRequest{Name: "test"})
	if err != nil {
		gt.Log().Errorf(ctx, "client say hello fail, err: %+v", err)
		return
	}
	gt.Log().Printf(ctx, "%v", hello)
	time.Sleep(time.Second)
}

Conclusion (Section 06): The root cause is that the Prometheus /metrics endpoint returns gzip‑compressed data when the client accepts it, and the GoFrame trace middleware records that body verbatim in the http.response.body span attribute. The compressed bytes are not valid UTF‑8, so the OpenTelemetry exporter rejects the span during protobuf marshaling. The fix is to either decompress the body in the middleware before logging it or disable compression for the metrics endpoint.
Key take‑aways: when integrating observability components, ensure that response bodies logged to trace attributes are UTF‑8 compliant; use compression flags wisely; and leverage targeted logging to isolate problematic data.
37 Interactive Technology Team