
Unified Metrics, Tracing, and Logging: A Financial Firm’s Path to Microservice Observability

Facing the challenges of distributed microservice architectures, a financial services company implemented a unified observability platform that combines metrics, tracing, and logging via OpenTelemetry and custom agents, enabling real‑time visualization, anomaly detection, and performance analysis across seven core business middle‑platforms.

dbaplus Community

Introduction

Microservice architectures are increasingly adopted across industries for their lightweight, agile, and maintainable characteristics. However, the distributed nature of microservices creates observability challenges for developers, testers, operators, and business analysts. Traditional monitoring methods no longer suffice.

Background

In June 2019, Oriental Securities released the gRPC‑Nebula service‑governance framework and announced a "big middle‑platform" strategy. To support rapid business innovation, the company reorganized its wealth‑management domain into seven core middle‑platforms (account, product, sales, asset, transaction summary, market data, and information), all built on gRPC‑Nebula and accessed via a service‑governance platform.

Observability Challenges

Developers must trace end‑to‑end request topologies across a web‑like service call graph.

Testers need to reconstruct request flows from logs across multiple nodes.

Operators must pinpoint faulty nodes and measure latency per service and interface.

Business analysts require consolidated data from multiple platforms for accurate reporting.

Key Concepts

Observability

Originating from control theory, observability measures how well internal system states can be inferred from external outputs. In distributed systems, both component‑level outputs (logging, metrics) and inter‑component flows (tracing) are required.

Three Pillars of Observability

Metrics Data: Counters, gauges, histograms, summaries.

Logging Data: Fine‑grained events, variables, request/response records.

Tracing Data: Distributed request lifecycles represented by trace IDs and spans.

These pillars complement each other: metrics trigger alerts, tracing locates the problematic module, and logs reveal the root cause.

OpenTelemetry

OpenTelemetry, launched in 2019 by merging OpenTracing and OpenCensus, provides a standardized data model, SDKs, and exporters for traces, metrics, and logs. It is commonly paired with Prometheus for metrics storage and Jaeger for trace storage; at the time of writing, its logging standard was still evolving.

Proposed Observability Solution

The solution, named the Oriental Securities Observability Platform, integrates logging, metrics, and tracing into a single pipeline.

Technical Architecture

The platform consists of three layers (see Image 3):

Data‑Collection Agent: Captures logs and trace data in real time, tags them with a shared traceID, and publishes them to the Kafka topics Logging and Tracing.

Data‑Processing Module: Consumes the Kafka streams, stores raw logs and traces in Elasticsearch, aggregates statistics, and writes the results to MySQL.

Data‑Visualization Module: Presents correlated logs, traces, and metrics through dashboards.

Observability platform architecture
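The three-layer flow can be sketched in plain Java. This is a minimal, JDK-only illustration, not the platform's code: in-memory queues stand in for the Kafka topics Logging and Tracing, a list stands in for Elasticsearch, and a map stands in for the MySQL statistics table. All class and method names (InMemoryBus, CollectionAgent, ProcessingModule) are hypothetical, and counting spans per service is only one example of an aggregated statistic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical, JDK-only stand-ins: queues play the Kafka topics, a list plays
// Elasticsearch, and a map plays the MySQL statistics table.
final class TelemetryRecord {
    final String traceId;
    final String service;
    final String payload;

    TelemetryRecord(String traceId, String service, String payload) {
        this.traceId = traceId;
        this.service = service;
        this.payload = payload;
    }
}

final class InMemoryBus {
    // Topic names follow the article: "Logging" and "Tracing".
    private final Map<String, Queue<TelemetryRecord>> topics = new HashMap<>();

    void publish(String topic, TelemetryRecord record) {
        topics.computeIfAbsent(topic, t -> new ArrayDeque<>()).add(record);
    }

    TelemetryRecord poll(String topic) {
        Queue<TelemetryRecord> q = topics.get(topic);
        return q == null ? null : q.poll();
    }
}

final class CollectionAgent {
    private final InMemoryBus bus;

    CollectionAgent(InMemoryBus bus) { this.bus = bus; }

    // Log lines and spans carry the same traceId so they can be joined later.
    void log(String traceId, String service, String message) {
        bus.publish("Logging", new TelemetryRecord(traceId, service, message));
    }

    void span(String traceId, String service, String spanJson) {
        bus.publish("Tracing", new TelemetryRecord(traceId, service, spanJson));
    }
}

final class ProcessingModule {
    final List<TelemetryRecord> rawStore = new ArrayList<>();          // Elasticsearch stand-in
    final Map<String, Integer> spanCountByService = new HashMap<>();  // MySQL stand-in

    // Drains both topics: every record is stored raw; spans are also counted per service.
    void drain(InMemoryBus bus) {
        TelemetryRecord r;
        while ((r = bus.poll("Logging")) != null) {
            rawStore.add(r);
        }
        while ((r = bus.poll("Tracing")) != null) {
            rawStore.add(r);
            spanCountByService.merge(r.service, 1, Integer::sum);
        }
    }
}
```

Because both topics carry the traceID, the processing layer can later join the log lines and spans of one request; in a real deployment the drain loop would be a Kafka consumer poll loop.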

Key Techniques

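TraceID Generation & MDC: UUID‑based traceIDs ensure uniqueness without coordination between services. The traceID is stored in the Mapped Diagnostic Context (MDC) so that every log line and span produced while handling a request carries the same identifier. Because MDC is thread‑local, the context generally has to be copied explicitly when work is handed off to another thread.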
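A minimal sketch of traceID generation and per-thread storage follows. It uses a plain ThreadLocal as a stand-in for Logback's MDC (which is also ThreadLocal-based), so it runs without dependencies; TraceContext and its methods are hypothetical names. The wrap helper shows one way to carry the traceID onto another thread, assuming work is handed off as a Runnable.

```java
import java.util.UUID;

// JDK-only stand-in for MDC; TraceContext and its methods are hypothetical.
final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    private TraceContext() {}

    // Called once at the request entry point: a random UUID is unique enough
    // to serve as a traceID without any coordination between services.
    static String start() {
        String id = UUID.randomUUID().toString().replace("-", "");
        TRACE_ID.set(id);
        return id;
    }

    static String current() {
        return TRACE_ID.get();
    }

    static void clear() {
        TRACE_ID.remove();
    }

    // ThreadLocal values do not follow work onto pool threads, so capture the
    // caller's traceID now, install it around the task, and restore afterwards.
    static Runnable wrap(Runnable task) {
        String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            if (captured == null) TRACE_ID.remove(); else TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                if (previous == null) TRACE_ID.remove(); else TRACE_ID.set(previous);
            }
        };
    }
}
```

With SLF4J itself, the equivalent calls would be MDC.put, MDC.get, and, for hand-off between threads, MDC.getCopyOfContextMap followed by MDC.setContextMap.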

Log Format & Collection: Logs follow a unified pattern, timestamp [LEVEL]: message, with timestamps in yyyy-MM-dd HH:mm:ss SSS format and JSON‑encoded message bodies. Custom LogbackAppender and Log4j2Appender implementations, equipped with filters and converters, forward the logs to Kafka.
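The line format can be illustrated with a JDK-only formatter. It is a sketch of the output the custom appenders are described as producing, not the appenders themselves; the class name and the JSON field names (traceId, msg) are assumptions, and the JSON escaping is deliberately minimal.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Renders "timestamp [LEVEL]: {json}". Hypothetical helper, not the platform's appender.
final class UnifiedLogFormat {
    // Milliseconds follow a space, as in the pattern yyyy-MM-dd HH:mm:ss SSS.
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss SSS");

    static String format(LocalDateTime time, String level, Map<String, String> fields) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) json.append(',');
            first = false;
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');
        }
        json.append('}');
        return TS.format(time) + " [" + level + "]: " + json;
    }

    // Builds an insertion-ordered field map from alternating keys and values.
    static Map<String, String> fields(String... kv) {
        if (kv.length % 2 != 0) throw new IllegalArgumentException("odd number of arguments");
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }
}
```

Keeping the traceID inside the JSON body is one way to let the processing layer join log lines to spans without parsing free text.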

Span Model & Propagation: Each request is assigned a traceID; each span carries its own spanID and its parent's spanID (pSpanID). The trace context travels in gRPC metadata, which is sent as HTTP/2 headers, so spans can be stitched together end to end across services.

Span format
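The spanID/pSpanID relationship and header propagation can be sketched as follows. A plain Map stands in for gRPC Metadata, and the header names x-trace-id and x-span-id are hypothetical, since the source does not give the platform's actual keys. Lowercase keys are used because gRPC metadata keys are lowercase.

```java
import java.util.Map;
import java.util.UUID;

// Span fields follow the article (traceID, spanID, pSpanID); header names are hypothetical.
final class Span {
    static final String TRACE_HEADER = "x-trace-id";
    static final String SPAN_HEADER = "x-span-id";

    final String traceId;
    final String spanId;
    final String pSpanId;   // null for the root span of a request
    final String service;
    final String method;

    private Span(String traceId, String pSpanId, String service, String method) {
        this.traceId = traceId;
        this.spanId = UUID.randomUUID().toString();
        this.pSpanId = pSpanId;
        this.service = service;
        this.method = method;
    }

    // Entry point of a request: a new traceID and a span with no parent.
    static Span root(String service, String method) {
        return new Span(UUID.randomUUID().toString(), null, service, method);
    }

    // Client side: write the current trace context into the outgoing headers.
    void inject(Map<String, String> headers) {
        headers.put(TRACE_HEADER, traceId);
        headers.put(SPAN_HEADER, spanId);
    }

    // Server side: continue the caller's trace; the caller's span becomes the parent.
    static Span fromHeaders(Map<String, String> headers, String service, String method) {
        String traceId = headers.get(TRACE_HEADER);
        if (traceId == null) return root(service, method);   // untraced caller starts a new trace
        return new Span(traceId, headers.get(SPAN_HEADER), service, method);
    }
}
```

In a gRPC-Java service, inject would plausibly run in a client interceptor before the call is sent and fromHeaders in a server interceptor; that wiring is omitted here.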

Metrics Model

Metrics are divided into system and business categories, and each is tracked both per day and historically. Business‑specific metrics for wealth sales (see Image 4) are stored in MySQL for historical analysis and visualized with Grafana.

Business metrics model
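One reading of the daily/historical split is sketched below, assuming "daily" means per-calendar-day counts and "historical" means running totals per interface. The call-volume metric, the class name and the key scheme are illustrative assumptions; the actual business metric definitions in Image 4 are not reproduced, and the MySQL storage step is omitted.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-day and cumulative call counts per service interface.
final class CallVolumeMetrics {
    final Map<String, Long> daily = new HashMap<>();       // key: date|service|method
    final Map<String, Long> historical = new HashMap<>();  // key: service|method

    // Each completed call increments both granularities.
    void record(LocalDate day, String service, String method) {
        String iface = service + "|" + method;
        daily.merge(day + "|" + iface, 1L, Long::sum);
        historical.merge(iface, 1L, Long::sum);
    }

    long dailyCount(LocalDate day, String service, String method) {
        return daily.getOrDefault(day + "|" + service + "|" + method, 0L);
    }

    long historicalCount(String service, String method) {
        return historical.getOrDefault(service + "|" + method, 0L);
    }
}
```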

Implementation Effects

Distributed Call‑Chain Visualization

The platform renders request‑level call‑chain trees showing service name, method, latency, status code, and request/response payloads. This visualization reduced the time spent tracing test executions by 90%.

Distributed call‑chain visualization
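The tree view relies on rebuilding parent-child structure from flat span records. A hypothetical sketch: group spans by pSpanID, start from spans without a parent, and print children in start-time order with latency and status. Field and class names are assumptions, payloads are omitted, and spans whose parent was never recorded are simply not rendered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flat span record as it might be stored after processing.
final class SpanRecord {
    final String spanId, pSpanId, service, method, status;
    final long startMs, latencyMs;

    SpanRecord(String spanId, String pSpanId, String service, String method,
               long startMs, long latencyMs, String status) {
        this.spanId = spanId; this.pSpanId = pSpanId; this.service = service;
        this.method = method; this.startMs = startMs; this.latencyMs = latencyMs;
        this.status = status;
    }
}

final class CallChainTree {
    // Returns one indented line per span, depth-first, children ordered by start time.
    static List<String> render(List<SpanRecord> spans) {
        Map<String, List<SpanRecord>> children = new HashMap<>();
        List<SpanRecord> roots = new ArrayList<>();
        for (SpanRecord s : spans) {
            if (s.pSpanId == null) roots.add(s);
            else children.computeIfAbsent(s.pSpanId, k -> new ArrayList<>()).add(s);
        }
        List<String> lines = new ArrayList<>();
        roots.sort(Comparator.comparingLong(s -> s.startMs));
        for (SpanRecord r : roots) walk(r, 0, children, lines);
        return lines;
    }

    private static void walk(SpanRecord s, int depth,
                             Map<String, List<SpanRecord>> children, List<String> out) {
        out.add("  ".repeat(depth) + s.service + "." + s.method
                + " " + s.latencyMs + "ms " + s.status);
        List<SpanRecord> kids = children.getOrDefault(s.spanId, new ArrayList<>());
        kids.sort(Comparator.comparingLong(k -> k.startMs));
        for (SpanRecord k : kids) walk(k, depth + 1, children, out);
    }
}
```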

Anomaly Detection & Diagnosis

When an error event triggers an alarm, the platform locates the offending log via traceID, displays the problematic span, and shows its input/output parameters, cutting diagnosis time by roughly 90%.

Anomaly detection workflow
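The lookup from alert to span can be sketched as a filter over stored spans: take the traceID carried by the offending log line and return every span of that trace whose status is not OK, together with its recorded input and output. StoredSpan, Diagnoser and the status string OK are hypothetical; in the platform this query would run against Elasticsearch rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stored span with the payloads shown in the diagnosis view.
final class StoredSpan {
    final String traceId, service, method, status, request, response;

    StoredSpan(String traceId, String service, String method, String status,
               String request, String response) {
        this.traceId = traceId; this.service = service; this.method = method;
        this.status = status; this.request = request; this.response = response;
    }

    @Override public String toString() {
        return service + "." + method + " [" + status + "] in=" + request + " out=" + response;
    }
}

final class Diagnoser {
    // All non-OK spans belonging to the trace named in the alert.
    static List<StoredSpan> failingSpans(String traceId, List<StoredSpan> store) {
        List<StoredSpan> result = new ArrayList<>();
        for (StoredSpan s : store) {
            if (s.traceId.equals(traceId) && !"OK".equals(s.status)) result.add(s);
        }
        return result;
    }
}
```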

Metric‑Trace Correlation

Daily service‑level call‑volume metrics link to the list of requests for a specific interface, which in turn opens the corresponding call‑chain tree. This enables pinpointing high‑latency spans for performance tuning and correlating error logs with metric spikes.

Metric and trace correlation
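The drill-down from metric to trace can be sketched in the same spirit: filter the request records for one interface and day (their count is the daily call volume), then sort them by latency so the slowest traceIDs can be opened as call-chain trees. RequestSummary and DrillDown are hypothetical names, and the in-memory list stands in for the stored data.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-request summary linking a metric bucket to a traceID.
final class RequestSummary {
    final String traceId, service, method;
    final LocalDate day;
    final long latencyMs;

    RequestSummary(String traceId, String service, String method, LocalDate day, long latencyMs) {
        this.traceId = traceId; this.service = service; this.method = method;
        this.day = day; this.latencyMs = latencyMs;
    }
}

final class DrillDown {
    // Slowest requests for one interface on one day, at most 'limit' of them.
    static List<RequestSummary> slowest(List<RequestSummary> all, String service,
                                        String method, LocalDate day, int limit) {
        List<RequestSummary> matches = new ArrayList<>();
        for (RequestSummary r : all) {
            if (r.service.equals(service) && r.method.equals(method) && r.day.equals(day)) {
                matches.add(r);
            }
        }
        // matches.size() is the daily call volume shown on the metric panel.
        matches.sort(Comparator.comparingLong((RequestSummary r) -> r.latencyMs).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(limit, matches.size())));
    }
}
```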

Real‑Time Dashboard

Grafana dashboards display both system and business metrics for the wealth‑sales domain, providing an at‑a‑glance view of service health and business performance.

Grafana wealth‑sales dashboard

Conclusion

The presented solution addresses the observability gaps inherent in distributed microservice architectures by fusing metrics, tracing, and logging with low intrusion through SDK integration in the gRPC‑Nebula framework. It provides developers with topology insights, testers with faster execution tracing, operators with precise fault isolation, and business users with customizable reports. The approach is broadly applicable to other enterprises facing similar observability challenges.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
