
Design and Implementation of a Business System Trace and Log Reporting Tool

This article presents the challenges of complex business systems, compares distributed tracing and traditional ELK solutions, and details the design, integration steps, usage workflow, and future enhancements of a lightweight SDK-based trace and log reporting platform that improves debugging efficiency and reduces operational overhead.

Dada Group Technology

Oct 24, 2022


Business System Challenges

The storefront guide channel page system depends on several middle-platform capabilities, so its business logic and overall complexity keep growing. As traffic rises, developers spend more time on operations and troubleshooting, which makes fast fault recovery a key challenge.

Accurate business data tracing and rapid issue investigation therefore become critical. They require tooling that records the entire execution process, so that the original scene of a failure can be reconstructed and the problem analyzed and located precisely.

Survey of Existing Solutions

Two mainstream approaches for business tracing are examined: distributed tracing systems (e.g., SkyWalking, Pinpoint) and log‑based ELK solutions.

The following outlines their usage scenarios and drawbacks.

2.1 Distributed Tracing Systems

The core principle is to link calls across servers with a shared TraceId. In a sample call chain, a user request passes from Application A to B and C, and then on to D and E, so the calls form a directed acyclic graph.
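The TraceId idea above can be sketched in a few lines of Java. This is a minimal illustration, not the data model of SkyWalking or Pinpoint: the `Span` record, its field names, and the `CallChain` helper are hypothetical, and the example simplifies the graph to a tree in which every call records the ID of the call that triggered it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One recorded call: the application that handled it and the call that triggered it (hypothetical model). */
record Span(String traceId, String spanId, String parentSpanId, String app) {}

final class CallChain {
    /** Groups the spans of one trace by parent span ID so the chain can be walked from the root. */
    static Map<String, List<Span>> childrenByParent(List<Span> spans, String traceId) {
        Map<String, List<Span>> children = new LinkedHashMap<>();
        for (Span s : spans) {
            if (s.traceId().equals(traceId) && s.parentSpanId() != null) {
                children.computeIfAbsent(s.parentSpanId(), k -> new ArrayList<>()).add(s);
            }
        }
        return children;
    }

    /** Renders the chain depth-first, one application per line, indented by call depth. */
    static String render(List<Span> spans, String traceId) {
        Map<String, List<Span>> children = childrenByParent(spans, traceId);
        StringBuilder out = new StringBuilder();
        for (Span s : spans) {
            // A span without a parent is the entry point of the request.
            if (s.traceId().equals(traceId) && s.parentSpanId() == null) {
                append(out, s, children, 0);
            }
        }
        return out.toString();
    }

    private static void append(StringBuilder out, Span span,
                               Map<String, List<Span>> children, int depth) {
        out.append("  ".repeat(depth)).append(span.app()).append('\n');
        for (Span child : children.getOrDefault(span.spanId(), List.of())) {
            append(out, child, children, depth + 1);
        }
    }
}
```

Because every span carries the same TraceId, spans collected on different servers can be merged after the fact and filtered down to a single request, which is what makes the full chain visible.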

A distributed tracing system collects logs at a configured sampling rate, writes them to data files, and pipelines them into BigTable keyed by TraceId, which gives full-chain visibility across services. Its drawbacks are large log volumes and high maintenance costs, which make it hard to scale and to adapt to evolving business needs.

2.2 Traditional ELK Log System

ELK requires developers to log extensively, then filter logs in Elasticsearch to reconstruct execution scenes. Its drawbacks include complex environment setup, cumbersome log collection, difficulty filtering overlapping logs, and time‑consuming manual analysis.

Both approaches have limitations, which prompted a hybrid design that combines their strengths: it uses timestamps and unique trace identifiers for precise filtering, and business attributes for accurate data selection.

Design Philosophy

High System Stability: An independent thread pool with a discard-when-full policy; asynchronous reporting via MQ.

Low Integration Cost: An SDK package that bundles the thread pool and messaging; annotation, AOP, and manual (API) reporting options.

Traceability: Integration with JD pfinder for full-trace identification.

Visualization & Data Isolation: Micro-application capability for isolated visual dashboards and customized pages.

Instant Notification: JD me instant messaging integrated with micro-application user settings.
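The stability item above (an independent thread pool that discards work when full) can be sketched with the JDK's `ThreadPoolExecutor`. This is an assumption about how such a pool might look, not the SDK's actual code: the class name `DiscardingReporter`, the pool sizes, and the dropped-report counter are hypothetical, and the `Runnable` passed to `report` stands in for the MQ send.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Runs report tasks on a dedicated, bounded pool; drops tasks instead of blocking callers. */
final class DiscardingReporter {
    private final AtomicLong dropped = new AtomicLong();
    private final ThreadPoolExecutor pool;

    DiscardingReporter(int threads, int queueCapacity) {
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),        // bounded queue: memory use stays capped
                (task, executor) -> dropped.incrementAndGet()); // rejection handler: discard and count
    }

    /** Called from business threads; returns immediately even when the pool is saturated. */
    void report(Runnable sendToMq) {
        pool.execute(sendToMq);
    }

    long droppedCount() {
        return dropped.get();
    }

    /** Stops accepting work and waits briefly for queued reports to finish. */
    void shutdown() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is deliberate: under load some reports are lost, but a slow or unavailable MQ can never block or exhaust the business system's own threads. The JDK's built-in `ThreadPoolExecutor.DiscardPolicy` gives the same behavior without the counter.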

[Figure: overall architecture diagram]

The design enables precise data reporting, reduces integration overhead via an independent SDK, and supports configurable, low‑intrusion deployment.

Usage Workflow

The end-to-end process consists of five steps. In the original workflow diagram, blue parts indicate business-system actions and red parts indicate micro-application capabilities.

Step 1: Create a micro‑application and obtain a unique agentId.

(1) Application creation for data isolation and visual page setup.

(2) Retrieve agentId for data isolation.

Step 2: Add the reporting SDK.

Add the Maven dependency to pom.xml and set the tools.version property to 2.0.0-SNAPSHOT.

<dependency>
    <groupId>com.jd.tools.log</groupId>
    <artifactId>tools-api</artifactId>
    <version>${tools.version}</version>
</dependency>

Step 3: Configure basic settings.

Set the system code (the agentId obtained in Step 1) in a *.properties or *.yml file. The two equivalent forms follow, properties first.

lc.systemCode=${agentId}

lc:
  systemCode: ${agentId}
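A common integration mistake is to leave the `${agentId}` placeholder unreplaced. The sketch below shows one way to fail fast on that. It uses plain `java.util.Properties` as a stand-in, which is an assumption: the article does not say how the SDK loads its configuration, and the `SystemCodeConfig` class is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Reads lc.systemCode and rejects a missing value or an unreplaced placeholder. */
final class SystemCodeConfig {
    static String readSystemCode(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
        String code = props.getProperty("lc.systemCode");
        // "${...}" means the placeholder was copied verbatim instead of the real agentId.
        if (code == null || code.isBlank() || code.trim().startsWith("${")) {
            throw new IllegalStateException(
                    "lc.systemCode must be set to the agentId obtained in Step 1");
        }
        return code.trim();
    }
}
```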

Step 4: Report business data using one of three methods.

| Reporting Method | Advantages | Disadvantages | Use Cases |
| --- | --- | --- | --- |
| Annotation | Flexible; no reporting code to write; simple | Must be added manually to key methods; does not work on private methods; single report format | New systems; methods split at fine granularity |
| API | Flexible; controllable report format; high portability | More code; invasive | Both new and legacy systems; cases where private methods must report |
| AOP | Path-based interception; simple configuration; wide coverage | Private methods unsupported; single report format; cannot single out key points | Full-scope data reporting with no focus on specific points |

Combining methods can improve data quality.
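The annotation and AOP rows share one limitation, no support for private methods, and a small sketch shows why that is typical of interception-based reporting. The example uses a JDK dynamic proxy as a stand-in for the SDK's interception, whose internals the article does not describe. The `@ReportLog` annotation, the `OrderService` interface, and the `ReportingProxy` class are all hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.Consumer;

/** Hypothetical marker annotation; the SDK's real annotation name is not given in the article. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ReportLog {
    String module();
}

/** Hypothetical business interface used only for this demo. */
interface OrderService {
    @ReportLog(module = "order")
    String findOrder(String orderId);

    String ping(); // not annotated, so never reported
}

final class ReportingProxy {
    /** Wraps target so that calls to annotated interface methods are passed to reporter. */
    static <T> T wrap(T target, Class<T> iface, Consumer<String> reporter) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) -> {
                    Object result;
                    try {
                        result = method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the business exception unchanged
                    }
                    ReportLog mark = method.getAnnotation(ReportLog.class);
                    if (mark != null) {
                        reporter.accept(mark.module() + "." + method.getName() + " -> " + result);
                    }
                    return result;
                });
        return iface.cast(proxy);
    }
}
```

The proxy only sees calls that arrive through the interface, so a private helper invoked inside `findOrder` never passes through it. Reporting from such a method needs an explicit API call, which is the gap the API row in the table fills.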

Step 5: View reports in the micro‑application.

Reports contain custom system fields, business fields (account, channelOrigin, className, createdDate, methodName, module), systemCode, traceId, and key business data.

Common Questions

1. Does integration significantly impact performance? – The solution uses an independent thread pool and MQ with a discard‑when‑full policy to minimize impact.

2. How can the problematic call chain for a specific user in a specific time window be located quickly? – Filter the data by time and user PIN, then use the traceId to reconstruct the full call chain.
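The two-stage lookup in the second answer can be sketched as an in-memory query. This only illustrates the filtering logic, not the platform's storage or query API: the `ReportEntry` record keeps a subset of the report fields listed in Step 5, treating `account` as the user PIN is an assumption, and `ReportQuery` is a hypothetical helper.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Subset of the reported fields; account stands in for the user PIN (an assumption). */
record ReportEntry(String traceId, String account, String className,
                   String methodName, Instant createdDate) {}

final class ReportQuery {
    /** Stage 1: traceIds of every request the user made inside [from, to). */
    static Set<String> traceIdsFor(List<ReportEntry> all, String account,
                                   Instant from, Instant to) {
        return all.stream()
                .filter(e -> e.account().equals(account))
                .filter(e -> !e.createdDate().isBefore(from) && e.createdDate().isBefore(to))
                .map(ReportEntry::traceId)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** Stage 2: every entry of one trace in time order, whatever its timestamp or account. */
    static List<ReportEntry> chain(List<ReportEntry> all, String traceId) {
        return all.stream()
                .filter(e -> e.traceId().equals(traceId))
                .sorted(Comparator.comparing(ReportEntry::createdDate))
                .collect(Collectors.toList());
    }
}
```

Stage 1 narrows a large data set with cheap business attributes. Stage 2 then ignores those attributes and follows the traceId alone, so entries reported by downstream calls are still included even when they fall just outside the time window.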

