
Design and Implementation of a Real-Time Log Collection and Query System for Distributed Deployment

The article describes the challenges of troubleshooting distributed deployments across many machines and presents a solution built on the ELK stack that centralizes logs from Java and Go services, enabling near‑real‑time search, visualization, and faster issue resolution.

转转QA

When diagnosing problems in distributed systems, engineers often have to log into multiple machines to view logs, which is time‑consuming and error‑prone, especially for long call chains.

For the Zz release system, the existing tracing tool supported only Java services, but the deployment pipeline also relied on Go‑based agents. With thousands of servers, checking logs machine by machine was cumbersome, because each host required its own permissions and login.

The deployment process consists of seven steps—backup, download, unpack, environment init, nginx config removal, service start, and nginx re‑enable—each requiring commands sent from the release service to the agent, which then executes scripts on target machines and reports results back.
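The seven steps above can be sketched as an ordered pipeline that records one result per step under a shared task ID. This is a minimal illustration, not the release system's actual code: the step identifiers, the StepResult type, the run_deployment function, and the stop-on-first-failure behavior are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Step names follow the article; the identifiers themselves are hypothetical.
DEPLOY_STEPS = [
    "backup",
    "download",
    "unpack",
    "env_init",
    "nginx_remove",
    "service_start",
    "nginx_enable",
]


@dataclass
class StepResult:
    task_id: str   # shared by every step of one deployment
    step: str
    success: bool
    output: str


def run_deployment(
    task_id: str, execute: Callable[[str], Tuple[bool, str]]
) -> List[StepResult]:
    """Run the steps in order and return one result per attempted step.

    `execute` stands in for "send the command to the agent and wait for
    its report"; it returns (success, output). Stopping at the first
    failure is an assumption, not something the article states.
    """
    results = []
    for step in DEPLOY_STEPS:
        success, output = execute(step)
        results.append(StepResult(task_id, step, success, output))
        if not success:
            break
    return results
```

Because every result carries the same task ID, the records for one deployment can later be grouped no matter which machine produced them.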

To address these issues, the team evaluated commercial and open‑source log solutions and built a near‑real‑time log system centered on Elasticsearch (the "E" of the ELK stack). Java services push logs via the Elasticsearch client, while Go agents report via HTTP endpoints, and Kibana provides a searchable UI for the entire deployment workflow.
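To make the HTTP reporting path concrete, here is a hedged sketch of how an agent might package one step's outcome as a JSON document and prepare a POST request for a collection endpoint. The article's agents are written in Go; Python is used here only for illustration. The field names (task_id, step, host, command, success, output, timestamp), both function names, and the endpoint URL are assumptions, not the system's real schema or API.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_log_document(task_id, step, host, command, success, output):
    """Assemble one log record; every field name here is hypothetical."""
    return {
        "task_id": task_id,
        "step": step,
        "host": host,
        "command": command,
        "success": success,
        "output": output,
        # ISO 8601 UTC timestamp so records from different hosts sort consistently
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def build_report_request(endpoint_url, document):
    """Prepare (but do not send) a JSON POST carrying the log record."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would send the prepared request with urllib.request.urlopen(request, timeout=5). Keeping construction separate from sending makes the payload easy to inspect without a network connection.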

In practice, the index mapping can be inspected with a request such as GET zzdeploylog/_mapping, and logs can be queried by a unique task ID to reconstruct the full deployment sequence. The results show the command parameters and outcome of each step, so engineers no longer need to log in to each host manually.
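A query by task ID could look roughly like the sketch below, which builds an Elasticsearch search body and flattens the response into an ordered step list. The field names task_id, timestamp, step, success, and command are assumptions, since the article does not show the actual mapping. The term query additionally assumes task_id is mapped as a keyword field.

```python
def build_task_query(task_id, size=100):
    """Search body for all log records of one deployment, oldest first.

    Intended for a request such as POST zzdeploylog/_search. The field
    names are hypothetical.
    """
    return {
        "size": size,
        "query": {"term": {"task_id": task_id}},
        "sort": [{"timestamp": {"order": "asc"}}],
    }


def summarize_hits(response):
    """Reduce a search response to (step, success, command) tuples.

    Relies on the standard response layout, where matching documents
    sit under hits.hits[]._source, and keeps the order returned.
    """
    return [
        (hit["_source"]["step"], hit["_source"]["success"], hit["_source"]["command"])
        for hit in response["hits"]["hits"]
    ]
```

Sorting by timestamp in the query keeps the reconstruction logic trivial: the client only reads the hits in the order they arrive.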

The new system has significantly improved troubleshooting efficiency; future plans include ingesting Docker container logs, enhancing Elasticsearch cluster stability, and extending the solution to other business services.
