
Building a Near Real‑Time Log Collection and Query System for Distributed Deployment

The article describes how a distributed deployment platform built a centralized Elasticsearch‑based log collection and query system to replace manual multi‑machine log inspection, detailing the background challenges, architecture, implementation steps, practical usage, and future improvements.

转转QA

Jul 23, 2020


Background

Diagnosing problems in distributed systems traditionally relies on manually checking logs on many machines, which is time‑consuming because services span numerous hosts and long call chains. The authors needed a unified log collection and query platform to streamline issue location.

Why the Zz Deployment System Needed Centralized Logs

The existing tracing solution only supports Java services, while the deployment platform includes Java backend services and Go‑based deployment agents, making full‑chain log collection difficult.

Deployment activity spans dozens of servers out of the thousands the company manages. Each host requires its own permissions and login before its logs can be read, which makes the workflow cumbersome.

The article uses the Zz deployment system’s service rollout process as a case study, illustrating the complexity of troubleshooting and the adopted solution.

Deployment Process Example

The Spring Boot service rollout consists of seven steps:

1. Back up the old version.
2. Download the new package.
3. Unzip the package.
4. Initialize the environment.
5. Disable Nginx on the target machine.
6. Deploy and start the new service.
7. Re-enable Nginx.

Each step requires the deployment service to issue a command to the agent, which runs a script on the target machine and reports the result. If any step fails, engineers must log into both the deployment service and the target machine to inspect logs.

Log System Construction and Integration

After evaluating industry solutions, the team built a near real‑time log system centered on Elasticsearch (the "E" in the ELK stack). Elasticsearch provides scalable, distributed, full‑text search and analytics on top of Apache Lucene, with a flexible query DSL and RESTful API. Java deployment services push logs via the Elasticsearch client, while Go agents send logs through an HTTP endpoint. Kibana is used for visualizing and searching the collected logs.

Practical Application

The index mapping, including its key fields, can be inspected with the following request:

GET zzdeploylog/_mapping

Using a unique task ID, users can query the entire deployment workflow, viewing command parameters, execution results, and timestamps for each step, thereby eliminating the need to log into multiple machines.

Conclusion

The new log system has significantly improved troubleshooting efficiency for the deployment platform. Future work includes ingesting Docker container logs, enhancing Elasticsearch cluster stability, automating index lifecycle management, and extending the solution to other business systems.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


