Design and Implementation of a Log Collection Agent: Challenges and Solutions
This article explains the evolution of logging, the role of log‑collection agents, industry solutions, and step‑by‑step techniques for building a reliable push‑mode log collector on Linux, covering file discovery, offset management, file identification, update detection, and safe resource release.
Logging has shifted from human‑oriented text to machine‑processed data, making log‑collection agents essential for decoupling storage and analysis; agents push logs to subscription‑enabled stores such as Kafka, DataHub, or LogHub.
The industry currently favors tools like Fluentd, Logstash, Flume, and Alibaba's LogAgent/LogTail, with Fluentd promoting a Unified Logging Layer to reduce format conversion complexity.
How to discover a file? A simple approach lists files in a config, but dynamic log creation requires pattern matching (e.g., a directory /var/www/log with filenames like access.log or access.log-2018-01-10) using a glob or a regex such as access\.log(-[0-9]{4}-[0-9]{2}-[0-9]{2})?. Inotify can report new files promptly, though it does not watch directories recursively and can drop events; combining Inotify with periodic polling yields both timeliness and completeness.
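The polling half of that combination can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name discover and the known set are hypothetical, and the dot in the pattern is escaped so that it matches literally.

```python
import os
import re

# Matches access.log and dated variants such as access.log-2018-01-10.
LOG_NAME = re.compile(r"access\.log(-[0-9]{4}-[0-9]{2}-[0-9]{2})?")

def discover(directory, known):
    """Return matching file paths in `directory` that are not yet in `known`."""
    found = []
    for entry in os.scandir(directory):
        # fullmatch rejects names with extra suffixes, e.g. access.log.gz
        if entry.is_file() and LOG_NAME.fullmatch(entry.name):
            if entry.path not in known:
                found.append(entry.path)
    return sorted(found)
```

An agent would call discover on a timer and add the result to known; an Inotify watch, where available, only shortens the delay before a new file is noticed.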
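To keep the offset file usable across crashes, write the new offset to a temporary file (offset.bak), call fdatasync on it, then atomically rename it to offset. Because the rename replaces the old file in a single step, a reader finds either the previous offset or the new one, never a half-written file.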
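A minimal Python sketch of that write-sync-rename sequence follows, assuming a Linux host (os.fdatasync is not available on every platform). The helper names save_offset and load_offset, and the plain decimal file format, are assumptions made for the example.

```python
import os

def save_offset(path, offset):
    """Persist `offset` so that `path` always holds a complete value."""
    tmp = path + ".bak"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, str(offset).encode("ascii"))
        os.fdatasync(fd)      # flush the file data to disk before renaming
    finally:
        os.close(fd)
    os.rename(tmp, path)      # atomic replacement on POSIX filesystems

def load_offset(path):
    """Return the saved offset, or 0 if nothing has been saved yet."""
    try:
        with open(path, "r", encoding="ascii") as f:
            return int(f.read())
    except FileNotFoundError:
        return 0
```

A stricter variant might also fsync the containing directory after the rename so that the new directory entry itself is durable; that step is left out here for brevity.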
How to identify a file? Relying on filenames is fragile; using dev + inode improves reliability, but inode reuse after deletion can cause mis‑identification. Storing a unique identifier via extended attributes (xattr) or a file’s initial bytes can further differentiate files, though not all filesystems support xattr.
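One way to combine these ideas is sketched below: the identity is the device number, the inode number, and a hash of the file's first bytes. The function name file_identity, the 256-byte prefix length, and the choice of SHA-256 are all assumptions for illustration, not something the article prescribes.

```python
import hashlib
import os

SIGNATURE_BYTES = 256  # hypothetical prefix length used as the content signature

def file_identity(path):
    """Return (dev, inode, hash of the first bytes) for the file at `path`."""
    with open(path, "rb") as f:
        st = os.fstat(f.fileno())        # stat the opened file, not the name
        head = f.read(SIGNATURE_BYTES)
    return (st.st_dev, st.st_ino, hashlib.sha256(head).hexdigest())
```

The identity survives a rename, since dev, inode, and content are unchanged, and a new file that happens to reuse the inode will usually differ in its first bytes. One caveat: while a file is still shorter than SIGNATURE_BYTES its hash changes as it grows, so a real agent would need to handle short files separately.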
Detecting file updates can be done with Inotify events, but high‑frequency writes may overflow the event queue. Simple polling of file stat information is a universal fallback.
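The polling fallback might look like the following sketch, which compares each file's size and modification time against the values seen on the previous pass. The function name poll_changes and the last_seen dictionary are hypothetical.

```python
import os

def poll_changes(paths, last_seen):
    """Return the paths whose size or mtime changed since the previous call.

    `last_seen` maps path -> (size, mtime_ns) and is updated in place.
    """
    changed = []
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue                      # file vanished between passes
        state = (st.st_size, st.st_mtime_ns)
        if last_seen.get(path) != state:
            last_seen[path] = state
            changed.append(path)
    return changed
```

The polling interval trades latency against CPU and I/O cost; the sketch leaves that scheduling to the caller.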
To release file handles safely, the approach mirrors Fluentd’s strategy: after a file is deleted, wait for a configurable grace period before closing the descriptor, so that late writes can still be collected. Tools like lsof can inspect reference counts, but kernel‑level APIs would be more efficient.
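The grace-period idea can be sketched as below. This is not Fluentd's implementation: the TrackedFile class and its fields are hypothetical. It also assumes that a link count of zero on the open descriptor means the file has been deleted, which holds on common local Linux filesystems but not when other hard links to the file remain.

```python
import os
import time

class TrackedFile:
    """Holds a descriptor open and closes it only after a grace period."""

    def __init__(self, path, grace_seconds):
        self.fd = os.open(path, os.O_RDONLY)
        self.grace_seconds = grace_seconds
        self.deleted_at = None            # when deletion was first observed

    def maybe_release(self, now=None):
        """Close the descriptor if the file has been deleted long enough.

        Returns True once the descriptor is closed, False otherwise.
        """
        if self.fd is None:
            return True
        now = time.monotonic() if now is None else now
        if os.fstat(self.fd).st_nlink == 0:       # no directory entry left
            if self.deleted_at is None:
                self.deleted_at = now             # start the grace period
            elif now - self.deleted_at >= self.grace_seconds:
                os.close(self.fd)
                self.fd = None
                return True
        else:
            self.deleted_at = None
        return False
```

During the grace period the agent can keep reading from fd, so lines written just before or after the deletion are not lost; the trade-off is that disk space stays occupied until the descriptor is closed.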
In summary, building a robust log‑collection agent involves handling file discovery, offset persistence, unique file identification, update detection, and graceful resource cleanup, all of which require deep knowledge of Linux file systems and system calls.
References include articles on Inode, Inotify, xattr, and Fluentd’s Unified Logging Layer.
Top Architect