Loggie: A High-Performance Log Collection Agent System Design and Implementation

Loggie is a cloud-native log collection agent written in Go that replaces Filebeat and Flume. It uses a micro-kernel, producer-consumer architecture with hot-swappable pipelines, reads local files at up to 2 GB/s, and delivers 1.6 to 2.6 times Filebeat's throughput on about a quarter of the CPU. It also provides built-in observability, reliability, and latency monitoring for large-scale enterprise deployments.

NetEase Yanxuan Technology Product Team

Loggie originated from the real needs of the NetEase Yanxuan business, grew through long-term joint development with NetEase Shufan, and continues to evolve in close collaboration with NetEase Media and the Industrial and Commercial Bank of China. This article introduces the design and implementation of Loggie, a cloud-native log collection agent.

Background: In the early days of the Yanxuan log platform, Filebeat collected logs inside the cloud while Flume collected logs outside it. Troubleshooting was painful, and the same questions kept coming up: why were certain logs not collected, why were some duplicated, could missing logs be re-collected, why were certain files not collected, why was collection slow (delays over 30 s), and why did logs disappear after a service was redeployed? Running two different agents also meant high maintenance costs.

Problems with existing solutions: Both Filebeat and Flume had serious issues: low collection performance (Filebeat topped out at about 80 MB/s, while peak log volume exceeded 100 MB/s during promotions), high resource usage (Filebeat's CPU usage exceeded 800% with 100+ files, and Flume consumed 200 MB+ of memory at idle), and poor scalability due to complex architecture and a single-output design.

Architecture Design: Loggie is written in Go with a micro-kernel design based on the classic producer-consumer pattern. Each pipeline consists of only four kinds of component: source, queue, sink, and an optional interceptor. Pipelines support hot configuration reload and hot-swapping of components, and they are strongly isolated so that one pipeline cannot interfere with another.
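To make the source, queue, interceptor, and sink roles concrete, here is a minimal sketch of one pipeline built on Go channels. It is not Loggie's actual code: the names `Event`, `Interceptor`, `Sink`, and `RunPipeline` are hypothetical, and placing the interceptors in front of the queue is an assumption made for brevity.

```go
package main

import "sync"

// Event is a hypothetical log record passed through the pipeline.
type Event struct {
	Body string
}

// Interceptor may transform an event, or drop it by returning false.
type Interceptor func(Event) (Event, bool)

// Sink consumes events that reach the end of the pipeline.
type Sink func(Event)

// RunPipeline wires source -> interceptors -> queue -> sink. It blocks until
// the source channel is closed and the queue has been drained.
func RunPipeline(source <-chan Event, queueSize int, interceptors []Interceptor, sink Sink) {
	// A bounded queue decouples the producer from the consumer.
	queue := make(chan Event, queueSize)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // consumer side
		defer wg.Done()
		for e := range queue {
			sink(e)
		}
	}()

	// Producer side: run each event through the interceptor chain.
	for e := range source {
		keep := true
		for _, ic := range interceptors {
			if e, keep = ic(e); !keep {
				break
			}
		}
		if keep {
			queue <- e
		}
	}
	close(queue)
	wg.Wait()
}
```

Because each call to `RunPipeline` owns its own queue and consumer goroutine, a slow sink in one pipeline only backs up that pipeline's queue, which is one simple way to picture the isolation described above.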

Log Collection Implementation:

Efficient Collection: Loggie combines OS event notification with timed polling (default interval 10s) to get both timely detection and reliability. The team's analysis found that the only events that really matter for timeliness are write events on files that were previously inactive.
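The combination of event notification and polling can be sketched as a single loop that reacts to both. This is an illustration under stated assumptions, not Loggie's implementation: `Watcher`, `SizeFunc`, and `StatSize` are hypothetical names, the `events` channel stands in for whatever OS notification mechanism is used, and a change in file size is used as the only signal.

```go
package main

import (
	"os"
	"time"
)

// SizeFunc reports the current size of a file; ok is false if it cannot be read.
type SizeFunc func(path string) (size int64, ok bool)

// StatSize is a SizeFunc backed by os.Stat.
func StatSize(path string) (int64, bool) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, false
	}
	return info.Size(), true
}

// Watcher remembers the last size seen for each file.
type Watcher struct {
	size SizeFunc
	seen map[string]int64
}

func NewWatcher(size SizeFunc) *Watcher {
	return &Watcher{size: size, seen: make(map[string]int64)}
}

// Check reports whether path changed size since the last check and records
// the new size. The first successful check of a path counts as a change.
func (w *Watcher) Check(path string) bool {
	size, ok := w.size(path)
	if !ok {
		return false
	}
	if last, known := w.seen[path]; known && last == size {
		return false
	}
	w.seen[path] = size
	return true
}

// Run calls onChange when an event names a changed file, or when the periodic
// scan of paths finds one. The scan catches anything the events missed.
// Run returns when done is closed.
func (w *Watcher) Run(paths []string, events <-chan string, interval time.Duration,
	done <-chan struct{}, onChange func(path string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case p, ok := <-events:
			if !ok {
				events = nil // event source closed: fall back to polling only
				continue
			}
			if w.Check(p) {
				onChange(p)
			}
		case <-ticker.C:
			for _, p := range paths {
				if w.Check(p) {
					onChange(p)
				}
			}
		}
	}
}
```

With `NewWatcher(StatSize)` and a 10-second interval, a write that triggers an event is picked up immediately, while a missed event costs at most one polling interval.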

File Reading: Loggie reuses its read buffers and reads in multiples of 4KB to make full use of the OS page cache, reaching 2GB/s read performance on a local SSD.

Fairness: Implements a "time slice" based reading method using a single goroutine to handle all log reading tasks while ensuring maximum fairness.
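One way to picture time-slice reading on a single goroutine is a round-robin queue in which each file may read until its slice expires and then goes to the back of the line. The sketch below is an assumption about the general technique, not Loggie's scheduler; `Job` and `RunFair` are hypothetical names.

```go
package main

import "time"

// Job reads one chunk from a file and reports whether more data is pending.
type Job func() (more bool)

// RunFair drains all jobs on the calling goroutine. Each job reads at least
// one chunk per turn and keeps reading until its time slice is used up; a job
// with data left is re-queued behind the others, so one busy file cannot
// starve the rest.
func RunFair(jobs []Job, slice time.Duration) {
	queue := append([]Job(nil), jobs...) // copy so the caller's slice is untouched
	for len(queue) > 0 {
		job := queue[0]
		queue = queue[1:]

		start := time.Now()
		more := true
		for more {
			more = job()
			if time.Since(start) >= slice {
				break // slice used up: yield to the next file
			}
		}
		if more {
			queue = append(queue, job)
		}
	}
}
```

With a slice of zero, every job gets exactly one chunk per turn, which makes the round-robin order easy to see in a test; a larger slice trades some fairness for fewer switches between files.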

Reliability: Implements at-least-once delivery guarantee with ordered ACK and dual compression to reduce CPU consumption and disk I/O.

Performance Comparison: Compared with Filebeat under equal conditions, Loggie uses only about 1/4 of the CPU while achieving 1.6-2.6 times the throughput. Filebeat's throughput is capped at around 80MB/s, while Loggie can easily exceed 200MB/s with multiple files.

Operations and Governance:

Observability: Built-in metrics based on long-term operational experience to help quickly discover and locate problems, with Grafana dashboard templates available.

Completeness: Log integrity verification mechanism using machine IP + filename + inode + device + first N bytes as calculation dimensions.
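The five dimensions listed above can be combined into a single key per file. The sketch below hashes them with SHA-256, which is an assumption for illustration: the article does not say how Loggie combines the dimensions. `FileIdentity` and `Key` are hypothetical names, and the inode and device numbers are taken as plain inputs because reading them is platform-specific.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FileIdentity holds the dimensions used to identify one log file.
type FileIdentity struct {
	IP       string // machine IP
	Filename string
	Inode    uint64
	Device   uint64
	Head     []byte // leading bytes of the file
}

// Key hashes the IP, filename, inode, device, and the first n bytes of the
// file into a hex string. Bytes beyond the first n do not affect the key, so
// it stays stable while the file grows.
func (f FileIdentity) Key(n int) string {
	if n < 0 {
		n = 0
	}
	head := f.Head
	if len(head) > n {
		head = head[:n]
	}
	h := sha256.New()
	// NUL separators keep adjacent fields from running into each other.
	fmt.Fprintf(h, "%s\x00%s\x00%d\x00%d\x00", f.IP, f.Filename, f.Inode, f.Device)
	h.Write(head)
	return hex.EncodeToString(h.Sum(nil))
}
```

Including the inode and device distinguishes a rotated file from the new file that takes over its name, while the leading bytes help when an inode number is reused.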

Latency Monitoring: Calculates end-to-end latency across the entire log delivery chain to identify resource bottlenecks and guide capacity planning.

Log Quality: Defines log quality scores based on field extraction, field existence, type conversion, and field constraint validation.
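The article does not give the scoring formula, so the following is only one plausible reading: count one check for extraction plus three checks per field rule (existence, type conversion, constraint) and report the fraction that passed. `IntRule` and `QualityScore` are hypothetical names, and the rules are limited to integer fields with a range constraint to keep the sketch short.

```go
package main

import "strconv"

// IntRule says that Field must exist, parse as an integer, and lie in [Min, Max].
type IntRule struct {
	Field    string
	Min, Max int64
}

// QualityScore returns the fraction of checks a parsed log record passes:
// one check for field extraction (at least one field was extracted), then
// existence, type conversion, and constraint checks for every rule. A failed
// check also fails the checks that depend on it.
func QualityScore(fields map[string]string, rules []IntRule) float64 {
	total := 1 + 3*len(rules)
	passed := 0
	if len(fields) > 0 {
		passed++ // extraction produced something
	}
	for _, r := range rules {
		raw, ok := fields[r.Field]
		if !ok {
			continue // field missing
		}
		passed++
		v, err := strconv.ParseInt(raw, 10, 64)
		if err != nil {
			continue // not an integer
		}
		passed++
		if v >= r.Min && v <= r.Max {
			passed++
		}
	}
	return float64(passed) / float64(total)
}
```

For example, a record with a valid `status`, a non-numeric `latency_ms`, and no `user_id` passes 5 of 10 checks under three such rules, giving a score of 0.5.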

Analysis Applications: Business real-time monitoring and full-link monitoring applications that give logs strong business meaning and monitoring semantics.
