Performance Optimization and Architecture of iLogTail for High‑Scale Log Collection
Didi replaced its legacy log-collection agent with Alibaba's open-source iLogTail and re-architected it around a shared thread pool, SIMD-accelerated parsing, C++ rewrites of critical plugins, and robust Kafka retry logic. The result: more than twice the throughput, less than half the CPU usage, and near-zero latency at massive scale.
Background: Didi previously built an in‑house log‑collection agent that processed several petabytes of logs per day. With the introduction of AMD machines with many cores, the original agent could no longer meet the throughput requirements, prompting a migration to the open‑source iLogTail project and extensive internal optimizations.
iLogTail Overview: iLogTail is an open‑source collector released by Alibaba Cloud SLS in June 2022. It is widely used inside Alibaba and by tens of thousands of external customers, with an installation base approaching ten million and daily ingestion of tens of petabytes of observability data. Its lightweight, high‑performance, low‑latency design gives it a clear advantage over competing agents in terms of resource consumption, collection speed, and latency.
Architecture: iLogTail follows a centralized architecture in which the input, processing, and output modules of every collection task share a common thread pool. Tasks exchange data via buffered queues, which avoids the overhead of dedicating threads to each task. This shared-resource model requires explicit task scheduling and resource management, but it yields significant CPU and memory savings.
Key Advantages:
Efficient Collection: Uses Linux inotify to detect file changes instantly, achieving lower latency and lower CPU cost.
High Performance: The core collection path is written in C++, while plugins (e.g., the Kafka sender) are implemented in Go. The Go plugins incur garbage-collection overhead, which the C++ rewrite described below eliminates.
Performance Optimizations:
Multiline Parsing: Replaced regex‑based multiline detection with a timestamp‑based algorithm. If a line cannot be parsed as a timestamp, it is merged with the previous line, reducing character comparisons and doubling parsing speed.
Vectorized Line‑break Detection: Leveraged AVX2 instructions to process 32 characters per instruction, cutting the time spent in the GetNextLine routine and improving overall CPU performance by ~8%.
C++ Rewrite of FlusherKafka Plugin: Reimplemented the Kafka flushing logic entirely in C++ to eliminate Go GC pauses and avoid the overhead of serializing C++ log objects to Go before sending.
High‑Performance JSON Library: Integrated the open‑source sonic‑cpp library, which uses SIMD to accelerate escape‑character handling and supports std::string_view to avoid unnecessary memory copies.
Kafka Failure Handling: Added offset rollback and exponential back‑off retry mechanisms. When a send fails, the smallest offset of the batch is recorded and rolled back, and a TCP‑like congestion control throttles the collector until the failure resolves.
Benchmark Results:
In small-traffic scenarios, the internally optimized iLogTail saves 56% CPU compared with the upstream version. In large-traffic tests on a 384-core, 1.5 TB RAM server running 600 concurrent collection tasks, the optimized version processes over 420 MB/s using fewer than 5 CPU cores while keeping latency near zero. Overall, the internal version achieves more than a 2× throughput increase while cutting CPU usage by over half.
Conclusion: Through multi‑dimensional performance testing and iterative optimization, the internally tuned iLogTail delivers substantial gains in resource efficiency, stability, and scalability, making it a robust foundation for massive log‑collection pipelines.
Didi Tech
Official Didi technology account