
How LoongCollector Doubled Log Collection Speed with Four Key Optimizations

This article details the architectural overhaul of iLogtail into LoongCollector, explains why generalization caused a 15% performance drop, and walks through four systematic optimizations (memory arenas, eliminating per-event shared_ptr, event pooling, and direct serialization) that first restored and then doubled log-collection throughput.

Alibaba Cloud Developer

Mar 26, 2025


Introduction

To turn iLogtail into a modern observability data collector (LoongCollector), its architecture was generalized with the goals of high reliability, scalability, and performance. The refactor introduced a roughly 15% performance regression, which prompted a deep dive into optimization techniques.

Architecture Upgrade: Generalization Refactor

iLogtail originally combined a C++ core with Golang plugins, handling only log files. The new LoongCollector adopts a plugin‑based pipeline where each task is a configurable pipeline consisting of input, processor, and output plugins, supporting multiple data sources such as Prometheus and eBPF.

Pipeline Model

Each pipeline runs on three dedicated threads (Input Runner, Processor Runner, Flusher Runner) connected via buffered queues. Time‑slicing and per‑pipeline queues ensure fairness and isolation.
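As a rough illustration of how two runner stages might hand data to each other, here is a minimal sketch of a bounded queue. The class name BoundedQueue and its methods are hypothetical and are not taken from LoongCollector; the sketch only assumes that a full queue should push back on the producer rather than grow without limit.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical bounded queue linking two pipeline stages (not LoongCollector's API).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : mCapacity(capacity) {
        assert(capacity > 0);
    }

    // Non-blocking push: returns false when the queue is full or closed,
    // so the producer stage can back off instead of buffering without bound.
    bool TryPush(T item) {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mClosed || mItems.size() >= mCapacity) {
            return false;
        }
        mItems.push_back(std::move(item));
        mNotEmpty.notify_one();
        return true;
    }

    // Blocking pop: waits for an item; returns nullopt once closed and drained.
    std::optional<T> Pop() {
        std::unique_lock<std::mutex> lock(mMutex);
        mNotEmpty.wait(lock, [this] { return mClosed || !mItems.empty(); });
        if (mItems.empty()) {
            return std::nullopt;
        }
        T item = std::move(mItems.front());
        mItems.pop_front();
        return item;
    }

    // Wakes all waiting consumers; remaining items can still be popped.
    void Close() {
        std::lock_guard<std::mutex> lock(mMutex);
        mClosed = true;
        mNotEmpty.notify_all();
    }

private:
    std::size_t mCapacity;
    bool mClosed = false;
    std::mutex mMutex;
    std::condition_variable mNotEmpty;
    std::deque<T> mItems;
};
```

With one such queue per pipeline, a slow consumer in one pipeline fills only its own queue, which is one plausible way to get the isolation described above.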

Data Model

The fundamental unit is an Event, with three concrete types: Log, Metric, and Span. A LogEvent is defined as:

class LogEvent : public PipelineEvent {
private:
    std::map<std::string, std::string> mContents;
};

Multiple events can be grouped into an Event Group:

class PipelineEventGroup {
private:
    std::map<std::string, std::string> mTags;
    std::vector<std::unique_ptr<PipelineEvent>> mEvents;
};

Performance Degradation: An Inevitable Trade‑off?

Benchmarking under a 1 GB/s log‑generation workload showed a 15% slowdown after the upgrade.

Breaking the Bottleneck: Performance Boost Secrets

Four optimization steps were applied:

Step 1 – Memory Arena

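Replace owned std::string fields with std::string_view, and allocate the underlying bytes from a per-group memory pool (mSourceBuffer in the class below). Each string is then written once into the pool and referenced by views afterwards, instead of being heap-allocated and copied per field.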

class PipelineEventGroup {
private:
    std::map<std::string_view, std::string_view> mTags;
    std::vector<std::unique_ptr<PipelineEvent>> mEvents;
    std::shared_ptr<SourceBuffer> mSourceBuffer;
};
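The article does not show how SourceBuffer is implemented, so the following is only a sketch of one way such an arena could work: a chunked bump allocator that copies each string once and hands back a string_view. The method names CopyString and ChunkCount, the 4096-byte default chunk size, and the TagSet helper are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string_view>
#include <vector>

// Hypothetical stand-in for SourceBuffer: a chunked bump allocator.
class SourceBuffer {
public:
    explicit SourceBuffer(std::size_t chunkSize = 4096) : mChunkSize(chunkSize) {
        assert(chunkSize > 0);
    }

    // Copies `s` into arena-owned memory once and returns a view of that copy.
    // Views stay valid for the arena's lifetime because chunks are never moved or freed early.
    std::string_view CopyString(std::string_view s) {
        if (s.empty()) {
            return {};
        }
        if (mChunks.empty() || mUsed + s.size() > mCapacity) {
            // Start a new chunk; oversized strings get a chunk of their own size.
            mCapacity = s.size() > mChunkSize ? s.size() : mChunkSize;
            mChunks.push_back(std::make_unique<char[]>(mCapacity));
            mUsed = 0;
        }
        char* dst = mChunks.back().get() + mUsed;
        std::memcpy(dst, s.data(), s.size());
        mUsed += s.size();
        return std::string_view(dst, s.size());
    }

    std::size_t ChunkCount() const { return mChunks.size(); }

private:
    std::size_t mChunkSize;
    std::size_t mCapacity = 0;  // size of the current (last) chunk
    std::size_t mUsed = 0;      // bytes consumed in the current chunk
    std::vector<std::unique_ptr<char[]>> mChunks;
};

// Simplified, hypothetical counterpart of the tag map in PipelineEventGroup.
struct TagSet {
    std::shared_ptr<SourceBuffer> buffer = std::make_shared<SourceBuffer>();
    std::map<std::string_view, std::string_view> tags;

    // Both key and value are copied into the arena; the map stores only views.
    void SetTag(std::string_view key, std::string_view value) {
        std::string_view k = buffer->CopyString(key);
        std::string_view v = buffer->CopyString(value);
        tags[k] = v;
    }
};
```

Freeing the group then releases a handful of chunks rather than one allocation per string, which is where an arena of this kind would be expected to save time.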

Step 2 – Eliminate shared_ptr for Events

Bind each PipelineEvent to its owning group, removing per‑event shared_ptr overhead:

class LogEvent : public PipelineEvent {
private:
    std::map<std::string_view, std::string_view> mContents;
    PipelineEventGroup* mPipelineEventGroupPtr;
};
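To make the ownership change concrete, here is a small sketch under one assumption the summary does not spell out: that each event previously held its own shared_ptr to the group's buffer. In the sketch, the group keeps the only shared_ptr and each event reaches the buffer through a non-owning back-pointer. The names Event, EventGroup, AddEvent, and Buffer are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct SourceBuffer {  // hypothetical placeholder for the group's arena
    std::string storage;
};

class EventGroup;

class Event {
public:
    explicit Event(EventGroup* owner) : mOwner(owner) { assert(owner != nullptr); }

    // Reaches the arena through the owning group; no reference count is touched.
    SourceBuffer& Buffer() const;

private:
    EventGroup* mOwner;  // non-owning back-pointer, valid while the group is alive
};

class EventGroup {
public:
    EventGroup() : mSourceBuffer(std::make_shared<SourceBuffer>()) {}

    // Non-copyable (and therefore non-movable) so that back-pointers never dangle.
    EventGroup(const EventGroup&) = delete;
    EventGroup& operator=(const EventGroup&) = delete;

    Event* AddEvent() {
        mEvents.push_back(std::make_unique<Event>(this));
        return mEvents.back().get();
    }

    const std::shared_ptr<SourceBuffer>& GetSourceBuffer() const { return mSourceBuffer; }
    std::size_t Size() const { return mEvents.size(); }

private:
    std::shared_ptr<SourceBuffer> mSourceBuffer;
    std::vector<std::unique_ptr<Event>> mEvents;
};

inline SourceBuffer& Event::Buffer() const {
    return *mOwner->GetSourceBuffer();
}
```

Creating or destroying an event in this layout performs no atomic increment or decrement on the buffer's reference count, at the cost of requiring that events never outlive their group.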

Step 3 – Event Pool

Introduce thread‑local event pools for Processor Runner threads and a shared pool for Input Runner threads, using lock‑free or double‑buffer strategies to reduce contention.
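The thread-local half of this step can be sketched as follows. This is an illustrative free-list pool, not LoongCollector's implementation: the names PooledEvent, EventPool, and ThreadLocalEventPool, as well as the idle cap of 1024, are assumptions. The shared pool for Input Runner threads, with its lock-free or double-buffer handoff, is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical event type; a vector keeps its capacity across Reset().
struct PooledEvent {
    std::vector<std::pair<std::string_view, std::string_view>> contents;
    void Reset() { contents.clear(); }
};

// Single-threaded free list; safe without locks when used by one thread only.
class EventPool {
public:
    explicit EventPool(std::size_t maxIdle) : mMaxIdle(maxIdle) {}

    // Reuses an idle event when available, otherwise allocates a new one.
    std::unique_ptr<PooledEvent> Acquire() {
        if (mIdle.empty()) {
            ++mAllocations;
            return std::make_unique<PooledEvent>();
        }
        std::unique_ptr<PooledEvent> event = std::move(mIdle.back());
        mIdle.pop_back();
        return event;
    }

    // Returns an event to the pool; beyond maxIdle it is simply destroyed.
    void Release(std::unique_ptr<PooledEvent> event) {
        assert(event != nullptr);
        if (mIdle.size() >= mMaxIdle) {
            return;
        }
        event->Reset();
        mIdle.push_back(std::move(event));
    }

    std::size_t Allocations() const { return mAllocations; }
    std::size_t IdleCount() const { return mIdle.size(); }

private:
    std::size_t mMaxIdle;
    std::size_t mAllocations = 0;
    std::vector<std::unique_ptr<PooledEvent>> mIdle;
};

// One pool per thread, so a processor thread never contends with its peers.
inline EventPool& ThreadLocalEventPool() {
    thread_local EventPool pool(1024);
    return pool;
}
```

Capping the number of idle events keeps a burst of traffic from pinning memory indefinitely, which is a design choice of this sketch rather than something the article states.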

Step 4 – Direct Serialization

Bypass the intermediate Protobuf LogGroup object and serialize PipelineEventGroup directly to the wire format, cutting an extra memory copy.
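The idea can be sketched with the Protobuf wire format itself: a length-delimited field is a varint tag of (field_number << 3) | 2, a varint length, and then the raw bytes. The schema and field numbers below are hypothetical and are not the real LogGroup definition; the sketch only shows how views can be written straight into the output buffer after a size pre-pass, with no intermediate message object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Number of bytes a value occupies as a base-128 varint.
inline std::size_t VarintSize(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Appends a base-128 varint, least significant group first.
inline void AppendVarint(std::string& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<char>(v));
}

// Tag for a length-delimited field (wire type 2).
inline std::uint64_t BytesTag(std::uint32_t field) {
    assert(field > 0);
    return (static_cast<std::uint64_t>(field) << 3) | 2;
}

inline std::size_t BytesFieldSize(std::uint32_t field, std::string_view bytes) {
    return VarintSize(BytesTag(field)) + VarintSize(bytes.size()) + bytes.size();
}

inline void AppendBytesField(std::string& out, std::uint32_t field, std::string_view bytes) {
    AppendVarint(out, BytesTag(field));
    AppendVarint(out, bytes.size());
    out.append(bytes);
}

using KeyValue = std::pair<std::string_view, std::string_view>;

// Hypothetical schema, for illustration only:
//   message Content { string key = 1; string value = 2; }
//   message Log     { repeated Content contents = 1; }
inline std::string SerializeLog(const std::vector<KeyValue>& contents) {
    // Pass 1: compute the exact output size so the buffer is allocated once.
    std::size_t total = 0;
    for (const KeyValue& kv : contents) {
        std::size_t inner = BytesFieldSize(1, kv.first) + BytesFieldSize(2, kv.second);
        total += VarintSize(BytesTag(1)) + VarintSize(inner) + inner;
    }
    // Pass 2: write each nested Content message directly from the views.
    std::string out;
    out.reserve(total);
    for (const KeyValue& kv : contents) {
        std::size_t inner = BytesFieldSize(1, kv.first) + BytesFieldSize(2, kv.second);
        AppendVarint(out, BytesTag(1));
        AppendVarint(out, inner);
        AppendBytesField(out, 1, kv.first);
        AppendBytesField(out, 2, kv.second);
    }
    assert(out.size() == total);
    return out;
}
```

Each field's bytes are copied exactly once, from the arena-backed view into the final buffer, which is the copy that remains after the intermediate object is removed.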

Results

Combined, these optimizations restored performance and achieved a 100% improvement over the original iLogtail under high‑load scenarios.

Specialized Input Optimizations

For Prometheus scraping, the same principles (memory‑arena writes, streaming line‑by‑line processing, and bounded Event Groups) cut both CPU and memory usage, outperforming VMAgent in head‑to‑head tests.
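A simplified sketch of line-by-line processing with bounded groups might look like the following. It assumes the scrape body is already in memory and only splits it into sample lines as views; it does not parse metric names, labels, or values. The names Batch and StreamSamples and the flush callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

using Batch = std::vector<std::string_view>;

// Walks `body` one line at a time, skips blank lines and '#' comment lines,
// and calls `flush` whenever `maxBatch` sample lines have accumulated.
// Lines are views into `body`, so nothing is copied. Returns the sample count.
inline std::size_t StreamSamples(std::string_view body,
                                 std::size_t maxBatch,
                                 const std::function<void(const Batch&)>& flush) {
    assert(maxBatch > 0);
    Batch batch;
    batch.reserve(maxBatch);
    std::size_t total = 0;
    while (!body.empty()) {
        std::size_t eol = body.find('\n');
        std::string_view line = body.substr(0, eol);
        body = (eol == std::string_view::npos) ? std::string_view{} : body.substr(eol + 1);
        if (!line.empty() && line.back() == '\r') {
            line.remove_suffix(1);  // tolerate CRLF line endings
        }
        if (line.empty() || line.front() == '#') {
            continue;
        }
        batch.push_back(line);
        ++total;
        if (batch.size() == maxBatch) {
            flush(batch);  // bounded group: hand off and start over
            batch.clear();
        }
    }
    if (!batch.empty()) {
        flush(batch);
    }
    return total;
}
```

Because a batch never holds more than maxBatch lines, peak memory in this sketch depends on the batch size rather than on the size of the scrape response.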

Conclusion

The case study demonstrates that careful profiling, memory‑aware data structures, and targeted pooling can reconcile code generality with high performance, a lesson applicable to any cloud‑native observability pipeline.


