
Why iLogtail Needed a Complete Architecture Overhaul and How It Was Done

This article explains the evolution of iLogtail from a single‑file collector to a multi‑language, plugin‑based observability pipeline, outlines the motivations for refactoring, describes the new unified data model, plugin abstractions, pipeline design, configuration management, hot‑reload mechanisms, and the separation of enterprise and open‑source code, providing a comprehensive view of the architectural upgrade.

Alibaba Cloud Observability

Overview

iLogtail is a high‑performance, lightweight observability data collector provided by Alibaba Cloud SLS. It runs on servers, containers, and embedded environments, serving Alibaba Group, Ant Group, and many cloud customers. It processes tens of petabytes of data daily for monitoring, troubleshooting, operational analysis, and security.

Architecture Evolution

iLogtail began in 2013 as the core log collection component of Alibaba's Feitian 5K project. Over the following decade, as cloud‑native and observability concepts spread, it went through several architectural iterations.

Single File Collection Stage

Only collects log files.

Assumes a single log format per file, with one processing method (e.g., regex or JSON).

Can only send logs to the SLS backend.

Implemented entirely in C++, with a monolithic, procedure‑oriented design, heavy inter‑dependencies between classes, and poor extensibility.

Golang Plugin Extension Stage

To meet broader observability needs, a Golang plugin system was added, providing:

Multiple independent pipelines.

Support for multiple inputs and outputs.

Plugin chaining for enhanced processing.

Hot‑loadable configuration independent of the C++ core.

However, C++ and Golang modules could be combined only in limited ways: many data paths could not mix native (C++) and extension (Golang) plugins.

Why Refactor?

The original architecture restricted how input, processing, and output modules could be combined. This hurt performance in complex log‑processing scenarios and made the open‑source version difficult to use. Specific problems include:

Complex class dependencies in the C++ core make development hard.

The core data structure, LogGroup, supports only logs, not metrics or traces, so data must be converted before it can be written to third‑party storage.

Maintaining the commercial and open‑source versions by swapping whole source files introduces inconsistencies between them.

Therefore a comprehensive redesign was deemed necessary.

Goals of the Refactor

Replace the internal data model with a generic one to avoid unnecessary format conversion.

Make all input, processing, and output capabilities plugin‑based, unifying C++ and Golang plugins.

Introduce a pipeline concept in the C++ core to match the Golang system.

Support hot‑loading of configurations and improve configuration file organization.

Separate commercial code from open‑source code cleanly.

Practice

Generalized Data Model

The old LogGroup protobuf is replaced by a universal PipelineEventGroup that can carry logs, metrics, and traces. It holds the events themselves (mEvents), metadata (mMetadata) and tags (mTags) shared by all events in the group, and a memory allocator (mSourceBuffer).
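To illustrate why group‑level metadata, tags, and a shared buffer are useful, the sketch below models a heavily simplified event group. Every name in it (EventGroupSketch, SimpleLogEvent, CopyString, SetTag, and so on) is hypothetical rather than iLogtail's actual API, and the group‑owned buffer is approximated with a std::deque<std::string>, which keeps existing elements in place as it grows so that stored string_views remain valid.

```cpp
#include <deque>
#include <map>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, simplified log event: its key/value pairs are views into
// memory owned by the enclosing group rather than separately owned strings.
struct SimpleLogEvent {
    std::vector<std::pair<std::string_view, std::string_view>> contents;
};

// Hypothetical sketch of an event group. Metadata and tags are stored once
// per group and shared by every event instead of being copied per event.
class EventGroupSketch {
public:
    EventGroupSketch() = default;
    // Copying would leave events pointing into the source group's buffer.
    EventGroupSketch(const EventGroupSketch&) = delete;
    EventGroupSketch& operator=(const EventGroupSketch&) = delete;

    // Copy bytes into group-owned storage and return a stable view of them.
    // std::deque::emplace_back never relocates existing elements, so views
    // handed out earlier stay valid.
    std::string_view CopyString(std::string_view s) {
        mSourceBuffer.emplace_back(s);
        return mSourceBuffer.back();
    }

    // The returned reference is invalidated by the next AddLogEvent call.
    SimpleLogEvent& AddLogEvent() {
        mEvents.emplace_back();
        return mEvents.back();
    }

    void SetTag(const std::string& key, const std::string& value) { mTags[key] = value; }
    void SetMetadata(const std::string& key, const std::string& value) { mMetadata[key] = value; }

    const std::vector<SimpleLogEvent>& Events() const { return mEvents; }
    const std::map<std::string, std::string>& Tags() const { return mTags; }

private:
    std::vector<SimpleLogEvent> mEvents;
    std::map<std::string, std::string> mMetadata;  // shared by all events in the group
    std::map<std::string, std::string> mTags;      // shared by all events in the group
    std::deque<std::string> mSourceBuffer;         // owns the bytes events point into
};
```

Because events only hold views, the raw bytes are copied once into the group, and every event produced from them can refer to that single copy.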

PipelineEvent Hierarchy

PipelineEvent is an abstract base class with subclasses LogEvent, MetricEvent, and SpanEvent. Events must be created through their owning PipelineEventGroup using AddLogEvent, AddMetricEvent, or AddSpanEvent.
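The rule that events are created only through their owning group can be enforced at compile time. The sketch below is a hypothetical, simplified version: event constructors are private, the group is declared a friend, and AddLogEvent / AddMetricEvent return pointers to events the group owns. The class names mirror the article's, but every member and signature here is an assumption, not iLogtail's real declaration, and SpanEvent is omitted for brevity.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

class PipelineEventGroup;  // forward declaration

// Abstract base: each event knows its concrete type and its owning group.
class PipelineEvent {
public:
    enum class Type { Log, Metric, Span };
    virtual ~PipelineEvent() = default;
    virtual Type GetType() const = 0;
    PipelineEventGroup* GetGroup() const { return mGroup; }

protected:
    explicit PipelineEvent(PipelineEventGroup* group) : mGroup(group) {}

private:
    PipelineEventGroup* mGroup;  // non-owning back pointer
};

class LogEvent : public PipelineEvent {
public:
    Type GetType() const override { return Type::Log; }
    std::string content;

private:
    friend class PipelineEventGroup;  // only the group may construct events
    explicit LogEvent(PipelineEventGroup* g) : PipelineEvent(g) {}
};

class MetricEvent : public PipelineEvent {
public:
    Type GetType() const override { return Type::Metric; }
    double value = 0.0;

private:
    friend class PipelineEventGroup;
    explicit MetricEvent(PipelineEventGroup* g) : PipelineEvent(g) {}
};

class PipelineEventGroup {
public:
    LogEvent* AddLogEvent() { return Add<LogEvent>(); }
    MetricEvent* AddMetricEvent() { return Add<MetricEvent>(); }
    const std::vector<std::unique_ptr<PipelineEvent>>& GetEvents() const { return mEvents; }

private:
    // Construct an event owned by this group; the private constructors are
    // reachable here through friendship, so std::make_unique cannot be used.
    template <typename T>
    T* Add() {
        std::unique_ptr<T> event(new T(this));
        T* raw = event.get();
        mEvents.push_back(std::move(event));
        return raw;
    }

    std::vector<std::unique_ptr<PipelineEvent>> mEvents;
};
```

With this layout, a line such as `LogEvent e(&group);` in plugin code fails to compile, so every event necessarily belongs to exactly one group.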

Plugin Abstractions

All plugins inherit from an abstract Plugin class that provides a Name() method and a context pointer. Specific plugin types:

Input:

class Input : public Plugin {
public:
    virtual ~Input() = default;
    virtual bool Init(const Json::Value& config, Json::Value& optionalGoPipeline) = 0;
    virtual bool Start() = 0;
    virtual bool Stop(bool isPipelineRemoving) = 0;
};

Processor:

class Processor : public Plugin {
public:
    virtual ~Processor() = default;
    virtual bool Init(const Json::Value& config) = 0;
    virtual void Process(std::vector<PipelineEventGroup>& logGroupList);

protected:
    virtual bool IsSupportedEvent(const PipelineEventPtr& e) const = 0;
    virtual void Process(PipelineEventGroup& logGroup) = 0;
};

Flusher:

class Flusher : public Plugin {
public:
    virtual ~Flusher() = default;
    virtual bool Init(const Json::Value& config, Json::Value& optionalGoPipeline) = 0;
    virtual bool Start() = 0;
    virtual bool Stop(bool isPipelineRemoving) = 0;
};
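To show how a concrete processor plugs into these abstractions, here is a self‑contained sketch. The article does not reproduce the Plugin declaration or the PipelineEventGroup and Json::Value types, so the sketch substitutes minimal stand‑ins: a hypothetical PluginContext, a std::map<std::string, std::string> in place of Json::Value, and an event group that is simply a list of strings. The processor itself, ProcessorMinLengthSketch, is also hypothetical; it mirrors the split between the public batch‑level Process() and the protected per‑group Process().

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct PluginContext {  // hypothetical stand-in for the real plugin context
    std::string configName;
};

using Config = std::map<std::string, std::string>;  // stand-in for Json::Value
struct EventGroup {                                  // stand-in for PipelineEventGroup
    std::vector<std::string> events;
};

class Plugin {
public:
    virtual ~Plugin() = default;
    virtual const std::string& Name() const = 0;
    void SetContext(PluginContext& ctx) { mContext = &ctx; }

protected:
    PluginContext* mContext = nullptr;  // non-owning
};

class Processor : public Plugin {
public:
    virtual bool Init(const Config& config) = 0;
    // Batch entry point: forward each group to the per-group hook.
    virtual void Process(std::vector<EventGroup>& groups) {
        for (auto& g : groups) {
            Process(g);
        }
    }

protected:
    virtual bool IsSupportedEvent(const std::string& e) const = 0;
    virtual void Process(EventGroup& group) = 0;
};

// Hypothetical processor: drops events shorter than a configured length.
class ProcessorMinLengthSketch : public Processor {
public:
    const std::string& Name() const override {
        static const std::string kName = "processor_min_length_sketch";
        return kName;
    }

    // Expects a numeric "MinLength" entry (std::stoul throws otherwise).
    bool Init(const Config& config) override {
        auto it = config.find("MinLength");
        if (it == config.end()) return false;
        mMinLength = std::stoul(it->second);
        return true;
    }

    // Re-expose the base batch overload, which the override below would hide.
    using Processor::Process;

protected:
    bool IsSupportedEvent(const std::string&) const override { return true; }

    void Process(EventGroup& group) override {
        auto& ev = group.events;
        ev.erase(std::remove_if(ev.begin(), ev.end(),
                                [this](const std::string& e) {
                                    return IsSupportedEvent(e) && e.size() < mMinLength;
                                }),
                 ev.end());
    }

private:
    std::size_t mMinLength = 0;
};
```

Keeping the per‑group Process() protected means the pipeline always enters through the batch overload, while each plugin only has to implement the per‑group logic.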

Native Processing Plugins

Examples include:

ProcessorSplitLogStringNative – splits log blocks by a delimiter.

ProcessorSplitRegexNative – splits log blocks by a regex.

ProcessorParseRegexNative – extracts fields via a regex.

ProcessorParseJsonNative – parses JSON fields.

ProcessorParseDelimiterNative – parses delimited fields.

ProcessorParseTimestampNative – parses timestamps.

ProcessorFilterRegexNative – filters events.

ProcessorDesensitizeNative – masks sensitive data.

ProcessorTagNative – moves metadata to tags.
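To make the division of labor between splitting and parsing concrete, the sketch below chains two heavily simplified, hypothetical stand‑ins: SplitLogString splits a raw block on a delimiter (the job the list assigns to ProcessorSplitLogStringNative), and ParseRegex extracts named fields with a regex (ProcessorParseRegexNative's job). The function names, the map‑based Event type, the "content" key, and the behavior of leaving unmatched events unchanged are all assumptions for illustration.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event: field name -> value; "content" holds the raw text.
using Event = std::map<std::string, std::string>;

// Split a raw block into one event per delimiter-separated, non-empty line.
std::vector<Event> SplitLogString(const std::string& block, char delim = '\n') {
    std::vector<Event> events;
    std::size_t start = 0;
    while (start <= block.size()) {
        std::size_t end = block.find(delim, start);
        if (end == std::string::npos) end = block.size();
        if (end > start) {  // skip empty lines
            Event ev;
            ev["content"] = block.substr(start, end - start);
            events.push_back(std::move(ev));
        }
        start = end + 1;
    }
    return events;
}

// Replace "content" with the capture groups of `pattern`, named by `keys`
// in order. In this sketch, events that do not match are left unchanged.
void ParseRegex(std::vector<Event>& events, const std::regex& pattern,
                const std::vector<std::string>& keys) {
    for (auto& ev : events) {
        auto it = ev.find("content");
        if (it == ev.end()) continue;
        const std::string raw = it->second;  // copy: the entry is erased below
        std::smatch m;
        if (!std::regex_match(raw, m, pattern) || m.size() != keys.size() + 1) continue;
        ev.erase(it);
        for (std::size_t i = 0; i < keys.size(); ++i) {
            ev[keys[i]] = m[i + 1].str();
        }
    }
}
```

A usage example: `auto events = SplitLogString("INFO start\nWARN disk low\n");` followed by `ParseRegex(events, std::regex(R"((\w+) (.*))"), {"level", "msg"});` turns two raw lines into two events with `level` and `msg` fields.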

Input Plugin – File Input

The file input plugin registers its configuration with a global FileServer, which runs a single thread that polls all files (bus mode). The plugin's Start() and Stop() only register or unregister the configuration with FileServer, starting or pausing the shared thread when needed.

bool InputFile::Start() {
    // Start the shared FileServer thread if it is not already running.
    if (!FileServer::GetInstance()->IsRunning()) {
        FileServer::GetInstance()->Start();
    }
    /* register configs */
    return true;
}

bool InputFile::Stop(bool isPipelineRemoving) {
    // Pause the shared thread (if not already paused) before unregistering.
    if (!FileServer::GetInstance()->IsPaused()) {
        FileServer::GetInstance()->Pause();
    }
    /* unregister configs */
    return true;
}
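The bus‑mode idea can be sketched as a singleton that owns one polling thread shared by every registered config. FileServerSketch, its method names, and the fixed 10 ms polling interval are hypothetical; the real FileServer also handles file discovery, checkpoints, and event dispatch, none of which is modeled here. The sketch also assumes that Start() and Pause() are called from a single control thread.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical singleton: one thread polls on behalf of all registered configs.
class FileServerSketch {
public:
    static FileServerSketch* GetInstance() {
        static FileServerSketch instance;
        return &instance;
    }

    // Idempotent: a second Start() while running does nothing.
    void Start() {
        if (mRunning.exchange(true)) return;
        mThread = std::thread([this] { Run(); });
    }

    // Stop polling and join the thread; registered configs are kept.
    void Pause() {
        if (!mRunning.exchange(false)) return;
        if (mThread.joinable()) mThread.join();
    }

    bool IsRunning() const { return mRunning.load(); }

    void AddConfig(const std::string& name) {
        std::lock_guard<std::mutex> lock(mMutex);
        mConfigs.insert(name);
    }

    void RemoveConfig(const std::string& name) {
        std::lock_guard<std::mutex> lock(mMutex);
        mConfigs.erase(name);
    }

    std::size_t ConfigCount() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mConfigs.size();
    }

    ~FileServerSketch() { Pause(); }

private:
    FileServerSketch() = default;

    void Run() {
        while (mRunning.load()) {
            {
                std::lock_guard<std::mutex> lock(mMutex);
                for (const auto& cfg : mConfigs) {
                    (void)cfg;  // placeholder: a real collector would check this config's files
                }
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    std::atomic<bool> mRunning{false};
    std::mutex mMutex;
    std::set<std::string> mConfigs;
    std::thread mThread;
};
```

With a single shared thread, the number of polling threads stays constant no matter how many pipelines use file input; each plugin only adds or removes its entry.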

Output Plugin – SLS Flusher

The SLS flusher registers reference counts with a global SLSSender but does not control the sending thread directly.

bool FlusherSLS::Start() {
    // Register this flusher's use of the project; the sending thread is owned by SLSSender.
    SLSSender::Instance()->IncreaseProjectReferenceCnt(mProject);
    /* other refs */
    return true;
}

bool FlusherSLS::Stop(bool isPipelineRemoving) {
    SLSSender::Instance()->DecreaseProjectReferenceCnt(mProject);
    /* other refs */
    return true;
}
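The reference‑counting idea can be illustrated with a small, hypothetical registry: flushers increment a per‑project count on Start() and decrement it on Stop(), and the sender forgets a project once no flusher references it. SenderRegistrySketch and IsProjectInUse are assumptions; only the two ...ReferenceCnt method names are taken from the snippet above, and their bodies here are not the real SLSSender implementation.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the global sender's per-project bookkeeping.
class SenderRegistrySketch {
public:
    void IncreaseProjectReferenceCnt(const std::string& project) {
        std::lock_guard<std::mutex> lock(mMutex);
        ++mProjectRefCnt[project];
    }

    void DecreaseProjectReferenceCnt(const std::string& project) {
        std::lock_guard<std::mutex> lock(mMutex);
        auto it = mProjectRefCnt.find(project);
        if (it == mProjectRefCnt.end()) return;  // unbalanced call: ignore
        if (--it->second == 0) {
            // No flusher uses this project any more; per-project sending
            // state could be released at this point.
            mProjectRefCnt.erase(it);
        }
    }

    bool IsProjectInUse(const std::string& project) const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mProjectRefCnt.count(project) > 0;
    }

private:
    mutable std::mutex mMutex;
    std::unordered_map<std::string, std::uint32_t> mProjectRefCnt;
};
```

Because flushers only adjust counts, several pipelines can share one sender without any of them needing to know whether the others are still running.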

Pipeline Definition

The new Pipeline class holds vectors of InputInstance, ProcessorInstance, and FlusherInstance, a processing queue (mProcessQueue), and a pipeline context (mContext). Its lifecycle methods Init(), Start(), Process(), and Stop() orchestrate the flow from inputs through processors to flushers.

class Pipeline {
public:
    bool Init(Config&& config);
    void Start();
    void Process(std::vector<PipelineEventGroup>& logGroupList);
    void Stop(bool isRemoving);

private:
    std::string mName;
    std::vector<std::unique_ptr<InputInstance>> mInputs;
    std::vector<std::unique_ptr<ProcessorInstance>> mProcessorLine;
    std::vector<std::unique_ptr<FlusherInstance>> mFlushers;
    FeedbackQueue<PipelineEventGroup> mProcessQueue;
    PipelineContext mContext;
    /* other members */
};
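The flow through Pipeline can be illustrated with a toy, single‑threaded version. Everything below is hypothetical and simplified: event groups are vectors of strings, processors are std::function callbacks, and the flusher merely records what it receives. The start and stop ordering (flushers started before inputs, inputs stopped and the queue drained before flushers stop) is an assumed design choice that avoids producing data nobody can accept; the article does not specify it.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Group = std::vector<std::string>;  // stand-in for PipelineEventGroup
using ProcessorFn = std::function<void(Group&)>;

struct CollectingFlusher {  // hypothetical flusher that just records output
    bool running = false;
    std::vector<Group> sent;
    void Start() { running = true; }
    void Stop() { running = false; }
    void Send(const Group& g) {
        if (running) sent.push_back(g);
    }
};

class PipelineSketch {
public:
    PipelineSketch(std::vector<ProcessorFn> processors, std::vector<CollectingFlusher*> flushers)
        : mProcessorLine(std::move(processors)), mFlushers(std::move(flushers)) {}

    // Downstream first, so inputs never produce data that cannot be flushed.
    void Start() {
        for (auto* f : mFlushers) f->Start();
        mInputsRunning = true;
    }

    // What an input would call: enqueue a group; rejected while stopped.
    bool Push(Group g) {
        if (!mInputsRunning) return false;
        mProcessQueue.push_back(std::move(g));
        return true;
    }

    // Drain the queue: run each group through every processor, then every flusher.
    void RunOnce() {
        std::vector<Group> batch;
        batch.swap(mProcessQueue);
        for (auto& g : batch) {
            for (auto& p : mProcessorLine) p(g);
            for (auto* f : mFlushers) f->Send(g);
        }
    }

    // Upstream first: stop inputs, drain what is queued, then stop flushers.
    void Stop() {
        mInputsRunning = false;
        RunOnce();
        for (auto* f : mFlushers) f->Stop();
    }

private:
    std::vector<ProcessorFn> mProcessorLine;
    std::vector<CollectingFlusher*> mFlushers;
    std::vector<Group> mProcessQueue;
    bool mInputsRunning = false;
};
```

Draining the queue inside Stop() is what lets a pipeline be removed without silently dropping groups that inputs had already accepted.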

Configuration Management

Configuration Formats

Two legacy formats exist:

Commercial JSON flat config (a single file may contain many configs).

Open‑source YAML pipeline config.
