Why iLogtail Needed a Complete Architecture Overhaul and How It Was Done
This article explains how iLogtail evolved from a single‑file collector into a multi‑language, plugin‑based observability pipeline. It outlines the motivations for the refactor and describes the new unified data model, plugin abstractions, pipeline design, configuration management, hot‑reload mechanisms, and the separation of enterprise and open‑source code.
Overview
iLogtail is a high‑performance, lightweight observability data collector provided by Alibaba Cloud SLS. It runs on servers, containers, and embedded environments, serving Alibaba Group, Ant Group, and many cloud customers. It processes tens of petabytes of data daily for monitoring, troubleshooting, operational analysis, and security.
Architecture Evolution
iLogtail began in 2013 as the core log collection component of Alibaba's Feitian 5K project. Over the following decade it went through several architectural iterations as cloud‑native and observability concepts spread.
Single File Collection Stage
Only collects log files.
Assumes a single log format per file, with one processing method (e.g., regex or JSON).
Can only send logs to the SLS backend.
Implementation is fully in C++ with a monolithic, procedure‑oriented design, heavy class inter‑dependencies, and poor extensibility.
Golang Plugin Extension Stage
To meet broader observability needs, a Golang plugin system was added, providing:
Multiple independent pipelines.
Support for multiple inputs and outputs.
Plugin chaining for enhanced processing.
Hot‑loadable configuration independent of the C++ core.
However, the combination of C++ and Golang modules remained limited, and many data paths could not mix native and extension plugins.
Why Refactor?
The original architecture restricted how input, processing, and output modules could be combined, which hurt performance in complex log scenarios and made the open‑source version hard to use. Specific problems include:
Complex class dependencies in the C++ core make development hard.
Data structures (LogGroup) only support logs, not metrics or traces, requiring extra conversion for third‑party storage.
File‑level code replacement between commercial and open‑source versions introduces inconsistencies.
Therefore a comprehensive redesign was deemed necessary.
Goals of the Refactor
Replace the internal data model with a generic one to avoid unnecessary format conversion.
Make all input, processing, and output capabilities plugin‑based, unifying C++ and Golang plugins.
Introduce a pipeline concept in the C++ core to match the Golang system.
Support hot‑loading of configurations and improve configuration file organization.
Separate commercial code from open‑source code cleanly.
Practice
Generalized Data Model
The old LogGroup protobuf is replaced by a universal PipelineEventGroup that can carry logs, metrics, and traces. It contains mEvents, shared mMetadata, mTags, and a memory allocator mSourceBuffer.
PipelineEvent Hierarchy
PipelineEvent is an abstract base class with subclasses LogEvent, MetricEvent, and SpanEvent. Events must be created via the owning PipelineEventGroup using AddLogEvent, AddMetricEvent, or AddSpanEvent.
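The ownership rule above (events exist only inside a group and are created through its Add* methods) can be illustrated with a minimal sketch. The class and method names follow the article, but the bodies, the return types, the SetContent/SetValue/SetName/SetTag/GetTag helpers, and the BuildSampleGroup function are assumptions made for illustration; the real iLogtail classes also manage mMetadata and the mSourceBuffer allocator, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class PipelineEventGroup;

// Abstract base: constructors are not public, so events cannot be built standalone.
class PipelineEvent {
public:
    enum class Type { LOG, METRIC, SPAN };
    virtual ~PipelineEvent() = default;
    Type GetType() const { return mType; }
    PipelineEventGroup* GetGroup() const { return mGroup; }

protected:
    PipelineEvent(Type type, PipelineEventGroup* group) : mType(type), mGroup(group) {}

private:
    Type mType;
    PipelineEventGroup* mGroup;  // non-owning pointer back to the owning group
};

class LogEvent : public PipelineEvent {
public:
    void SetContent(const std::string& key, const std::string& value) { mContents[key] = value; }

private:
    friend class PipelineEventGroup;  // only the group may construct events
    explicit LogEvent(PipelineEventGroup* g) : PipelineEvent(Type::LOG, g) {}
    std::map<std::string, std::string> mContents;
};

class MetricEvent : public PipelineEvent {
public:
    void SetValue(double value) { mValue = value; }

private:
    friend class PipelineEventGroup;
    explicit MetricEvent(PipelineEventGroup* g) : PipelineEvent(Type::METRIC, g) {}
    double mValue = 0.0;
};

class SpanEvent : public PipelineEvent {
public:
    void SetName(const std::string& name) { mName = name; }

private:
    friend class PipelineEventGroup;
    explicit SpanEvent(PipelineEventGroup* g) : PipelineEvent(Type::SPAN, g) {}
    std::string mName;
};

class PipelineEventGroup {
public:
    LogEvent* AddLogEvent() { return AddEvent<LogEvent>(); }
    MetricEvent* AddMetricEvent() { return AddEvent<MetricEvent>(); }
    SpanEvent* AddSpanEvent() { return AddEvent<SpanEvent>(); }

    void SetTag(const std::string& key, const std::string& value) { mTags[key] = value; }
    std::string GetTag(const std::string& key) const {
        auto it = mTags.find(key);
        return it == mTags.end() ? std::string() : it->second;
    }
    const std::vector<std::unique_ptr<PipelineEvent>>& GetEvents() const { return mEvents; }

private:
    // Creates the event, stores it in mEvents, and hands back a non-owning pointer.
    template <typename T>
    T* AddEvent() {
        std::unique_ptr<T> event(new T(this));
        T* raw = event.get();
        mEvents.push_back(std::move(event));
        return raw;
    }

    std::vector<std::unique_ptr<PipelineEvent>> mEvents;
    std::map<std::string, std::string> mTags;  // shared by every event in the group
};

// Hypothetical usage helper: one group carrying a log and a metric under a shared tag.
std::size_t BuildSampleGroup(PipelineEventGroup& group) {
    group.AddLogEvent()->SetContent("content", "hello");
    group.AddMetricEvent()->SetValue(1.5);
    group.SetTag("host", "node-1");
    assert(group.GetEvents().size() == 2);
    return group.GetEvents().size();
}
```

Keeping the constructors private and befriending the group is one way to enforce "create through the group only" at compile time; the actual code base may enforce it differently.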
Plugin Abstractions
All plugins inherit from an abstract Plugin class that provides a Name() method and a context pointer. Specific plugin types:
Input :
class Input : public Plugin {
public:
    virtual ~Input() = default;
    virtual bool Init(const Json::Value& config, Json::Value& optionalGoPipeline) = 0;
    virtual bool Start() = 0;
    virtual bool Stop(bool isPipelineRemoving) = 0;
};
Processor :
class Processor : public Plugin {
public:
    virtual ~Processor() {}
    virtual bool Init(const Json::Value& config) = 0;
    virtual void Process(std::vector<PipelineEventGroup>& logGroupList);
protected:
    virtual bool IsSupportedEvent(const PipelineEventPtr& e) const = 0;
    virtual void Process(PipelineEventGroup& logGroup) = 0;
};
Flusher :
class Flusher : public Plugin {
public:
    virtual ~Flusher() = default;
    virtual bool Init(const Json::Value& config, Json::Value& optionalGoPipeline) = 0;
    virtual bool Start() = 0;
    virtual bool Stop(bool isPipelineRemoving) = 0;
};
Native Processing Plugins
Examples include:
ProcessorSplitLogStringNative – split log blocks by delimiter.
ProcessorSplitRegexNative – split by regex.
ProcessorParseRegexNative – extract fields via regex.
ProcessorParseJsonNative – parse JSON fields.
ProcessorParseDelimiterNative – parse delimited fields.
ProcessorParseTimestampNative – parse timestamps.
ProcessorFilterRegexNative – filter events.
ProcessorDesensitizeNative – mask sensitive data.
ProcessorTagNative – move metadata to tags.
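To show how a processing plugin might plug into the Processor interface, here is a simplified sketch of a keyword filter. Everything in it is an assumption for illustration: ProcessorFilterKeyword is a hypothetical plugin (not one of the native processors listed above), Config stands in for Json::Value, Event and EventGroup stand in for PipelineEventPtr and PipelineEventGroup, and RunProcessor is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Config = std::map<std::string, std::string>;  // stand-in for Json::Value

struct Event {            // stand-in for PipelineEventPtr
    bool isLog;
    std::string content;
};
struct EventGroup {       // stand-in for PipelineEventGroup
    std::vector<Event> events;
};

class Plugin {
public:
    virtual ~Plugin() = default;
    virtual const std::string& Name() const = 0;
};

class Processor : public Plugin {
public:
    virtual bool Init(const Config& config) = 0;
    // Batch entry point: applies the per-group hook to each group in turn.
    virtual void Process(std::vector<EventGroup>& groups) {
        for (auto& group : groups) {
            Process(group);
        }
    }

protected:
    virtual bool IsSupportedEvent(const Event& e) const = 0;
    virtual void Process(EventGroup& group) = 0;
};

// Hypothetical plugin: drops log events that do not contain a configured keyword.
class ProcessorFilterKeyword : public Processor {
public:
    const std::string& Name() const override { return mName; }

    bool Init(const Config& config) override {
        auto it = config.find("Keyword");
        if (it == config.end() || it->second.empty()) {
            return false;  // reject configs without a usable keyword
        }
        mKeyword = it->second;
        return true;
    }

protected:
    // Only log events are filtered; other event types pass through untouched.
    bool IsSupportedEvent(const Event& e) const override { return e.isLog; }

    void Process(EventGroup& group) override {
        assert(!mKeyword.empty());  // Init() must have succeeded first
        auto& events = group.events;
        events.erase(std::remove_if(events.begin(), events.end(),
                                    [this](const Event& e) {
                                        return IsSupportedEvent(e)
                                            && e.content.find(mKeyword) == std::string::npos;
                                    }),
                     events.end());
    }

private:
    std::string mName = "processor_filter_keyword_sketch";
    std::string mKeyword;
};

// Callers go through the base-class batch entry point.
void RunProcessor(Processor& processor, std::vector<EventGroup>& groups) {
    processor.Process(groups);
}
```

The split between a public batch method and a protected per-group hook mirrors the Processor declaration shown earlier: the base class owns the iteration, and each plugin only implements what happens to a single group.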
Input Plugin – File Input
The file input plugin registers its configuration with a global FileServer that runs a single thread to poll all files (bus‑mode). The plugin’s Start() and Stop() merely register/unregister with FileServer and start/stop the global thread when needed.
bool InputFile::Start() {
    if (!FileServer::GetInstance()->IsRunning()) {
        FileServer::GetInstance()->Start();
    }
    /* register configs */
    return true;
}
bool InputFile::Stop(bool isPipelineRemoving) {
    if (!FileServer::GetInstance()->IsPaused()) {
        FileServer::GetInstance()->Pause();
    }
    /* unregister configs */
    return true;
}
Output Plugin – SLS Flusher
The SLS flusher registers reference counts with a global SLSSender but does not control the sending thread directly.
bool FlusherSLS::Start() {
    SLSSender::Instance()->IncreaseProjectReferenceCnt(mProject);
    /* other refs */
    return true;
}
bool FlusherSLS::Stop(bool isPipelineRemoving) {
    SLSSender::Instance()->DecreaseProjectReferenceCnt(mProject);
    /* other refs */
    return true;
}
Pipeline Definition
The new Pipeline class holds vectors of InputInstance, ProcessorInstance, and FlusherInstance, a processing queue, and context. Its lifecycle methods Init(), Start(), Process(), and Stop() orchestrate the flow from inputs through processors to flushers.
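As a rough mental model of that flow, the sketch below runs event groups through an ordered processor line and then hands them to every flusher. It is a single-threaded toy under stated assumptions: PipelineSketch, Push, and Drain are hypothetical names, std::function callbacks stand in for ProcessorInstance and FlusherInstance, and a plain std::deque stands in for the processing queue. The real Pipeline is driven by separate threads and its queue supports feedback.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct EventGroup {  // stand-in for PipelineEventGroup
    std::vector<std::string> events;
};

class PipelineSketch {
public:
    using ProcessorFn = std::function<void(EventGroup&)>;
    using FlusherFn = std::function<void(const EventGroup&)>;

    void AddProcessor(ProcessorFn p) { mProcessorLine.push_back(std::move(p)); }
    void AddFlusher(FlusherFn f) { mFlushers.push_back(std::move(f)); }

    void Start() { mRunning = true; }

    // Flush whatever is still queued before marking the pipeline stopped.
    void Stop() {
        if (mRunning) {
            Drain();
        }
        mRunning = false;
    }

    // Called by an input; refuses data while the pipeline is not running.
    bool Push(EventGroup group) {
        if (!mRunning) {
            return false;
        }
        mProcessQueue.push_back(std::move(group));
        return true;
    }

    // Takes groups off the queue, runs processors in order, then every flusher.
    void Drain() {
        assert(mRunning);  // draining only makes sense on a started pipeline
        while (!mProcessQueue.empty()) {
            EventGroup group = std::move(mProcessQueue.front());
            mProcessQueue.pop_front();
            for (auto& process : mProcessorLine) {
                process(group);
            }
            for (auto& flush : mFlushers) {
                flush(group);
            }
        }
    }

private:
    bool mRunning = false;
    std::deque<EventGroup> mProcessQueue;
    std::vector<ProcessorFn> mProcessorLine;
    std::vector<FlusherFn> mFlushers;
};
```

The ordering is the point of the sketch: processors are applied strictly in the order they were configured, and each flusher sees the group only after the whole processor line has run.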
class Pipeline {
public:
    bool Init(Config&& config);
    void Start();
    void Process(std::vector<PipelineEventGroup>& logGroupList);
    void Stop(bool isRemoving);
private:
    std::string mName;
    std::vector<std::unique_ptr<InputInstance>> mInputs;
    std::vector<std::unique_ptr<ProcessorInstance>> mProcessorLine;
    std::vector<std::unique_ptr<FlusherInstance>> mFlushers;
    FeedbackQueue<PipelineEventGroup> mProcessQueue;
    PipelineContext mContext;
    /* other members */
};
Configuration Management
Configuration Formats
Two legacy formats exist:
Commercial JSON flat config (single file may contain many configs).
Open‑source YAML pipeline config.
