Can iLogtail Replace Logstash? A Deep Dive into Performance and Architecture
This article examines the traditional ELK stack, compares iLogtail with Filebeat and Logstash in real‑world performance tests, analyzes why iLogtail could not previously replace Logstash, and presents five concrete engineering solutions that enable iLogtail to become a viable, high‑performance alternative for log collection and processing.
Traditional ELK Architecture
In the classic ELK stack, ElasticSearch (E) stores logs, Logstash (L) processes and transforms them, and Kibana (K) visualizes the data. A typical pipeline uses Filebeat to ship logs to Kafka. Logstash then consumes them from Kafka, enriches them, and forwards them to ElasticSearch, where Kibana builds dashboards on top of the indexed data.
Why iLogtail Is Worth Considering
iLogtail is a lightweight, high‑performance log collector whose benchmark tests show significantly lower CPU usage than Filebeat. The advantage stems mainly from its Polling + inotify mechanism, which the community documents in detail.
Performance tests (see the benchmark table) show that in the "container file collection with multiple configurations" scenario, iLogtail's CPU increase is roughly half of Filebeat's. In the other scenarios, iLogtail consumes 5 to 10 times less CPU than Filebeat.
Feasibility of Replacing Logstash with iLogtail
Historically, iLogtail could not directly replace Filebeat and Logstash for four main reasons:
Plugin performance limitations (e.g., the original ElasticSearch flusher plugin).
Lack of a configuration‑management UI for administrators.
Insufficient disaster‑recovery capabilities.
Missing lifecycle management and health‑monitoring for iLogtail agents.
Detailed Analysis
Plugin Performance: The iLogtail core is fast, but the ElasticSearch flusher plugin was a bottleneck.
Configuration Management: Production environments run many collector instances but have no front end for managing their configurations. The existing Config Server API can be used to build a simple UI.
Disaster Recovery: Config Server currently keeps its state in LevelDB, an embedded key-value store that only one process can open at a time, so the data cannot be shared across multiple Config Server instances. Switching to a mature relational database (MySQL, PostgreSQL, etc.) is required.
Agent Health Monitoring: Agents lack proper heartbeat handling and status reporting, which makes it hard to detect offline agents or clean up stale records.
Solution Overview (KR1‑KR5)
KR1 – Eliminate ElasticSearch Flusher Bottleneck
Use esapi.BulkRequest to batch hundreds to thousands of log entries per request, reducing the number of requests by orders of magnitude.
Generate a random pack_id as a routing key so that a batch of logs is routed to the same shard, avoiding costly auto‑generated document IDs.
Send logs concurrently through a goroutine pool.
KR2 – Agent Lifecycle Management & Health Detection
Adopt a heartbeat-based approach similar to HAProxy's rise/fall health checks:
Config Server marks an agent as online only after receiving a configurable number of consecutive heartbeats.
If heartbeats are missing for the same number of intervals, the agent is considered offline.
Offline agents are automatically cleaned up after a configurable timeout.
KR3 – Tag‑Based Agent Grouping
Unify the tag representation for AgentGroup and Agent to a name‑value array and add logical operators (“AND”, “OR”) to express group membership.
```protobuf
message AgentGroupTag {
  string name = 1;
  string value = 2;
}

message AgentGroup {
  string group_name = 1;
  ...
  repeated AgentGroupTag tags = 3;
}

message Agent {
  string agent_id = 1;
  ...
  repeated string tags = 4;
  ...
}
```

Example: a group with tags cluster:A OR cluster:B matches any agent with either tag; a group with cluster:A AND group:B matches only agents that have both.
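The matching rule in this example can be sketched as a small Go function. This sketch makes several assumptions. It treats an agent's tags as name/value pairs, which is the unified representation the text describes, even though the Agent message shown still declares repeated string tags. Tag, Matches, OpAnd, and OpOr are hypothetical names, because the proto elides the field that stores the operator.

```go
package main

import "fmt"

// Tag mirrors the name/value shape of AgentGroupTag.
type Tag struct {
	Name  string
	Value string
}

// Hypothetical operator values; the article names "AND" and "OR" but does
// not show where they are stored.
const (
	OpAnd = "AND"
	OpOr  = "OR"
)

// Matches reports whether an agent carrying agentTags belongs to a group
// whose groupTags are combined with op. An empty group matches nothing.
func Matches(groupTags, agentTags []Tag, op string) bool {
	have := make(map[Tag]bool, len(agentTags))
	for _, t := range agentTags {
		have[t] = true
	}
	switch op {
	case OpAnd: // every group tag must be present on the agent
		for _, t := range groupTags {
			if !have[t] {
				return false
			}
		}
		return len(groupTags) > 0
	case OpOr: // at least one group tag must be present
		for _, t := range groupTags {
			if have[t] {
				return true
			}
		}
		return false
	default:
		return false
	}
}

func main() {
	agent := []Tag{{"cluster", "B"}}
	fmt.Println(Matches([]Tag{{"cluster", "A"}, {"cluster", "B"}}, agent, OpOr)) // true
	fmt.Println(Matches([]Tag{{"cluster", "A"}, {"group", "B"}}, agent, OpAnd))  // false
}
```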
KR4 – Config Server Disaster Recovery
Implement the provided Database interface on top of GORM so that LevelDB can be replaced with MySQL, PostgreSQL, SQLite, or SQL Server. All CRUD operations are delegated to GORM, which gives Config Server reliable, shared persistence.
```go
// Database abstracts Config Server's storage so that LevelDB can be swapped
// for a relational backend. Batch is defined elsewhere in Config Server.
type Database interface {
	Connect() error
	GetMode() string // store mode
	Close() error

	// Single-entity operations, addressed by table name and entity key.
	Get(table string, entityKey string) (interface{}, error)
	Add(table string, entityKey string, entity interface{}) error
	Update(table string, entityKey string, entity interface{}) error
	Has(table string, entityKey string) (bool, error)
	Delete(table string, entityKey string) error

	// Multi-entity reads.
	GetAll(table string) ([]interface{}, error)
	GetWithField(table string, pairs ...interface{}) ([]interface{}, error)
	Count(table string) (int, error)

	// Applies several writes together.
	WriteBatch(batch *Batch) error
}
```

Note: Running WriteBatch as one large transaction can cause deadlocks, so the current implementation issues individual writes instead. At the observed load of about 100 TPS, this is sufficient.
KR5 – Agent Self‑Monitoring
Agents report CPU usage and memory consumption in the heartbeat’s extras field. Config Server stores these metrics in MySQL, and a scheduled job snapshots them for time‑series visualization, aiding troubleshooting.
Log Query Service
While Kibana offers generic visualization, it lacks enterprise‑grade features such as multi‑tenant role control, log‑library configuration, and multi‑cluster management. A minimal custom query UI is required to complete the end‑to‑end solution, though this part is outside the scope of iLogtail itself.
Practical Outcomes
After addressing the five key problems, iLogtail forms a closed loop for configuration, collection, and feedback, making it a strong replacement for Filebeat and Logstash. The production deployment runs iLogtail 1.8.0, has been stable for over three months, processes billions of log entries per hour, and stores terabytes of data across hundreds of ElasticSearch nodes.
Ongoing work includes finalizing protocol changes and upstreaming them to the iLogtail main branch.
Alibaba Cloud Native
