Can iLogtail Replace Logstash? Exploring Performance and Ops Challenges
This article examines the traditional ELK stack, highlights iLogtail's performance advantages over Filebeat and Logstash, analyzes why iLogtail could not previously replace them, and details the five key engineering solutions—ranging from plugin optimization to Config Server disaster recovery—that enable iLogtail to serve as a full‑stack log collection platform in cloud‑native environments.
Traditional ELK Solution
In the ELK stack, E stands for Elasticsearch, L for Logstash, and K for Kibana. Logstash provides a powerful data processing pipeline with complex transformations, filtering, and rich input/output support. Filebeat, a lightweight log collector, is often paired with Logstash for low‑resource, high‑volume log ingestion. The classic workflow sends logs from servers to Kafka via Filebeat, consumes them with Logstash for processing, stores them in Elasticsearch, and visualizes them with Kibana.
Advantages of iLogtail
iLogtail is also a lightweight, high‑performance data collector that outperforms Filebeat in benchmark tests. Its superior performance is largely attributed to a Polling + inotify mechanism. Detailed documentation is available in the community.
Performance tests show that under the "container file collection with multiple configurations" scenario, iLogtail’s CPU increase is roughly half of Filebeat’s, with other scenarios showing 5‑ to 10‑fold CPU advantages.
Reference: Container Scenario iLogtail vs Filebeat Performance Comparison
Can iLogtail Replace Logstash? Feasibility Analysis
Historically, iLogtail could not replace Filebeat and Logstash for four main reasons:
Plugin performance.
Configuration management.
Disaster recovery.
Self‑status monitoring.
We now examine each issue.
Plugin Performance
iLogtail’s core is fast, but its original Elasticsearch flusher plugin is a bottleneck.
Configuration Management
There is no front‑end UI for administrators to manage collection configurations.
iLogtail agents lack lifecycle management; terminated agents leave stale heartbeat records in Config Server.
Application instances number in the thousands and are organized into groups, so Config Server needs to support tag‑based grouping.
Disaster Recovery
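All production nodes require multi‑instance deployment, including Config Server. The current storage is LevelDB, an embedded key‑value store that keeps its data on the local disk of a single instance, so Config Server is stateful and cannot simply be replicated. It must be replaced with a mature relational database such as MySQL.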
Self‑Status Monitoring
Agents need to report CPU usage, memory consumption, and other metrics to Config Server for load monitoring.
Engineering Solutions (KR1‑KR5)
KR1: Solve Elasticsearch Flusher Performance Bottleneck
Use the esapi.BulkRequest interface to batch log data, reducing request count by two to three orders of magnitude.
Generate a random pack ID before flushing and use it as a routing parameter to send a batch to the same shard, avoiding unnecessary ID‑based sharding.
Enable a goroutine pool for concurrent log transmission.
KR2: Solve Agent Lifecycle Management and Liveness Detection
Config Server considers an agent online only after receiving a configurable number of consecutive heartbeats.
If no heartbeat is received within the same number of intervals, the agent is marked offline.
Agents offline for a defined duration are automatically cleaned up from the heartbeat store.
KR3: Solve Agent Tag‑Based Grouping
Unify the tag structures of AgentGroup and Agent to name‑value pairs and add logical operators (AND, OR) to AgentGroup. An agent belongs to a group if its tags satisfy the group’s expression.
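The membership rule can be sketched in Go as follows, separately from the protobuf definitions listed next. This is a minimal illustration: the Tag, Operator, and AgentGroup types and the Matches function are hypothetical, and treating a group with no tags as matching no agent is an assumption.

```go
package main

import "fmt"

// Tag is a name-value pair shared by agents and agent groups.
type Tag struct{ Name, Value string }

// Operator decides how a group's tags are combined.
type Operator int

const (
	And Operator = iota // the agent must carry every tag of the group
	Or                  // the agent must carry at least one tag of the group
)

type AgentGroup struct {
	Name string
	Op   Operator
	Tags []Tag
}

// Matches reports whether an agent with agentTags belongs to group g.
func Matches(g AgentGroup, agentTags []Tag) bool {
	if len(g.Tags) == 0 {
		return false // assumption: a group without tags matches no agent
	}
	have := make(map[Tag]bool, len(agentTags))
	for _, t := range agentTags {
		have[t] = true
	}
	for _, want := range g.Tags {
		if have[want] {
			if g.Op == Or {
				return true // one hit is enough for OR
			}
		} else if g.Op == And {
			return false // one miss is enough to fail AND
		}
	}
	// Reaching here means every tag matched (AND) or none matched (OR).
	return g.Op == And
}

func main() {
	agent := []Tag{{"env", "prod"}, {"app", "order"}}
	all := AgentGroup{Name: "prod-order", Op: And, Tags: []Tag{{"env", "prod"}, {"app", "order"}}}
	anyOf := AgentGroup{Name: "pay-or-prod", Op: Or, Tags: []Tag{{"app", "pay"}, {"env", "prod"}}}
	miss := AgentGroup{Name: "prod-pay", Op: And, Tags: []Tag{{"env", "prod"}, {"app", "pay"}}}
	fmt.Println(Matches(all, agent), Matches(anyOf, agent), Matches(miss, agent)) // true true false
}
```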
message AgentGroupTag {
string name = 1;
string value = 2;
}
message AgentGroup {
string group_name = 1;
...
repeated AgentGroupTag tags = 3;
}
message Agent {
string agent_id = 1;
...
repeated string tags = 4;
...
}
KR4: Solve Config Server Disaster Recovery
Implement the provided Database interface using GORM to support MySQL, PostgreSQL, SQLite, and SQL Server, replacing LevelDB with a reliable relational store.
type Database interface {
Connect() error
GetMode() string // store mode
Close() error
Get(table string, entityKey string) (interface{}, error)
Add(table string, entityKey string, entity interface{}) error
Update(table string, entityKey string, entity interface{}) error
Has(table string, entityKey string) (bool, error)
Delete(table string, entityKey string) error
GetAll(table string) ([]interface{}, error)
GetWithField(table string, pairs ...interface{}) ([]interface{}, error)
Count(table string) (int, error)
WriteBatch(batch *Batch) error
}
All methods are implemented via GORM’s CRUD operations.
KR5: Self‑Status Monitoring
Agents report CPU usage and memory consumption in the heartbeat’s extras field. Config Server stores these metrics in MySQL, and a scheduled task snapshots them for time‑series visualization and troubleshooting.
Log Query Service
While Kibana provides generic visualization, it lacks enterprise‑grade features such as multi‑user role control, log library configuration, and multi‑cluster management. A minimal custom log query platform is required to complete the end‑to‑end solution, though it is outside the scope of iLogtail.
Log Collection Practice Summary
Since going live in February 2024, our iLogtail deployment (based on open‑source 1.8.0) has run stably for over three months, ingesting billions of log entries per hour at TB scale, serving thousands of pods, and writing to hundreds of Elasticsearch nodes. Ongoing work includes protocol changes that will be upstreamed to the iLogtail main branch.
Among the five challenges, only the Elasticsearch flusher required code changes; the other four are tightly coupled with Config Server, which the community regards as a "crown jewel" due to its high user value.
Alibaba Cloud Observability
