How Alibaba’s iLogtail Revolutionizes Observability at Massive Scale
This article covers the open‑source release of Alibaba’s iLogtail and the observability challenges it addresses. It traces the agent’s evolution through the Feitian 5K, Alibaba Group, and cloud‑native stages, highlights its performance, stability, and multi‑tenant features, and outlines its community‑driven open‑source goals. A brief Impala overview follows at the end.
On November 23, Alibaba officially open‑sourced iLogtail, the foundational observability data collector used across Alibaba Group and Ant Group for logs, metrics, traces, and events. iLogtail runs on servers, in containers, on Kubernetes, and in embedded environments. It supports hundreds of data types, has millions of installations, and collects tens of petabytes of data daily.
iLogtail and Observability
Observability has evolved from traditional monitoring and troubleshooting to a comprehensive approach that gathers as many data types as possible for white‑box insight. iLogtail’s core role is to collect diverse observability data to enable higher‑level platform applications.
Challenges of Alibaba’s Observability Data Collection
Existing open‑source agents (Logstash, Filebeat, Fluentd, Collectd, Telegraf) are feature‑rich but fall short on performance, stability, and manageability at Alibaba’s scale.
Resource consumption: With tens of PB collected per day across millions of hosts, the agent needs to process about 100 MB/s per core while using minimal memory.
Stability: Agents must not impact business workloads; require self‑recovery, multi‑dimensional monitoring, issue isolation, and reliable rollback.
Manageability: Hundreds of data types per machine, multi‑department usage, remote configuration, priority handling, and comprehensive data‑completeness guarantees.
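To put the resource figures in perspective, the following back‑of‑envelope sketch estimates the average collection rate per host. Both inputs are illustrative assumptions rather than reported figures: "tens of PB" is taken as 30 PB and "millions of hosts" as 1 million.

```python
# Back-of-envelope estimate. Both inputs below are illustrative
# assumptions, not figures reported by Alibaba.
PB = 10**15                      # bytes in a (decimal) petabyte
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_host(bytes_per_day: float, hosts: int) -> float:
    """Return the average collection rate per host in bytes per second."""
    return bytes_per_day / hosts / SECONDS_PER_DAY


daily_volume = 30 * PB           # assumed: "tens of PB" taken as 30 PB
host_count = 1_000_000           # assumed: "millions" taken as 1 million

rate = average_rate_per_host(daily_volume, host_count)
print(f"{rate / 10**6:.2f} MB/s per host on average")  # prints "0.35 MB/s per host on average"
```

Under these assumptions the average load per host is well below 1 MB/s. One plausible reading is that the per‑core throughput target matters less for steady‑state volume than for absorbing bursts and keeping the agent's CPU share small on machines whose main job is serving business workloads.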
Since 2013, iLogtail has been continuously optimized for performance, stability, and manageability, and it has held up through multiple Alibaba Double‑11, Double‑12, and Spring Festival peak events.
iLogtail Development Timeline
1. Feitian 5K Stage
In 2013, Alibaba’s "Feitian" 5K cloud cluster required unified log and metric collection. iLogtail was created to address monitoring, troubleshooting, and traceability for 5,000 machines, delivering millisecond‑level log tailing and low resource usage.
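The core idea behind log tailing, reading only what has been appended since the last check, can be sketched as follows. This is a minimal polling‑based illustration, not iLogtail's implementation: the `FileTailer` class and its methods are hypothetical names, and a production agent would typically combine file‑system event notification with polling to reach millisecond‑level latency.

```python
from __future__ import annotations

import os
import tempfile


class FileTailer:
    """Hypothetical tailer that returns only lines appended since the last poll."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0       # checkpoint: position of the next unread byte
        self._partial = b""   # trailing bytes that do not yet end in a newline

    def poll(self) -> list[str]:
        size = os.path.getsize(self.path)
        if size < self.offset:
            # The file shrank, so treat it as truncated or rotated and restart.
            self.offset = 0
            self._partial = b""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
            self.offset = f.tell()
        chunks = (self._partial + data).split(b"\n")
        self._partial = chunks.pop()  # keep the incomplete last line for next time
        return [c.decode("utf-8", errors="replace") for c in chunks]


def demo() -> list[str]:
    """Append to a temporary file and collect what the tailer sees."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        tailer = FileTailer(path)
        with open(path, "ab") as f:
            f.write(b"first line\nsecond ")
        seen = tailer.poll()          # only the complete line is returned
        with open(path, "ab") as f:
            f.write(b"line\n")
        seen += tailer.poll()         # the buffered half-line is completed
        return seen
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Keeping the offset as a checkpoint is what lets a tailer resume without re‑reading or skipping data, and buffering the incomplete trailing line avoids emitting half‑written log entries.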
2. Alibaba Group Stage
Expanding to the entire Alibaba and Ant groups brought million‑scale host counts, higher stability demands, and the need for multi‑tenant isolation. In this stage iLogtail added support for a variety of log formats, JSON parsing, and filtering, and reached up to 100 MB/s per core in minimal mode.
3. Cloud‑Native Stage
As Alibaba moved fully onto the cloud, iLogtail embraced cloud‑native environments, adding support for containers and Kubernetes along with an Operator‑based configuration model. A plugin system enables custom Input, Processor, Aggregator, and Flusher extensions. The agent runs on Windows and Linux, on x86 and ARM architectures, and on both servers and embedded devices.
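The four plugin roles named above can be pictured as stages of a pipeline: an Input produces events, Processors transform or drop them, an Aggregator groups them into batches, and a Flusher ships each batch. The sketch below illustrates only that data flow. Every name in it (`static_input`, `json_processor`, `keep_errors`, `BatchAggregator`, `MemoryFlusher`, `run_pipeline`) is hypothetical and does not reflect iLogtail's actual plugin interfaces.

```python
from __future__ import annotations

import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, str]
Processor = Callable[[Event], Optional[Event]]


def static_input(lines: Iterable[str]) -> Iterator[Event]:
    """Input stage: wrap each raw line as an event."""
    for line in lines:
        yield {"content": line}


def json_processor(event: Event) -> Optional[Event]:
    """Processor stage: parse JSON content into fields, dropping bad lines."""
    try:
        parsed = json.loads(event["content"])
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    return {str(k): str(v) for k, v in parsed.items()}


def keep_errors(event: Event) -> Optional[Event]:
    """Processor stage: filter, keeping only events whose level is ERROR."""
    return event if event.get("level") == "ERROR" else None


class BatchAggregator:
    """Aggregator stage: group events into fixed-size batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._buffer: List[Event] = []

    def add(self, event: Event) -> Optional[List[Event]]:
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            return self.drain()
        return None

    def drain(self) -> List[Event]:
        batch, self._buffer = self._buffer, []
        return batch


class MemoryFlusher:
    """Flusher stage: a stand-in sink that stores batches in memory."""

    def __init__(self) -> None:
        self.batches: List[List[Event]] = []

    def flush(self, batch: List[Event]) -> None:
        self.batches.append(batch)


def run_pipeline(lines: Iterable[str], processors: List[Processor],
                 aggregator: BatchAggregator, flusher: MemoryFlusher) -> None:
    for event in static_input(lines):
        current: Optional[Event] = event
        for process in processors:
            current = process(current)
            if current is None:      # a processor dropped the event
                break
        if current is None:
            continue
        batch = aggregator.add(current)
        if batch:
            flusher.flush(batch)
    remaining = aggregator.drain()   # ship whatever is left at the end
    if remaining:
        flusher.flush(remaining)
```

Separating the stages this way means a new data source or destination only requires a new Input or Flusher, while parsing and filtering logic can be reused unchanged across configurations.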
Open‑Source Background and Expectations
Closed‑source solutions cannot keep pace with cloud‑native evolution; open‑sourcing iLogtail aims to foster community collaboration, improve resource efficiency, and broaden adoption beyond Alibaba’s internal use. The project seeks contributions to enhance performance, stability, and ecosystem integration.
Impala Tutorial (Brief)
Impala is an open‑source MPP SQL engine for massive data stored in Hadoop, offering high performance and low latency by leveraging HDFS, HBase, Metastore, YARN, and Sentry.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
