Solving Edge Observability: How LoongCollector Ensures Reliable Data Collection

This article explains the three major challenges of collecting observability data on edge devices—unstable networks, reliable delivery, and bandwidth limits—and shows how LoongCollector’s persistent‑asynchronous architecture, smart back‑pressure, and configurable flow control provide a low‑resource, high‑reliability solution with real‑world performance results.

Alibaba Cloud Observability

Jan 26, 2026

Background

With the rapid growth of cloud computing and IoT, many business scenarios push computation and data collection to the edge: smart-manufacturing lines, in-vehicle systems, retail terminals, and smart homes. These devices generate valuable logs, metrics, and traces that are essential for operations, fault diagnosis, and user-experience optimization.

Three Major Challenges for Edge Data Collection

Challenge 1: Unstable Network Environment

Weak network: Mobile signal fluctuations, unstable Wi‑Fi, and high latency across regions cause low bandwidth and high packet loss.

Power supply not guaranteed: Many devices rely on batteries or may experience sudden power loss.

Severe resource constraints: Edge devices have limited CPU, memory, storage, and network bandwidth.

Challenge 2: Reliable Delivery of Observability Data

Data loss risk: Network interruptions, power outages, or process crashes can discard data.

Order guarantee: Time‑series data (metrics, traces) must preserve the collection order.

Challenge 3: Bandwidth Limitation

High traffic cost: 4G/5G data fees are far higher than data‑center dedicated lines.

Bandwidth competition: Collection traffic competes with business traffic for limited bandwidth.

Upload rate limits: Some networks impose strict upload caps.

LoongCollector Overview

LoongCollector is an open‑source, high‑performance, highly reliable observability data collector from Alibaba Cloud. It has been proven in Alibaba Cloud’s internal deployment of millions of instances and is specially optimized for edge scenarios.

Core Capabilities

Host monitoring: Real-time collection of more than 100 system metrics covering CPU, memory, disk, and network.

Prometheus protocol: Full compatibility with the Prometheus ecosystem, so metrics can be collected from any Prometheus-compatible application.

Log collection: Efficient text‑log ingestion with multiple formats and parsers.

Ultra‑Low Resource Consumption

LoongCollector is heavily optimized for devices with scarce resources, allowing more collection tasks on the same hardware or stable operation on extremely constrained devices.

Solution Architecture: Persistence + Asynchronous Sending + Intelligent Retry

Data is first written to local files (persistence); a dedicated sender thread then reads those files and transmits the data (asynchronous sending). Because collection no longer depends on network state, data that has reached disk survives network outages, process crashes, and power cuts.

Local Persistence

All metric data is stored in files. A fine‑grained checkpoint records the read offset of each file, so after a crash or power loss the collector resumes from the exact point without data loss.
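
The checkpoint mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not LoongCollector's actual implementation: the function names, the JSON checkpoint format, and the newline-delimited file layout are all assumptions made for the example. It shows the two properties the paragraph relies on: only complete records are consumed, and the offset is committed atomically so a crash cannot leave a half-written checkpoint behind.

```python
import json
import os


def load_offset(checkpoint_path):
    """Return the last committed read offset, or 0 if no checkpoint exists."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["offset"]


def save_offset(checkpoint_path, offset):
    """Commit the offset atomically: write a temp file, fsync, then rename."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, checkpoint_path)


def read_complete_lines(data_path, offset):
    """Read from `offset` and return (complete lines, offset after them).

    A partial trailing line (for example, one cut short by a power loss)
    is not consumed; it is picked up on a later call once it is complete.
    """
    with open(data_path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    end = chunk.rfind(b"\n") + 1  # 0 when the chunk has no complete line
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

Calling `save_offset` only after a batch has been sent successfully gives at-least-once delivery in this sketch: after a crash the reader resumes from the last committed offset and may resend the final batch, but it does not skip data.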

Asynchronous Consumption

The sender thread reads persisted files in order, guaranteeing that data is sent in the same chronological order it was collected. File rotation and sequence numbers ensure correct ordering across multiple files.
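
To illustrate ordering across rotated files, the sketch below sorts files oldest-first before reading. The naming scheme is hypothetical (logrotate-style `metric.log.N`, where a higher `N` means an older file, and the unsuffixed file is the one currently being written); the source does not state how LoongCollector names rotated files, so treat this purely as an example of ordering by numeric sequence rather than by string comparison.

```python
import re


def oldest_first(filenames, base="metric.log"):
    """Order rotated files so that reading them in turn preserves write order.

    Assumes the hypothetical scheme base, base.1, base.2, ... where a higher
    suffix means an older file. Names that do not match are ignored.
    """
    pattern = re.compile(re.escape(base) + r"(?:\.(\d+))?")
    numbered = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            seq = int(match.group(1)) if match.group(1) else 0
            numbered.append((seq, name))
    # Sort numerically, highest sequence (oldest) first, so that
    # "metric.log.10" comes before "metric.log.2".
    return [name for _, name in sorted(numbered, reverse=True)]
```

A plain string sort would place `metric.log.10` before `metric.log.2` only by accident of direction; parsing the suffix as an integer keeps the order correct for any number of files.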

Smart Back‑Pressure and Flow Control

Queue back‑pressure: When the send queue reaches a threshold, file reading is paused to prevent memory explosion.

Traffic limiting: The max_bytes_per_sec parameter caps the outbound bandwidth, protecting business traffic.

Adaptive concurrency: Inspired by TCP congestion control, LoongCollector dynamically adjusts the number of concurrent senders based on network conditions, providing fast response, quick convergence, and automatic recovery.
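
The three mechanisms above can be sketched as follows. The source does not describe the exact algorithms, so this example substitutes well-known techniques: a token bucket for the byte-rate cap, additive-increase/multiplicative-decrease (AIMD, the scheme used by TCP congestion avoidance) for concurrency, and a simple high-watermark check for back-pressure. All class names, function names, and default values are assumptions for illustration only.

```python
import time


class TokenBucket:
    """Caps throughput at `rate` bytes per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity=None, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(rate if capacity is None else capacity)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_consume(self, nbytes):
        """Return True and deduct tokens if `nbytes` may be sent right now."""
        now = self.clock()
        refill = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class AimdConcurrency:
    """Grow the concurrent-send limit slowly on success, halve it on failure."""

    def __init__(self, start=1, minimum=1, maximum=8):
        self.limit = start
        self.minimum = minimum
        self.maximum = maximum

    def on_success(self):
        self.limit = min(self.maximum, self.limit + 1)

    def on_failure(self):
        self.limit = max(self.minimum, self.limit // 2)


def should_pause_reading(queue_length, high_watermark):
    """Back-pressure rule: stop reading files while the send queue is full."""
    return queue_length >= high_watermark
```

With `TokenBucket(rate=1048576)`, a sender would be limited to roughly 1 MiB/s, matching the `max_bytes_per_sec` value used in the configuration example. Halving on failure lets the limit drop quickly when the link degrades, while the one-step increase probes for recovered capacity without flooding a weak network.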

Configuration Examples

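A typical edge deployment includes a host-monitor input and a Prometheus input, each flushed to a local file. The three snippets below are, in order: the agent-level settings (JSON), the host-monitor pipeline (YAML), and the Prometheus pipeline (YAML).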

{
  "discard_old_data": false,
  "config_server_lost_connection_timeout": 604800,
  "force_quit_read_timeout": 604800,
  "max_bytes_per_sec": 1048576,
  "cpu_usage_limit": 0.4,
  "mem_usage_limit": 384,
  "working_ip": "192.168.0.1"
}

enable: true
inputs:
  - Type: input_host_monitor
    Interval: 15
flushers:
  - Type: flusher_file
    MaxFileSize: 104857600
    MaxFiles: 10
    FilePath: /usr/local/ilogtail/metrics/host.log

enable: true
inputs:
  - Type: input_prometheus
    ScrapeConfig:
      job_name: node
      host_only_mode: true
      scrape_interval: 15s
      scrape_timeout: 10s
      static_configs:
        - targets: ["localhost:12345"]
flushers:
  - Type: flusher_file
    MaxFileSize: 524288000
    MaxFiles: 10
    FilePath: /usr/local/ilogtail/metrics/metric.log
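
As a rough sizing aid, the numbers in the configurations above can be converted into buffer capacity and offline tolerance. The calculation below is a sketch under stated assumptions: it assumes `flusher_file` keeps at most `MaxFiles` files of up to `MaxFileSize` bytes each, and it borrows the ~13 KB/s raw rate from the test results purely as an illustrative steady write rate (treated here as 13 × 1024 bytes per second). The helper names are hypothetical.

```python
def buffer_capacity_bytes(max_file_size, max_files):
    """Upper bound on disk used by one rotating file output."""
    return max_file_size * max_files


def hours_until_full(capacity_bytes, write_rate_bytes_per_sec):
    """Hours of data the buffer can hold at a steady write rate."""
    return capacity_bytes / write_rate_bytes_per_sec / 3600


host_capacity = buffer_capacity_bytes(104857600, 10)    # host.log settings
metric_capacity = buffer_capacity_bytes(524288000, 10)  # metric.log settings

print(host_capacity / 2**20, "MiB")    # 1000.0 MiB
print(metric_capacity / 2**20, "MiB")  # 5000.0 MiB

# Assumed write rate of 13 KiB/s (illustrative only).
print(round(hours_until_full(host_capacity, 13 * 1024), 1), "hours")  # 21.9 hours

# The two 604800-second timeouts in the agent settings, expressed in days.
print(604800 / 86400, "days")  # 7.0 days
```

Under these assumptions the host-monitor buffer would cover a little under a day of disconnection; actual retention depends on the real write rate of each pipeline.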

Performance Test Results

On a representative edge device, LoongCollector exhibits minimal resource usage while staying within the configured bandwidth limit.

CPU: average 0.02 core, peak 0.028 core.

Memory: average 31.5 MB, peak 35 MB.

Network (after compression): average 1.07 KB/s, peak 1.10 KB/s (raw data before compression was ~13 KB/s).

Figure: Network traffic before and after compression
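
To put the network figures in terms of traffic cost, the reported rates can be extrapolated to a 30-day month. This is simple arithmetic on the numbers above, assuming the rates stay constant and that "KB" means 1024 bytes; the function name is hypothetical.

```python
SECONDS_PER_30_DAYS = 30 * 24 * 3600  # 2,592,000 seconds


def monthly_traffic_mib(rate_kib_per_sec):
    """Traffic over 30 days, in MiB, at a constant rate given in KiB/s."""
    return rate_kib_per_sec * SECONDS_PER_30_DAYS / 1024


raw = monthly_traffic_mib(13.0)         # reported raw rate
compressed = monthly_traffic_mib(1.07)  # reported average rate on the wire

print(f"raw: {raw:.0f} MiB, compressed: {compressed:.0f} MiB")
print(f"reduction: {13.0 / 1.07:.1f}x")  # reduction: 12.1x
```

At these rates a device would upload roughly 2.6 GiB per month instead of about 32 GiB, which is where the saving on metered 4G/5G links comes from.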

Conclusion and Outlook

LoongCollector effectively tackles edge‑observability challenges by guaranteeing reliable data delivery, providing local persistence, decoupling collection from sending, and implementing intelligent back‑pressure and flow control. Nevertheless, further improvements are planned:

Simplify pipeline configuration by integrating persistence directly into a single pipeline.

Add support for Alibaba Cloud STS temporary credentials to avoid AccessKey leakage.

Explore more aggressive compression algorithms to further reduce traffic costs.
