Building Cost‑Effective, High‑Availability Global Log Collection with Alibaba Cloud SLS
This article explains how enterprises can overcome the challenges of globally collecting logs by using Alibaba Cloud's high‑performance iLogtail/LoongCollector agents, selecting optimal network paths, applying cost‑saving strategies such as CloudLens diagnostics and LZ4 compression, and configuring multi‑region dual‑write for reliable, low‑cost log delivery.
Abstract
As enterprises expand globally, unifying the collection of application and infrastructure logs from overseas data centers into Alibaba Cloud Log Service (SLS) becomes a critical challenge. This article focuses on the high‑performance log collection agents iLogtail and LoongCollector, and covers optimal network access paths, cost‑optimization tactics, and multi‑region log distribution strategies.
1. Background and Challenges
Link quality and stability: Overseas public links often suffer from high latency, jitter, and packet loss, affecting real‑time log transmission.
Cost control: Public outbound traffic incurs significant expenses, especially in multi‑cloud or multi‑region deployments.
High availability and disaster recovery: Single‑point failures in the collection chain can cause data loss; multi‑target sending is needed for cross‑region resilience.
Complex environments and compliance: Logs originate from on‑premises data centers, other clouds, and different Alibaba Cloud regions, requiring flexible deployment and adherence to local data‑security regulations.
2. Core Collection Tools: iLogtail / LoongCollector
Lightweight and efficient: C++ core, low CPU and memory usage.
Universal collection: Supports file logs, container logs, Syslog, HTTP, etc.
Powerful processing (Processor plugins): Parse, filter, and mask data on the agent side to reduce unnecessary transmission.
Compression: Built‑in LZ4 compression reduces network bandwidth.
Reliable transmission: Local disk cache, retry, traffic shaping, and network‑exception isolation.
Flexible output and multi‑target: One agent can send the same data to multiple SLS endpoints for dual‑write scenarios.
Cloud‑native integration: Deep integration with ECS, ACK/ASK, and Kubernetes (DaemonSet, Sidecar, CRD).
3. Why Prefer LoongCollector
LoongCollector offers higher reliability than iLogtail, most notably through its ability to isolate network exceptions on the sending side. When one target region (e.g., Singapore) experiences a network failure, LoongCollector automatically isolates that link and keeps sending to the other healthy regions (e.g., Hangzhou), so a single failure does not block the entire pipeline.
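The isolation behavior described above can be illustrated with a small simulation. This is only a conceptual sketch, not LoongCollector's actual implementation: the `Target` class, the `FAILURE_THRESHOLD` value, and the in-memory backlog are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """One destination region with its own state (hypothetical model)."""
    name: str
    healthy_network: bool = True   # simulated link state
    failures: int = 0              # consecutive send failures
    isolated: bool = False         # True once the link has been isolated
    delivered: list = field(default_factory=list)
    backlog: list = field(default_factory=list)


FAILURE_THRESHOLD = 3  # assumed value, not a LoongCollector default


def send(target: Target, record: str) -> None:
    """Try to deliver one record; isolate the target after repeated failures."""
    if target.isolated:
        target.backlog.append(record)  # buffer instead of blocking the caller
        return
    if target.healthy_network:
        target.delivered.append(record)
        target.failures = 0
    else:
        target.backlog.append(record)  # keep the record for a later retry
        target.failures += 1
        if target.failures >= FAILURE_THRESHOLD:
            target.isolated = True


def fan_out(targets: list, record: str) -> None:
    """Send the same record to every target; one bad link never stops the loop."""
    for target in targets:
        send(target, record)


singapore = Target("ap-southeast-1", healthy_network=False)
hangzhou = Target("cn-hangzhou")
for i in range(5):
    fan_out([singapore, hangzhou], f"log-{i}")

print(len(hangzhou.delivered), "records delivered to", hangzhou.name)
print(singapore.name, "isolated:", singapore.isolated, "backlog:", len(singapore.backlog))
```

The point of the sketch is that each target keeps its own failure counter and backlog, so the healthy region continues to receive every record while the failing one is set aside.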
4. Network Design Options
Option 1: Direct Public Endpoint
Architecture: Overseas server → Public Internet → SLS public endpoint.
Pros: Simple configuration, no extra cloud network products.
Cons: Public traffic cost, variable network quality, security depends on HTTPS.
Option 2: Global Acceleration (GA)
Architecture: Overseas server → Public Internet → Nearest Alibaba Cloud PoP (GA) → Alibaba backbone → SLS endpoint.
Pros: Improves latency and packet loss, easy to enable by switching the data endpoint to the GA domain.
Cons: Higher traffic cost, but may reduce overall cost by lowering retransmissions.
Option 3: Same‑Region Private Network (Best Cost & Performance)
Architecture: Overseas Alibaba Cloud ECS → VPC private network → SLS private endpoint (same region).
Pros: Lowest latency, highest stability, highest security (no public exposure), lowest cost (no public traffic fees).
Cons: Both the server and the target SLS project must reside in the same Alibaba Cloud region.
Option 4: Hybrid/Multi‑Cloud via Dedicated Line / CEN / VPN
Architecture: Overseas server (IDC or other cloud) → Physical dedicated line / VPN / CEN → Alibaba Cloud VPC → SLS private endpoint.
Pros: Private network isolation, superior quality and stability compared with public or GA, suitable for large‑scale, high‑reliability requirements.
Cons: Highest cost (line rental, CEN instance, bandwidth), complex deployment, longer rollout time.
5. Cost Optimization Strategies
5.1 Use CloudLens for SLS Diagnosis
CloudLens provides a centralized view of all SLS projects, showing log write volume, public outbound traffic, and global‑acceleration traffic. By analyzing these reports you can quickly locate abnormal consumption, identify noisy services, and adjust collection configurations.
5.2 Data Compression with LZ4
Both iLogtail and LoongCollector support client‑side LZ4 compression. LZ4 offers high compression speed with low CPU overhead and typically reduces bandwidth by 5‑10×, depending on log content, which makes it well suited to real‑time, high‑throughput log streams.
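To get a rough feel for what a given compression ratio means for traffic cost, the arithmetic can be sketched as follows. The daily volume and the per-GB price are hypothetical placeholders, not Alibaba Cloud list prices; only the 5‑10× range comes from the text above.

```python
def compressed_gb(raw_gb: float, ratio: float) -> float:
    """Volume on the wire after compression at the given ratio (e.g. 5 means 5x)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio


def monthly_saving(raw_gb_per_day: float, ratio: float,
                   price_per_gb: float, days: int = 30) -> float:
    """Traffic cost avoided per month compared with sending uncompressed logs."""
    saved_gb = (raw_gb_per_day - compressed_gb(raw_gb_per_day, ratio)) * days
    return saved_gb * price_per_gb


RAW_GB_PER_DAY = 100.0   # hypothetical log volume
PRICE_PER_GB = 0.10      # hypothetical traffic price, in any currency unit

for ratio in (5, 10):
    saving = monthly_saving(RAW_GB_PER_DAY, ratio, PRICE_PER_GB)
    print(f"ratio {ratio}x: about {saving:.2f} saved per month")
```

Note the diminishing return: going from 5× to 10× halves the bytes on the wire, but most of the saving is already captured at 5×.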
5.3 Log Filtering with Processor Plugins
Processor plugins allow on‑agent parsing, filtering, and masking. You can drop low‑value logs (e.g., DEBUG, health‑check) before they leave the server, reducing both network and storage costs. Filters can be implemented via native plugins or custom SPL scripts.
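The filtering and masking logic can be pictured with a short Python sketch. It is not the Processor plugin API or SPL syntax; the log layout (`<timestamp> <LEVEL> <message>`), the health-check paths, and the 11-digit masking rule are assumptions chosen for illustration.

```python
import re
from typing import Optional

DROP_LEVELS = {"DEBUG", "TRACE"}                        # assumed low-value levels
HEALTH_CHECK = re.compile(r"GET /(healthz?|ping)\b")    # assumed probe paths
PHONE = re.compile(r"\b(\d{3})\d{4}(\d{4})\b")          # assumed 11-digit number format


def process(line: str) -> Optional[str]:
    """Return the (possibly masked) line, or None if the line should be dropped."""
    parts = line.split(" ", 2)  # assumed layout: "<timestamp> <LEVEL> <message>"
    if len(parts) < 3:
        return line             # unknown layout: pass through unchanged
    timestamp, level, message = parts
    if level in DROP_LEVELS:
        return None
    if HEALTH_CHECK.search(message):
        return None
    masked = PHONE.sub(r"\1****\2", message)  # keep first 3 and last 4 digits
    return f"{timestamp} {level} {masked}"


lines = [
    "2024-05-01T10:00:00Z DEBUG cache miss",
    "2024-05-01T10:00:01Z INFO GET /healthz 200",
    "2024-05-01T10:00:02Z ERROR payment failed for 13812345678",
]
kept = [out for out in map(process, lines) if out is not None]
print(kept)
```

Only the ERROR line survives here, with the number masked, so two of the three lines never leave the server.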
5.4 Smooth Migration from Public to Private Network
When moving from a public endpoint to a private VPC endpoint, configure the agent for dual‑write (both public and private endpoints) during the transition so that monitoring is never interrupted. Once the private path is confirmed stable, disable the public endpoint.
6. Multi‑Region Log Distribution
Agents can be configured with multiple endpoints, enabling a single server or Kubernetes node to send different log types to different regional SLS projects. Example: system logs from Singapore servers are sent to a Shanghai project for centralized ops monitoring, while application logs are sent to a Singapore project for business analysis.
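The Singapore example above amounts to a routing table keyed by log type. A minimal sketch follows; the project and Logstore names are hypothetical, and in practice this mapping lives in the agent's collection configurations rather than in application code.

```python
from typing import NamedTuple


class Destination(NamedTuple):
    region: str
    project: str    # hypothetical project name
    logstore: str   # hypothetical Logstore name


# Log type -> destination, mirroring the example in the text.
ROUTES = {
    "system": Destination("cn-shanghai", "ops-central", "syslog"),
    "application": Destination("ap-southeast-1", "biz-analytics", "app-log"),
}


def destination_for(log_type: str) -> Destination:
    """Look up where a given log type should be sent."""
    try:
        return ROUTES[log_type]
    except KeyError:
        raise ValueError(f"no route configured for log type {log_type!r}") from None


for log_type in ("system", "application"):
    d = destination_for(log_type)
    print(f"{log_type} -> {d.region}/{d.project}/{d.logstore}")
```

Failing loudly on an unrouted log type is a deliberate choice in this sketch, so that a new log source is not silently dropped.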
7. Implementation Example (ECS / ACK)
{
  "config_server_address": "http://logtail.ap-southeast-1-intranet.log.aliyuncs.com",
  "config_server_address_list": ["http://cn-shanghai.log.aliyuncs.com"],
  "data_server_list": [
    {"cluster": "ap-southeast-1", "endpoint": "ap-southeast-1-intranet.log.aliyuncs.com"},
    {"cluster": "cn-shanghai", "endpoint": "cn-shanghai.log.aliyuncs.com"}
  ]
}
The same JSON can be used for both iLogtail and LoongCollector; LoongCollector adds the fields primary_region, config_servers, and data_servers with region‑specific endpoint lists.
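Before rolling a configuration like this out, a quick sanity check can catch missing fields and show which targets use a private endpoint. The script below is a hedged sketch: the `check_config` and `is_private` helpers are hypothetical, and the `-intranet` test simply follows the endpoint naming seen in the example above.

```python
import json

CONFIG_TEXT = """
{
  "config_server_address": "http://logtail.ap-southeast-1-intranet.log.aliyuncs.com",
  "config_server_address_list": ["http://cn-shanghai.log.aliyuncs.com"],
  "data_server_list": [
    {"cluster": "ap-southeast-1", "endpoint": "ap-southeast-1-intranet.log.aliyuncs.com"},
    {"cluster": "cn-shanghai", "endpoint": "cn-shanghai.log.aliyuncs.com"}
  ]
}
"""


def check_config(text: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    cfg = json.loads(text)
    problems = []
    servers = cfg.get("data_server_list", [])
    if not servers:
        problems.append("data_server_list is missing or empty")
    for i, server in enumerate(servers):
        for key in ("cluster", "endpoint"):
            if not server.get(key):
                problems.append(f"data_server_list[{i}] has no {key}")
    clusters = [s.get("cluster") for s in servers]
    if len(set(clusters)) != len(clusters):
        problems.append("duplicate cluster names in data_server_list")
    return problems


def is_private(endpoint: str) -> bool:
    """Heuristic based on the example's naming: private endpoints contain '-intranet'."""
    return "-intranet." in endpoint


print("problems:", check_config(CONFIG_TEXT))
for server in json.loads(CONFIG_TEXT)["data_server_list"]:
    kind = "private" if is_private(server["endpoint"]) else "public"
    print(f'{server["cluster"]}: {server["endpoint"]} ({kind})')
```

In this example the Singapore target uses the private endpoint while the Shanghai target goes over the public one, which is where public traffic fees would apply.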
8. Verification and Monitoring
Check machine‑group heartbeats in every target project (for example, Project A and Project B, one per region) to ensure agents are alive.
Verify data arrival in the corresponding Logstores of each region.
Use CloudLens for overall resource consumption and set SLS alarms for write success rate, latency, and agent health (CPU, memory, error logs, link quality).
Alibaba Cloud Observability
Driving continuous progress in observability technology!
