
How Observability 2.0 Redefines Cloud‑Native Log Pipelines and Cuts Costs by 66%

Observability 2.0 unifies logs, metrics and traces into a single platform, introduces event‑centric Wide Events, and drives a complete redesign of Alibaba Cloud's SLS data pipeline that delivers higher performance, lower latency, richer low‑code SPL processing, and up to a 66.7% reduction in processing costs.

Alibaba Cloud Native

May 20, 2025


What is Observability 2.0?

Observability 2.0 (o11y 2.0) has become a hot topic in the DevOps community. It aims to break the silos of logs, metrics and traces by providing a unified view of system health, accelerating engineers' understanding of system behavior, and applying AI for anomaly detection and fault localization.

Key Goals of o11y 2.0

Unify logs, metrics and traces under a single platform to build a complete view of system health.

Enable faster system behavior comprehension and issue diagnosis, reducing downtime.

Use logs as the core data source and enrich them with additional dimensions (Wide Events) to reconstruct facts.

Leverage cloud‑native architecture for elastic scaling, ease of use and low cost when processing massive event streams.

Evolution of the SLS Data Pipeline

SLS, a core product of Alibaba Cloud’s observability family, introduced data processing in 2019 and began a major pipeline upgrade in 2024. The upgraded pipeline offers three service forms that address different scenarios, fault‑tolerance requirements, cost considerations and ecosystem integration.

Technical Requirements for an Observability 2.0 System

Storage system: must handle massive, heterogeneous real‑time streams with low latency, high write throughput, and fast high‑dimensional queries.

Compute system: must perform billion‑scale data statistics within seconds, support elastic real‑time computation, and dynamically join multi‑source data.

The data pipeline is the core component that transforms raw logs into high‑quality Wide Events, analogous to refining crude oil into usable fuels.
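To make the idea of a Wide Event concrete, here is a minimal Python sketch of enriching a raw log with extra dimensions. It is an illustration only, not how SLS implements enrichment: the `SERVICE_CATALOG` table, the `to_wide_event` helper, and the `team`, `region` and `is_error` fields are all hypothetical names chosen for the example.

```python
from typing import Any

# Hypothetical lookup table standing in for a second data source
# (for example, a service catalog). It is not part of SLS.
SERVICE_CATALOG: dict[str, dict[str, str]] = {
    "checkout": {"team": "payments", "region": "cn-hangzhou"},
    "search": {"team": "discovery", "region": "cn-shanghai"},
}


def to_wide_event(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the raw log enriched with extra dimensions."""
    event = dict(raw)
    # Join against the catalog; unknown services get no extra dimensions.
    event.update(SERVICE_CATALOG.get(raw.get("service", ""), {}))
    # Derive a new dimension from an existing field (assumption: any
    # status other than "200" counts as an error).
    event["is_error"] = str(raw.get("status", "")) != "200"
    return event


raw_log = {"service": "checkout", "status": "502", "request_uri": "/pay"}
wide = to_wide_event(raw_log)
print(wide)
```

The enriched record keeps every original field and adds the joined and derived ones, so a later query can group by team or region without another lookup.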

Performance Improvements

The new pipeline, powered by the SPL engine (columnar computation, SIMD acceleration, C++ implementation), achieves:

Significant CPU‑throughput gains, keeping processing latency under one second even during burst traffic.

Horizontal scalability up to 1 PB/day of raw log data in simple filtering scenarios.

Benchmarks show end‑to‑end function latency dropping from 10‑15 seconds to 300 ms for a 10 MB workload with heavy filtering.
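The figures above are easier to judge once converted into rates and ratios. The short Python sketch below does that arithmetic. It assumes decimal units (1 PB = 10^15 bytes) and an evenly spread load, neither of which the benchmark states.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumption: decimal units, 1 PB = 10**15 bytes, load spread evenly.
petabyte = 10**15
sustained_gb_per_s = petabyte / SECONDS_PER_DAY / 10**9
print(f"1 PB/day is about {sustained_gb_per_s:.2f} GB/s sustained")

# Speedup implied by going from the quoted 10-15 s range to 300 ms.
new_latency_s = 0.3
speedup_low = 10 / new_latency_s
speedup_high = 15 / new_latency_s
print(f"speedup is roughly {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under these assumptions, 1 PB/day is a sustained rate of about 11.6 GB/s, and the latency drop is a speedup of roughly 33x to 50x.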

Cost Reductions

Because SPL delivers higher performance, SLS lowered the price of the new processing service to one‑third of the price of the legacy Python‑DSL engine, a 66.7% price cut. Total cost of ownership also improves, because the new pipeline reduces storage‑shard fees and the operational overhead of scaling and managing complex ETL jobs.
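The 66.7% figure follows directly from the one‑third ratio. A minimal Python check is shown below. The prices are arbitrary units and only the 1:3 ratio comes from the article.

```python
def price_cut(new_price: float, old_price: float) -> float:
    """Fractional price reduction when moving from old_price to new_price."""
    return 1 - new_price / old_price


# Arbitrary units; only the 1:3 ratio is taken from the article.
legacy_price = 3.0
new_price = 1.0
print(f"{price_cut(new_price, legacy_price):.1%}")  # prints 66.7%
```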

Integration Efficiency with Third‑Party Services

SLS now provides native consumption adapters for Flink, Spark, Flume, DataWorks, OSS, and more. Using SPL low‑code expressions, developers can replace custom Python code for filtering and field extraction, dramatically reducing development time and CPU cost.

Example: a Function Compute (FC) program that reads logs from SLS, filters error messages, extracts structured fields and writes to a database can be rewritten as an SPL consumption processor, cutting execution time from seconds to milliseconds.

* | where status != '200'
* | project time_local, request_uri, status, user_agent, client_ip
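For comparison, the kind of hand‑written logic these two SPL expressions replace might look like the Python sketch below. It is not the actual Function Compute program. The `filter_and_project` helper and the sample records are made up for illustration, and only the field names and the `status != '200'` condition come from the snippets above.

```python
from typing import Iterable, Iterator

# Fields kept by the projection, taken from the SPL snippet.
FIELDS = ("time_local", "request_uri", "status", "user_agent", "client_ip")


def filter_and_project(logs: Iterable[dict[str, str]]) -> Iterator[dict[str, str]]:
    """Keep non-200 records and drop every field not listed in FIELDS."""
    for log in logs:
        # Equivalent of: where status != '200'
        if log.get("status") != "200":
            # Equivalent of: project time_local, request_uri, ...
            yield {key: log[key] for key in FIELDS if key in log}


# Made-up sample records for illustration.
sample = [
    {"time_local": "20/May/2025:10:00:00 +0800", "request_uri": "/ok",
     "status": "200", "user_agent": "curl/8.0", "client_ip": "10.0.0.1",
     "body_bytes_sent": "512"},
    {"time_local": "20/May/2025:10:00:01 +0800", "request_uri": "/pay",
     "status": "500", "user_agent": "curl/8.0", "client_ip": "10.0.0.2",
     "body_bytes_sent": "0"},
]

result = list(filter_and_project(sample))
print(result)
```

With SPL, the same filtering and projection is declared in the expression and runs inside the pipeline, so this loop no longer has to run in the consumer.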

Network‑Cost Optimisation

Cross‑region bandwidth dominates pipeline cost. Strategies include:

Enable data compression (e.g., ZSTD) on transfer links.

Transmit only required columns (column projection) or rows (row filtering) using SPL, dramatically reducing traffic.

Typical SPL snippets:

* | project time_local, request_uri, status, user_agent, client_ip

* | where status != '200'
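The effect of these strategies on transfer volume can be estimated offline. The Python sketch below compares payload sizes before and after row filtering, column projection and compression. It is a rough illustration under several assumptions: the records are synthetic, JSON is used as the wire format, and the standard‑library `zlib` stands in for ZSTD, so the absolute numbers will not match a real SLS link.

```python
import json
import zlib

FIELDS = ("time_local", "request_uri", "status", "user_agent", "client_ip")


def payload_size(records: list[dict[str, str]], compress: bool) -> int:
    """Size in bytes of the records serialized as JSON, optionally compressed."""
    data = json.dumps(records).encode("utf-8")
    # zlib is a stand-in for ZSTD here (assumption for the sketch).
    return len(zlib.compress(data)) if compress else len(data)


# Synthetic access logs: 1,000 records, one in ten is a non-200 response.
records = [
    {
        "time_local": f"20/May/2025:10:00:{i % 60:02d} +0800",
        "request_uri": f"/api/item/{i}",
        "status": "500" if i % 10 == 0 else "200",
        "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "client_ip": f"10.0.{i % 256}.{(i * 7) % 256}",
        "referer": "https://example.com/landing",
        "upstream_addr": "192.168.0.10:8080",
        "body_bytes_sent": str(1000 + i),
    }
    for i in range(1000)
]

# Row filtering (status != '200') followed by column projection.
reduced = [{key: r[key] for key in FIELDS} for r in records if r["status"] != "200"]

full_raw = payload_size(records, compress=False)
reduced_raw = payload_size(reduced, compress=False)
reduced_compressed = payload_size(reduced, compress=True)

print(f"full, uncompressed:    {full_raw} bytes")
print(f"reduced, uncompressed: {reduced_raw} bytes")
print(f"reduced, compressed:   {reduced_compressed} bytes")
```

Filtering and projecting before the data leaves the source region shrinks the payload first, and compression then applies to what remains.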

Conclusion

Observability 2.0, powered by the SPL‑based pipeline, offers a schema‑free, high‑performance, and cost‑effective foundation for modern cloud‑native observability workloads. Ongoing enhancements will further strengthen its capabilities for AI‑driven large‑model tooling and real‑time data processing.

