
Shift Data Cleaning Server‑Side with SPL: Boost Real‑Time Log Processing

Alibaba Cloud Log Service’s new SPL‑based rule consumption lets users move complex data‑cleaning logic from client code to the server. It offers low‑code configuration, high performance, and precise filtering, and in typical scenarios such as Python SDK processing and Flink integration it significantly reduces latency, bandwidth, and compute usage.

Alibaba Cloud Native

Background and Motivation

SLS (Log Service) has upgraded its real‑time consumption capability with SPL‑based rule consumption. Users configure short SPL (SLS Processing Language) pipelines that clean and preprocess data on the server side, moving complex business logic out of client code and into the backend.

Core Value of SPL Consumption

Low‑code and programmable: SPL offers a concise, pipeline‑style syntax that allows developers to implement data‑cleaning logic with minimal code.

High performance: Because cleaning runs on the server, at the data source, SPL reduces end‑to‑end latency, improves consumption and processing throughput, and saves client‑side compute resources.

Functional Advantages

Precise filtering: The where and project directives provide row filtering and column pruning.

String handling: Regular‑expression support for pattern matching and extraction.

JSON parsing: JSON functions and the parse‑json directive handle structured log data.

Rich SQL function set: Includes string, datetime, JSON, regex, conditional, and type‑conversion functions.

Complex data parsing: Array, struct, and map types, together with lambda expressions, allow deep operations on nested data.
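As a rough illustration of what these directives do to each log, the following Python sketch emulates a four‑stage pipeline locally on plain dictionaries. It is not the SLS API and does not talk to any server; the SPL statement, the field names (status, content, uri, latency, service), and the pass‑through behavior on regex misses are assumptions chosen for the example.

```python
import json
import re
from typing import Dict, Iterable, Iterator, List

Log = Dict[str, str]

# Illustrative SPL statement (field names are hypothetical); on SLS this
# would run server-side. The functions below only emulate its semantics.
SPL = (
    "* | where status = '500' "
    "| parse-json content "
    "| parse-regexp uri, '^/api/(\\w+)' as service "
    "| project status, uri, service, latency"
)

RAW_LOGS: List[Log] = [
    {"status": "500", "host": "web-1", "content": '{"uri": "/api/pay", "latency": 812}'},
    {"status": "200", "host": "web-2", "content": '{"uri": "/api/home", "latency": 35}'},
    {"status": "500", "host": "web-3", "content": '{"uri": "/api/cart", "latency": 120}'},
]


def where_equals(rows: Iterable[Log], field: str, value: str) -> Iterator[Log]:
    """Row filter (SPL: where): keep rows whose field equals value."""
    return (r for r in rows if r.get(field) == value)


def parse_json(rows: Iterable[Log], field: str) -> Iterator[Log]:
    """Expand top-level keys of a JSON string field into new fields (SPL: parse-json)."""
    for r in rows:
        extracted = json.loads(r[field])
        # Log field values are strings, so extracted values are stringified.
        yield {**r, **{k: str(v) for k, v in extracted.items()}}


def parse_regexp(rows: Iterable[Log], field: str, pattern: str, names: List[str]) -> Iterator[Log]:
    """Copy regex capture groups into named fields (SPL: parse-regexp).
    Assumption: rows that do not match pass through unchanged."""
    regex = re.compile(pattern)
    for r in rows:
        m = regex.search(r.get(field, ""))
        yield {**r, **dict(zip(names, m.groups()))} if m else r


def project(rows: Iterable[Log], fields: List[str]) -> Iterator[Log]:
    """Column pruning (SPL: project): keep only the listed fields."""
    for r in rows:
        yield {f: r[f] for f in fields if f in r}


def run_pipeline(rows: Iterable[Log]) -> List[Log]:
    """Apply the stages in the same order as the SPL statement above."""
    stage = where_equals(rows, "status", "500")
    stage = parse_json(stage, "content")
    stage = parse_regexp(stage, "uri", r"^/api/(\w+)", ["service"])
    stage = project(stage, ["status", "uri", "service", "latency"])
    return list(stage)


if __name__ == "__main__":
    print(SPL)
    for row in run_pipeline(RAW_LOGS):
        print(row)
```

Each stage lazily consumes the previous one, so rows dropped by the where step are never parsed, which mirrors why filtering as early as possible saves work.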

Typical Scenarios

Case 1 – Reducing Latency and Code Size

Customer A stored massive volumes of application logs in SLS and processed them with the Python SDK in Function Compute (FC). Processing 10 MB of logs took 15 seconds, far beyond the real‑time requirement. After roughly 200 lines of Python cleaning code were replaced with an SPL rule of about 50 lines, processing time dropped to under 100 ms, sharply reducing latency and client‑side complexity.
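Customer A's code is not published, so the following is a hypothetical example of the kind of cleaning that moves server‑side: a client‑side Python function that keeps error logs, pulls a trace ID out of the message with a regular expression, and drops unneeded fields, shown next to an SPL statement intended to express roughly the same rule. The field names (time, level, msg, trace_id) and the SPL text are illustrative assumptions.

```python
import re
from typing import Dict, List, Optional

# Hypothetical SPL rule meant to express the same cleaning server-side.
SPL_RULE = (
    "* | where level = 'ERROR' "
    "| parse-regexp msg, 'trace_id=(\\w+)' as trace_id "
    "| project time, level, trace_id"
)

TRACE_RE = re.compile(r"trace_id=(\w+)")


def clean_client_side(logs: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Client-side cleaning: every log is pulled first, then filtered,
    parsed, and pruned locally in Python."""
    cleaned: List[Dict[str, Optional[str]]] = []
    for log in logs:
        if log.get("level") != "ERROR":
            continue  # row filter, the job of SPL 'where'
        match = TRACE_RE.search(log.get("msg", ""))
        cleaned.append({
            "time": log["time"],
            "level": log["level"],
            # field extraction, the job of SPL 'parse-regexp'
            "trace_id": match.group(1) if match else None,
        })  # only three fields are kept, the job of SPL 'project'
    return cleaned


if __name__ == "__main__":
    sample = [
        {"time": "1700000000", "level": "INFO", "msg": "ok", "host": "a"},
        {"time": "1700000001", "level": "ERROR", "msg": "db timeout trace_id=ab12", "host": "b"},
    ]
    print(clean_client_side(sample))
    print("Server-side equivalent:", SPL_RULE)
```

With SPL consumption, only the already‑cleaned rows reach the client, so the loop, the regex, and the pruning logic no longer live in client code.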

Case 2 – Bandwidth Savings via Push‑Down Filtering

Customer B used Flink with the SLS connector to analyze audit logs across regions. Pulling the full log volume caused high latency and excessive public‑network bandwidth usage. After a simple SPL filter was configured in the connector, only the 10% of logs actually needed were transferred, cutting bandwidth usage by 90% and speeding up analysis.
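The 90% figure follows from the filter's selectivity: if the filter forwards a fraction s of the logs, and filtered‑out logs are on average about the same size as kept ones, the client receives roughly s of the raw bytes and saves 1 - s of the bandwidth. A minimal sketch of that arithmetic; the 1 TB/day volume is a hypothetical number, not from the case.

```python
from typing import Tuple


def pushdown_savings(raw_bytes: int, selectivity: float) -> Tuple[int, float]:
    """Estimate bytes transferred and fraction of bandwidth saved when a
    server-side filter forwards only `selectivity` of the raw data.
    Assumes filtered-out logs are, on average, the same size as kept ones."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be in [0, 1]")
    transferred = round(raw_bytes * selectivity)
    return transferred, 1.0 - transferred / raw_bytes


if __name__ == "__main__":
    daily_raw = 1_000_000_000_000  # hypothetical: 1 TB of audit logs per day
    sent, saved = pushdown_savings(daily_raw, 0.10)
    print(f"transferred {sent / 1e9:.1f} GB/day, bandwidth saved {saved:.0%}")
```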

Ecosystem Integration

SLS rule consumption already integrates with Alibaba Cloud Flink, DataWorks, Splunk, and Function Compute, and is supported by the Java, Python, and Go SDKs. An SPL processor can be referenced by its processorId instead of repeating a long statement inline, which simplifies configuration and promotes reuse.
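The exact SDK parameters for referencing a processor are not given in this article, so the following is only a conceptual sketch of the reuse idea: a hypothetical in‑memory registry (PROCESSORS), a hypothetical resolve_spl helper, and a made‑up processor ID. In SLS the processor is managed by the service; the SDK documentation listed under References describes the real interface.

```python
from typing import Dict, Optional

# Hypothetical registry standing in for processors managed by SLS.
PROCESSORS: Dict[str, str] = {
    "audit-delete-filter": (
        "* | where action = 'delete' "
        "| project time, user, action, resource"
    ),
}


def resolve_spl(query: Optional[str] = None, processor_id: Optional[str] = None) -> str:
    """Return the SPL statement a consumer should run.

    Exactly one of `query` (an inline statement) or `processor_id`
    (a reference to a stored statement) must be provided.
    """
    if (query is None) == (processor_id is None):
        raise ValueError("pass exactly one of query or processor_id")
    if processor_id is not None:
        if processor_id not in PROCESSORS:
            raise KeyError(f"unknown processor_id: {processor_id}")
        return PROCESSORS[processor_id]
    return query  # inline statement


if __name__ == "__main__":
    # Several consumers share one definition by ID instead of copying
    # the full statement into each job's configuration.
    for job in ("flink-audit", "python-alerts"):
        print(job, "->", resolve_spl(processor_id="audit-delete-filter"))
```

Keeping the statement in one place means a change to the filter is made once rather than in every consuming job.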

Future Outlook

Extended SPL processor: Support for longer statements and stored processors for reusable logic.

Performance enhancements: Ongoing optimization aimed at reducing filtering latency by a further 50%.

References

Alibaba Cloud Flink SLS connector (supports SPL) – https://help.aliyun.com/zh/flink/developer-reference/log-service-connector

Flink SQL SPL row filter – https://sls.aliyun.com/doc/spldataprocessdemo/flink_spl_filter.html

Flink SQL SPL column cut – https://sls.aliyun.com/doc/spldataprocessdemo/flink_spl_cut.html

Flink SQL SPL weak‑structured analysis – https://sls.aliyun.com/doc/spldataprocessdemo/flink_spl_structured_analysis.html

DataWorks SLS source (supports SPL) – https://help.aliyun.com/zh/dataworks/user-guide/loghub-data-source

Splunk HEC log delivery (supports SPL) – https://help.aliyun.com/zh/sls/user-guide/ship-logs-to-a-siem-system-over-https

Java SDK SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/java_sdk_sql_consumer.html

Java consumer‑group SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/java_consumer_group_sql_consumer.html

Go SDK SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/go_sdk_sql_consumer.html

Go consumer‑group SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/go_consumer_group_sql_consumer.html

Python SDK SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/python_sdk_spl_consumer.html

Python consumer‑group SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/python_consumer_group_sql_consumer.html

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
