Shift Data Cleaning Server‑Side with SPL: Boost Real‑Time Log Processing
Alibaba Cloud Log Service's new SPL-based rule consumption lets users move complex data-cleaning logic from client code to the server. It offers low-code configuration, high performance, and precise filtering, and it reduces latency, bandwidth, and compute usage in typical scenarios such as Python SDK processing and Flink integration.
Background and Motivation
SLS (Log Service) has upgraded its real‑time consumption capability by introducing SPL‑based rule consumption. This feature enables users to configure simple SPL pipelines that perform data cleaning and preprocessing on the server side, effectively moving complex client‑side business logic to the backend.
Core Value of SPL Consumption
Low‑code and programmable: SPL offers a concise, pipeline‑style syntax that allows developers to implement data‑cleaning logic with minimal code.
High performance: By executing cleaning at the data source, SPL reduces latency while maximizing consumption and processing efficiency, saving client‑side compute resources.
Functional Advantages
Precise filtering: The where and project directives perform row filtering and column pruning.
String handling: Regular‑expression support for pattern matching and extraction.
JSON parsing: The json function and the parse-json directive handle structured log data.
Rich SQL function set: Includes string, datetime, JSON, regex, conditional, and type‑conversion functions.
Complex data parsing: Array, struct, map, and lambda expressions allow deep operations on nested data.
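To make the directives above concrete, here is a minimal local sketch in Python of what a pipeline combining parse-json, where, and project does to a batch of logs. It does not call SLS or any SDK; the helper functions, the field names (content, status, path), and the SPL statement in the comment are illustrative assumptions, and the real server-side semantics may differ in detail.

```python
import json

# SPL-style statement this sketch mirrors (illustrative only; field names are hypothetical):
#   * | parse-json content | where status = '500' | project status, path


def parse_json(rows, field):
    """Add the first-level keys of a JSON string field as text fields on each row."""
    out = []
    for row in rows:
        expanded = dict(row)  # assumption: the source field is kept alongside the new fields
        try:
            parsed = json.loads(row.get(field, ""))
        except (json.JSONDecodeError, TypeError):
            parsed = None  # rows that are not valid JSON pass through unchanged
        if isinstance(parsed, dict):
            expanded.update({key: str(value) for key, value in parsed.items()})
        out.append(expanded)
    return out


def where(rows, predicate):
    """Row filtering: keep only rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def project(rows, fields):
    """Column pruning: keep only the named fields."""
    return [{name: row[name] for name in fields if name in row} for row in rows]


raw_logs = [
    {"content": '{"status": 200, "path": "/health", "latency_ms": 3}'},
    {"content": '{"status": 500, "path": "/orders", "latency_ms": 840}'},
    {"content": "not json"},
]

cleaned = project(
    where(parse_json(raw_logs, "content"), lambda row: row.get("status") == "500"),
    ["status", "path"],
)
print(cleaned)
```

With rule consumption, the three steps would run on the server and the client would receive only the rows and columns that survive the pipeline.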
Typical Scenarios
Case 1 – Reducing Latency and Code Size
Customer A stored massive application logs in SLS and used the Python SDK in Function Compute (FC) to process them. Processing 10 MB of logs took 15 seconds, far exceeding real-time requirements. By replacing about 200 lines of Python cleaning code with an SPL rule of about 50 lines, the processing time dropped to under 100 ms, dramatically improving latency and reducing client-side complexity.
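The figures in this case imply the following speedup and throughput. This is only arithmetic on the numbers quoted above; because the article gives "under 100 ms", the results for the SPL side are lower bounds.

```python
# Figures quoted in Case 1
payload_mb = 10        # size of the processed log batch
before_ms = 15_000     # 15 seconds with client-side Python cleaning
after_ms = 100         # upper bound reported after moving cleaning to SPL

speedup = before_ms / after_ms
throughput_before = payload_mb * 1000 / before_ms  # MB per second
throughput_after = payload_mb * 1000 / after_ms    # MB per second

print(f"speedup: at least {speedup:.0f}x")
print(f"throughput before: {throughput_before:.2f} MB/s")
print(f"throughput after: at least {throughput_after:.0f} MB/s")
```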
Case 2 – Bandwidth Savings via Push‑Down Filtering
Customer B used Flink with the SLS connector to analyze audit logs across regions. Pulling the full log volume caused high latency and excessive public‑network bandwidth. By configuring a simple SPL filter in the connector, only the required 10 % of logs were transferred, cutting bandwidth usage by 90 % and accelerating analysis.
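The effect of push-down filtering on transferred volume can be sketched with a small local simulation. The log records, the result field, and both functions are hypothetical; no Flink or SLS API is involved. The records are deliberately the same serialized size so that a 10 % match rate translates directly into bytes.

```python
import json


def serialized_size(rows):
    """Total bytes needed to send the rows as JSON."""
    return sum(len(json.dumps(row).encode("utf-8")) for row in rows)


# Hypothetical audit logs: every 10th record is a failure the job cares about.
logs = [
    {
        "result": "fail" if i % 10 == 0 else "pass",
        "user": f"user{i % 7}",
        "resource": f"bucket/{i:04d}",
    }
    for i in range(1000)
]


def pull_all_then_filter(rows):
    """Client-side filtering: the full volume crosses the network first."""
    transferred = serialized_size(rows)
    needed = [row for row in rows if row["result"] == "fail"]
    return needed, transferred


def filter_then_pull(rows):
    """Push-down filtering: only matching rows cross the network."""
    needed = [row for row in rows if row["result"] == "fail"]
    return needed, serialized_size(needed)


client_rows, client_bytes = pull_all_then_filter(logs)
server_rows, server_bytes = filter_then_pull(logs)
saving = 1 - server_bytes / client_bytes

print(f"rows needed: {len(server_rows)} of {len(logs)}")
print(f"bytes transferred: {client_bytes} vs {server_bytes}")
print(f"bandwidth saved: {saving:.0%}")
```

Both paths yield the same rows for the analysis; only the amount of data moved across regions changes.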
Ecosystem Integration
SLS rule consumption already integrates with Alibaba Cloud Flink, DataWorks, Splunk, and Function Compute, and supports multiple SDKs (Java, Python, Go). An SPL processor can be referenced by processorId, which shortens long statements and promotes reuse.
Future Outlook
Extended SPL Processor: Longer statements and stored processors for reusable logic.
Performance enhancements: Ongoing optimization to reduce filtering latency by an additional 50 %.
References
Alibaba Cloud Flink SLS connector (supports SPL) – https://help.aliyun.com/zh/flink/developer-reference/log-service-connector
Flink SQL SPL row filter – https://sls.aliyun.com/doc/spldataprocessdemo/flink_spl_filter.html
Flink SQL SPL column pruning – https://sls.aliyun.com/doc/spldataprocessdemo/flink_spl_cut.html
Flink SQL SPL semi-structured analysis – https://sls.aliyun.com/doc/spldataprocessdemo/flink_spl_structured_analysis.html
DataWorks SLS source (supports SPL) – https://help.aliyun.com/zh/dataworks/user-guide/loghub-data-source
Splunk HEC log delivery (supports SPL) – https://help.aliyun.com/zh/sls/user-guide/ship-logs-to-a-siem-system-over-https
Java SDK SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/java_sdk_sql_consumer.html
Java consumer‑group SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/java_consumer_group_sql_consumer.html
Go SDK SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/go_sdk_sql_consumer.html
Go consumer‑group SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/go_consumer_group_sql_consumer.html
Python SDK SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/python_sdk_spl_consumer.html
Python consumer‑group SPL consumption – https://sls.aliyun.com/doc/spldataprocessdemo/python_consumer_group_sql_consumer.html
Alibaba Cloud Native