How to Use SLS SPL for Efficient Data Filtering and Projection in Alibaba Cloud Flink
This guide explains how to leverage Alibaba Cloud Log Service (SLS) SPL within the Flink SLS Connector to push down row filtering and column projection, reducing network traffic and compute load while enabling real‑time log analytics using Flink SQL.
Background
Alibaba Cloud Log Service (SLS) provides large‑scale, low‑cost, real‑time storage and analysis for log, metric, and trace data. Alibaba Cloud Flink, built on Apache Flink, includes a native SLS connector that can read from or write to SLS.
SLS SPL Overview
SLS SPL is a high‑performance, weakly‑structured log processing language usable in Logtail, query, and streaming consumption. It follows a pipeline syntax:
<data-source> | <spl-cmd> -option=<option> ... <expression>, ... as <output>, ...Key SPL commands:
extend : create new fields via SQL expressions.
where : filter rows.
project , project-away , project-rename : field projection and renaming.
parse-regexp , parse-json , parse-csv : extract values from unstructured fields.
Push‑down Principle in the Flink SLS Connector
Without SPL the connector streams all rows and columns; Flink then performs filtering and projection, incurring unnecessary network and CPU usage. By setting the connector’s query parameter with an SPL statement, filtering and column selection are executed on the SLS side, so only the required data is sent to Flink.
Row‑filtering example
Filter logs where slbid = 'slb-01':
CREATE TEMPORARY TABLE sls_input (
request_uri STRING,
scheme STRING,
slbid STRING,
status STRING,
`__topic__` STRING METADATA VIRTUAL,
`__source__` STRING METADATA VIRTUAL,
`__timestamp__` STRING METADATA VIRTUAL,
__tag__ MAP<VARCHAR, VARCHAR> METADATA VIRTUAL,
proctime AS PROCTIME()
) WITH (
'connector' = 'sls',
'endpoint' = 'cn-beijing-intranet.log.aliyuncs.com',
'accessId' = '${ak}',
'accessKey' = '${sk}',
'starttime' = '2024-01-21 00:00:00',
'project' = '${project}',
'logstore' = 'test-nginx-log',
'query' = '* | where slbid = ''slb-01'''
);Running a continuous query such as
SELECT slbid, COUNT(1) AS slb_cnt FROM sls_input GROUP BY slbidprocesses only records with slb-01.
Column‑projection example
Add a project clause to return only needed fields:
...,
'query' = '* | where slbid = ''slb-01'' | project request_uri, scheme, slbid, status, __topic__, __source__, "__tag__:__receive_time__"'The connector now streams only the specified columns, further reducing network traffic and Flink compute.
Experimental Results
Enabling SPL dramatically lowers inbound traffic and CPU usage on Flink while SLS write traffic remains unchanged. Both row filtering and column projection are pushed down, eliminating unnecessary data movement.
Additional SPL Capabilities
SPL also supports complex transformations such as regex extraction ( parse-regexp), JSON parsing ( parse-json), CSV splitting ( parse-csv), field addition/removal, and data format conversion. These features can be used in SLS scan mode, ingestion agents, and other consumption scenarios.
References
SPL Instruction: https://help.aliyun.com/zh/sls/user-guide/spl-instruction
Log Service Overview: https://help.aliyun.com/zh/sls/product-overview/what-is-log-service
SPL Overview: https://help.aliyun.com/zh/sls/user-guide/spl-overview
Alibaba Cloud Flink SLS Connector: https://help.aliyun.com/zh/flink/developer-reference/log-service-connector
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
