
How to Use SLS SPL for Efficient Data Filtering and Projection in Alibaba Cloud Flink

This guide explains how to leverage Alibaba Cloud Log Service (SLS) SPL within the Flink SLS Connector to push down row filtering and column projection, reducing network traffic and compute load while enabling real‑time log analytics using Flink SQL.

Alibaba Cloud Native

Background

Alibaba Cloud Log Service (SLS) provides large‑scale, low‑cost, real‑time storage and analysis for log, metric, and trace data. Alibaba Cloud Flink, built on Apache Flink, includes a native SLS connector that can read from or write to SLS.

SLS SPL Overview

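SPL (SLS Processing Language) is a high-performance language for processing weakly structured logs. It can be used in Logtail collection, in queries, and in streaming consumption. A statement is a pipeline: it starts from a data source and passes the data through one or more commands, each introduced by |: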

<data-source> | <spl-cmd> -option=<option> ... <expression>, ... as <output>, ...

Key SPL commands:

extend: create new fields via SQL expressions.

where: filter rows.

project, project-away, project-rename: field projection and renaming.

parse-regexp, parse-json, parse-csv: extract values from unstructured fields.

Push‑down Principle in the Flink SLS Connector

Without SPL, the connector streams every row and column of the Logstore to Flink, which then filters and projects the data itself; the data that ends up discarded still costs network bandwidth and CPU. When an SPL statement is set in the connector's query parameter, row filtering and column selection run on the SLS side, so only the required data is sent to Flink.
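For contrast, here is a minimal sketch of the non-pushdown case. The table omits the query option, so the whole Logstore is shipped to Flink and the filter runs inside the Flink job. The table name sls_input_raw is hypothetical, the connector options mirror the ones used in this article's examples, and the ${...} placeholders have to be supplied by you.

```sql
-- Hypothetical baseline table: no 'query' option, so nothing is pushed down to SLS.
CREATE TEMPORARY TABLE sls_input_raw (
  request_uri STRING,
  scheme STRING,
  slbid STRING,
  status STRING
) WITH (
  'connector' = 'sls',
  'endpoint' = 'cn-beijing-intranet.log.aliyuncs.com',
  'accessId' = '${ak}',
  'accessKey' = '${sk}',
  'starttime' = '2024-01-21 00:00:00',
  'project' = '${project}',
  'logstore' = 'test-nginx-log'
);

-- The WHERE clause is evaluated by Flink, after every row has already crossed the network.
SELECT slbid, COUNT(1) AS slb_cnt
FROM sls_input_raw
WHERE slbid = 'slb-01'
GROUP BY slbid;
```

The result is the same as with pushdown; the difference is only where the filtering work is done and how much data is transferred to do it.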

Row‑filtering example

Filter logs where slbid = 'slb-01':

CREATE TEMPORARY TABLE sls_input (
  request_uri STRING,
  scheme STRING,
  slbid STRING,
  status STRING,
  `__topic__` STRING METADATA VIRTUAL,
  `__source__` STRING METADATA VIRTUAL,
  `__timestamp__` STRING METADATA VIRTUAL,
  __tag__ MAP<VARCHAR, VARCHAR> METADATA VIRTUAL,
  proctime AS PROCTIME()
) WITH (
  'connector' = 'sls',
  'endpoint' = 'cn-beijing-intranet.log.aliyuncs.com',
  'accessId' = '${ak}',
  'accessKey' = '${sk}',
  'starttime' = '2024-01-21 00:00:00',
  'project' = '${project}',
  'logstore' = 'test-nginx-log',
  'query' = '* | where slbid = ''slb-01'''
);

Running a continuous query such as

SELECT slbid, COUNT(1) AS slb_cnt FROM sls_input GROUP BY slbid

processes only records whose slbid is slb-01, because the filter has already been applied on the SLS side.

Column‑projection example

Append a project command to the SPL statement so that only the needed fields are returned:

...,
'query' = '* | where slbid = ''slb-01'' | project request_uri, scheme, slbid, status, __topic__, __source__, "__tag__:__receive_time__"'

The connector now streams only the specified columns, further reducing network traffic and the compute load on Flink.
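Put together, a complete table definition with both pushdowns might look like the sketch below. It assumes the Flink schema is narrowed to match the projected fields. The table name sls_input_projected is hypothetical, and the metadata columns are left out to keep the example short.

```sql
-- Hypothetical table combining row filtering and column projection in one SPL statement.
-- Only the three projected fields are declared, so the schema matches what SLS returns.
CREATE TEMPORARY TABLE sls_input_projected (
  request_uri STRING,
  slbid STRING,
  status STRING
) WITH (
  'connector' = 'sls',
  'endpoint' = 'cn-beijing-intranet.log.aliyuncs.com',
  'accessId' = '${ak}',
  'accessKey' = '${sk}',
  'starttime' = '2024-01-21 00:00:00',
  'project' = '${project}',
  'logstore' = 'test-nginx-log',
  -- Single quotes inside the SPL string are escaped by doubling them.
  'query' = '* | where slbid = ''slb-01'' | project request_uri, slbid, status'
);

-- Count requests per status code for the one load balancer that survives the filter.
SELECT status, COUNT(1) AS cnt
FROM sls_input_projected
GROUP BY status;
```

Keeping the declared columns in step with the project list makes it easy to see which fields are actually transferred.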

Experimental Results

With SPL enabled, inbound traffic and CPU usage on the Flink side drop sharply, while write traffic into SLS stays the same. The input volume is unchanged, so the savings come from pushing both row filtering and column projection down to SLS, which removes unnecessary data movement.

Additional SPL Capabilities

SPL also supports more complex transformations such as regex extraction (parse-regexp), JSON parsing (parse-json), CSV splitting (parse-csv), field addition/removal, and data format conversion. These features can be used in SLS scan mode, ingestion agents, and other consumption scenarios.
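As a rough sketch of how one of these commands could be combined with pushdown, the table below assumes a hypothetical Logstore test-app-log whose content field holds a JSON object with level and message keys. parse-json expands the top-level keys into fields, after which where and project behave as in the earlier examples. The Logstore, table, and field names are assumptions for illustration, and whether a given command is available in connector consumption should be checked against the SPL and connector documentation for your versions.

```sql
-- Hypothetical example: the Logstore, table name, and JSON keys are assumptions.
-- Each log is assumed to carry a 'content' field such as {"level":"ERROR","message":"..."}.
CREATE TEMPORARY TABLE sls_app_errors (
  level STRING,
  message STRING
) WITH (
  'connector' = 'sls',
  'endpoint' = 'cn-beijing-intranet.log.aliyuncs.com',
  'accessId' = '${ak}',
  'accessKey' = '${sk}',
  'starttime' = '2024-01-21 00:00:00',
  'project' = '${project}',
  'logstore' = 'test-app-log',
  -- Parse the JSON on the SLS side, keep only ERROR rows, and return two fields.
  'query' = '* | parse-json content | where level = ''ERROR'' | project level, message'
);

-- Flink receives already-parsed, already-filtered rows.
SELECT message, COUNT(1) AS cnt
FROM sls_app_errors
GROUP BY message;
```

Under these assumptions, the raw content field never leaves SLS, and Flink no longer needs a JSON parsing step of its own.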

References

SPL Instruction: https://help.aliyun.com/zh/sls/user-guide/spl-instruction

Log Service Overview: https://help.aliyun.com/zh/sls/product-overview/what-is-log-service

SPL Overview: https://help.aliyun.com/zh/sls/user-guide/spl-overview

Alibaba Cloud Flink SLS Connector: https://help.aliyun.com/zh/flink/developer-reference/log-service-connector
