Operations 10 min read

Mastering SPL vs DSL: Practical Data Cleaning and Transformation Techniques

This guide compares the new SPL syntax with the legacy DSL for common data processing tasks such as error log filtering, field management, time parsing, and extracting unstructured data, providing concrete command examples for each scenario.

Alibaba Cloud Native
Alibaba Cloud Native
Alibaba Cloud Native
Mastering SPL vs DSL: Practical Data Cleaning and Transformation Techniques

Exact Matching

Filter error logs where the level field equals ERROR:

SPL: | where level='ERROR' Legacy DSL: e_keep(v("level") == "ERROR") or e_drop(v("level") != "ERROR") or e_if(v("level") != "ERROR", e_drop()) or

e_keep(e_search("level==ERROR"))

Fuzzy Matching

Match log levels that contain the character E (e.g., ERROR, ERR, E).

SPL: | where level like '%E%' Legacy DSL: e_keep(op_in(v("level"), "E")), e_keep(e_search("level: E")), or

e_if(op_not_in(v("level"), "E"), e_drop())

Numeric Range Filtering

Select HTTP status codes in the 4xx range while preserving the field type.

SPL:

| extend cast(status as bigint) as status
| where status >= 400 and status < 500

Legacy DSL:

e_keep(ct_int(v("status")) >= 400 and ct_int(v("status")) < 500)

Existence Checks

Retain records that contain an error field.

SPL: | where error is not null Legacy DSL:

e_keep(e_has("error"))

Field Management

Creating or updating fields.

SPL uses extend:

| extend kb = 1024                     # constant value
| extend size = size / 1024           # arithmetic
| extend version = regexp_extract(data, '"version":\d+')
| extend version = json_extract(data, '$.version')

Legacy DSL equivalents use e_set and related functions, e.g., e_set("kb", 1024), e_set("size", ct_int(v("size"))/ct_int(v("kb"))), e_set("version", regex_select(v("data"), r'"version":\d+')), e_set("version", json_select(v("data"), "version")).

Field Selection, Renaming, and Exclusion

SPL:

| project node="__tag__:node", path                     # exact selection
| project -wildcard "__tag__:*"                         # wildcard selection
| project-rename node="__tag__:node"                   # rename
| project-away -wildcard "__tag__:*"                    # exclude

Legacy DSL equivalents: e_keep_fields, e_rename, e_drop_fields with optional regex flags.

Conditional Expressions

SQL‑style conditional logic in SPL.

| extend valid = IF(type is not null, 'true', 'false')
| extend size = COALESCE(input, output, 0)
| extend size = CASE WHEN dir='I' THEN input WHEN dir='O' THEN output ELSE 0 END

Legacy DSL equivalents:

e_set("valid", op_if(e_has("type"), "true", "false"))
 e_set("size", op_coalesce(v("input"), v("output"), "0"))
 e_if_else(
   e_has("type"),
   e_set("valid", "true"),
   e_set("valid", "false")
 )

Time Parsing and Formatting

SPL keeps timestamps as integers ( __time__, __time_ns_part__) and provides date_parse and date_format for conversion.

| extend time = date_parse(time, '%Y/%m/%d %H-%i-%S')
| extend __time__ = cast(to_unixtime(time) as bigint)
| extend time = date_format(time, '%Y-%m-%d %H:%i:%S')

Legacy DSL uses dt_parsetimestamp and dt_strftime:

e_set("__time__", dt_parsetimestamp(v("time"), fmt="%Y/%m/%d %H-%M-%S"))
 e_set("time", dt_strftime(dt_parse(v("time"), fmt="%Y/%m/%d %H-%M-%S"), fmt="%Y-%m-%d %H:%M:%S"))

Unstructured Data Extraction

Regular expression extraction:

SPL: | parse-regexp data, '(\S+)\s+(\w+)' as time, level Legacy DSL: e_regex("data", r"(\S+)\s+(\w+)", ["time", "level"]) JSON extraction (JsonPath):

SPL: | parse-json -path='$.x.y.z' data Legacy DSL (JMES): e_json("data", depth=1, jmes="x.y.z") CSV extraction (RFC 4180):

SPL: | parse-csv -delim='\0' -quote='"' data as time, addr, user or | parse-csv -delim='^_^' data as time, addr, user Legacy DSL:

e_csv("data", ["time", "addr", "user"], sep="\0", quote='"')
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

DSLdata cleaningLog ProcessingSPLtime parsingfield managementregex extraction
Alibaba Cloud Native
Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.