Mastering SPL vs DSL: Practical Data Cleaning and Transformation Techniques
This guide compares the new SPL syntax with the legacy DSL for common data processing tasks such as error log filtering, field management, time parsing, and extracting unstructured data, providing concrete command examples for each scenario.
Exact Matching
Filter error logs where the level field equals ERROR:
SPL: | where level='ERROR' Legacy DSL: e_keep(v("level") == "ERROR") or e_drop(v("level") != "ERROR") or e_if(v("level") != "ERROR", e_drop()) or
e_keep(e_search("level==ERROR"))Fuzzy Matching
Match log levels that contain the character E (e.g., ERROR, ERR, E).
SPL: | where level like '%E%' Legacy DSL: e_keep(op_in(v("level"), "E")), e_keep(e_search("level: E")), or
e_if(op_not_in(v("level"), "E"), e_drop())Numeric Range Filtering
Select HTTP status codes in the 4xx range while preserving the field type.
SPL:
| extend cast(status as bigint) as status
| where status >= 400 and status < 500Legacy DSL:
e_keep(ct_int(v("status")) >= 400 and ct_int(v("status")) < 500)Existence Checks
Retain records that contain an error field.
SPL: | where error is not null Legacy DSL:
e_keep(e_has("error"))Field Management
Creating or updating fields.
SPL uses extend:
| extend kb = 1024 # constant value
| extend size = size / 1024 # arithmetic
| extend version = regexp_extract(data, '"version":\d+')
| extend version = json_extract(data, '$.version')Legacy DSL equivalents use e_set and related functions, e.g., e_set("kb", 1024), e_set("size", ct_int(v("size"))/ct_int(v("kb"))), e_set("version", regex_select(v("data"), r'"version":\d+')), e_set("version", json_select(v("data"), "version")).
Field Selection, Renaming, and Exclusion
SPL:
| project node="__tag__:node", path # exact selection
| project -wildcard "__tag__:*" # wildcard selection
| project-rename node="__tag__:node" # rename
| project-away -wildcard "__tag__:*" # excludeLegacy DSL equivalents: e_keep_fields, e_rename, e_drop_fields with optional regex flags.
Conditional Expressions
SQL‑style conditional logic in SPL.
| extend valid = IF(type is not null, 'true', 'false')
| extend size = COALESCE(input, output, 0)
| extend size = CASE WHEN dir='I' THEN input WHEN dir='O' THEN output ELSE 0 ENDLegacy DSL equivalents:
e_set("valid", op_if(e_has("type"), "true", "false"))
e_set("size", op_coalesce(v("input"), v("output"), "0"))
e_if_else(
e_has("type"),
e_set("valid", "true"),
e_set("valid", "false")
)Time Parsing and Formatting
SPL keeps timestamps as integers ( __time__, __time_ns_part__) and provides date_parse and date_format for conversion.
| extend time = date_parse(time, '%Y/%m/%d %H-%i-%S')
| extend __time__ = cast(to_unixtime(time) as bigint)
| extend time = date_format(time, '%Y-%m-%d %H:%i:%S')Legacy DSL uses dt_parsetimestamp and dt_strftime:
e_set("__time__", dt_parsetimestamp(v("time"), fmt="%Y/%m/%d %H-%M-%S"))
e_set("time", dt_strftime(dt_parse(v("time"), fmt="%Y/%m/%d %H-%M-%S"), fmt="%Y-%m-%d %H:%M:%S"))Unstructured Data Extraction
Regular expression extraction:
SPL: | parse-regexp data, '(\S+)\s+(\w+)' as time, level Legacy DSL: e_regex("data", r"(\S+)\s+(\w+)", ["time", "level"]) JSON extraction (JsonPath):
SPL: | parse-json -path='$.x.y.z' data Legacy DSL (JMES): e_json("data", depth=1, jmes="x.y.z") CSV extraction (RFC 4180):
SPL: | parse-csv -delim='\0' -quote='"' data as time, addr, user or | parse-csv -delim='^_^' data as time, addr, user Legacy DSL:
e_csv("data", ["time", "addr", "user"], sep="\0", quote='"')Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
