Mastering Log Data Cleaning with SPL: From Error Filtering to Time Parsing
This article compares the new SPL syntax with the legacy DSL for SLS log processing, demonstrating how to filter errors, perform fuzzy matches, handle numeric ranges, manage fields, apply conditional expressions, parse timestamps, and extract data from unstructured sources using concise examples.
Overview
Following the previous article on SLS data processing upgrades, this piece continues the comparison between the new SPL syntax and the old DSL across common data handling scenarios. For simple data synchronization with no transformation, both can leave the processing logic empty.
Scenario 1: Data Filtering and Cleaning
In daily operations, error log analysis is crucial. The old DSL used e_keep / e_drop functions, while SPL uses the where clause.
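To make the semantics of the four filters in this scenario concrete (exact match, fuzzy match, numeric range, existence check), here is a minimal Python sketch. It is not SPL and not the DSL: the helper names `where` and `sql_like` and the sample records are hypothetical, and the sketch only assumes that `like` follows the usual SQL convention where `%` matches any run of characters.

```python
import re

def sql_like(value, pattern):
    """Emulate SQL LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def where(logs, predicate):
    """Keep only the records for which the predicate holds (hypothetical helper)."""
    return [log for log in logs if predicate(log)]

# Sample records; field values are strings, as is typical for raw log fields.
logs = [
    {"level": "ERROR", "status": "500", "error": "disk full"},
    {"level": "ERR", "status": "404"},
    {"level": "INFO", "status": "200"},
    {"level": "E", "status": "403"},
]

# Exact match: level = 'ERROR'
exact = where(logs, lambda r: r.get("level") == "ERROR")

# Fuzzy match: level like '%E%'
fuzzy = where(logs, lambda r: sql_like(r.get("level", ""), "%E%"))

# Numeric range: cast the string to an integer first, then compare
range_4xx = where(logs, lambda r: 400 <= int(r["status"]) < 500)

# Existence check: the field error is present
has_error = where(logs, lambda r: r.get("error") is not None)
```

Note that a pattern such as `%E%` matches any value containing the letter E, so it is broader than the three spellings ERROR, ERR, and E.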
Exact match (filter ERROR level)
New SPL:
| where level='ERROR'
Old DSL alternatives:
e_keep(v("level") == "ERROR")
e_drop(v("level") != "ERROR")
e_if(v("level") != "ERROR", e_drop())
e_keep(e_search("level==ERROR"))
Fuzzy match (level may be ERROR, ERR, or E)
New SPL:
| where level like '%E%'
Old DSL alternatives:
e_keep(op_in(v("level"), "E"))
e_keep(e_search("level: E"))
e_if(op_not_in(v("level"), "E"), e_drop())
Numeric range (status code 4xx)
New SPL:
| extend status=cast(status as bigint)
| where status>=400 and status<500
Old DSL:
e_keep(ct_int(v("status"))>=400 and ct_int(v("status"))<500)
Existence check (field error)
New SPL:
| where error is not null
Old DSL:
e_keep(e_has("error"))
Scenario 2: Field Management
SPL’s extend command replaces the DSL’s e_set for constructing or modifying fields.
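The idea behind both extend and e_set is to compute a value from the current record and store it under a field name, creating or overwriting that field. A minimal Python sketch of that idea follows. The `extend` and `regexp_extract` helpers here are hypothetical stand-ins written for illustration, not the SPL or DSL implementations, and the sample record is invented.

```python
import json
import re

def extend(log, **new_fields):
    """Return a copy of the record with fields added or overwritten.

    Each keyword maps a field name to a function of the original record.
    """
    updated = dict(log)
    for name, compute in new_fields.items():
        updated[name] = compute(log)
    return updated

def regexp_extract(text, pattern):
    """Return the first substring matching the pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

log = {"size": "4096", "data": '{"version":3,"name":"agent"}'}

# Constant field
log = extend(log, kb=lambda r: 1024)

# Derived value; Python's / yields a float, and integer-versus-float
# behaviour in the real engines is not modelled here.
log = extend(log, size=lambda r: int(r["size"]) / r["kb"])

# Regex extraction: keeps the whole matched substring
log = extend(log, version_raw=lambda r: regexp_extract(r["data"], r'"version":\d+'))

# JSON extraction: reads the value stored under the "version" key
log = extend(log, version=lambda r: json.loads(r["data"])["version"])
```

The regex variant returns the matched text including the key, while the JSON variant returns only the value, which is usually what downstream queries want.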
New field construction
Set constant: | extend kb=1024
Derived value: | extend size=size/1024
Regex extraction: | extend version=regexp_extract(data, '"version":\d+')
JSON extraction: | extend version=json_extract(data, '$.version')
Old DSL equivalents
Constant: e_set("kb", 1024)
Derived: e_set("size", ct_int(v("size"))/ct_int(v("kb")))
Regex: e_set("version", regex_select(v("data"), r'"version":\d+'))
JSON: e_set("version", json_select(v("data"), "version"))
Field selection, renaming, and exclusion
Exact selection:
New SPL: | project node="__tag__:node", path
Old DSL: e_keep_fields("__tag__:node", "path", regex=False)
Wildcard selection:
New SPL: | project -wildcard "__tag__:*"
Old DSL: e_keep_fields("__tag__:.*", regex=True)
Rename:
New SPL: | project-rename node="__tag__:node"
Old DSL: e_rename("__tag__:node", "node")
Exclude:
New SPL: | project-away -wildcard "__tag__:*"
Old DSL: e_drop_fields("__tag__:.*", regex=True)
Scenario 3: Time Information Parsing and Formatting
Log timestamps are stored as INTEGER or BIGINT fields __time__ and __time_ns_part__. SPL uses extend with date_parse, to_unixtime, and date_format to manipulate them.
Extract and cast:
| extend time=date_parse(time, '%Y/%m/%d %H-%i-%S')
| extend __time__=cast(to_unixtime(time) as bigint)
Normalize format:
| extend time=date_format(time, '%Y-%m-%d %H:%i:%S')
The legacy DSL equivalents use the dt_parsetimestamp and dt_strftime functions.
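The parse, cast, and reformat steps above can be sketched in Python as follows. This is an illustration of the data flow only: the function name `parse_and_normalize` is hypothetical, the input is assumed to be in UTC (the article does not state a time zone), and Python spells the minute directive `%M` where the MySQL-style format strings above use `%i`.

```python
from datetime import datetime, timezone

def parse_and_normalize(raw):
    """Parse 'YYYY/MM/DD HH-MM-SS', return (epoch seconds, normalized text).

    Assumes the raw timestamp is expressed in UTC.
    """
    # Step 1: parse the custom layout into a datetime value
    parsed = datetime.strptime(raw, "%Y/%m/%d %H-%M-%S").replace(tzinfo=timezone.utc)
    # Step 2: convert to integer Unix seconds, the shape expected for __time__
    epoch_seconds = int(parsed.timestamp())
    # Step 3: render the same instant in a conventional layout
    normalized = parsed.strftime("%Y-%m-%d %H:%M:%S")
    return epoch_seconds, normalized

epoch, text = parse_and_normalize("2024/01/02 03-04-05")
```

Keeping the parsed value as an intermediate makes it possible to derive both the integer timestamp and the display string from a single parse.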
Scenario 4: Unstructured or Semi‑structured Data Extraction
SPL provides dedicated commands for extracting information from free‑form logs.
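The three extraction styles covered in this scenario (regex capture groups, JSON path expansion, and delimiter-separated values) can be sketched in Python as below. The helper names mirror the SPL command names only for readability; they are hypothetical, and the assumption that JSON extraction expands the object found at the path into top-level fields is mine rather than something the article states.

```python
import csv
import io
import json
import re

def parse_regexp(log, field, pattern, names):
    """Bind the capture groups of the first match to the given field names."""
    match = re.search(pattern, log[field])
    if match is None:
        return dict(log)
    return {**log, **dict(zip(names, match.groups()))}

def parse_json(log, field, path):
    """Expand the object found at a key path into top-level fields (assumed behaviour)."""
    node = json.loads(log[field])
    for key in path:
        node = node[key]
    return {**log, **node}

def parse_csv(log, field, names, delim=",", quote='"'):
    """Split a delimited value into named fields."""
    if len(delim) == 1:
        # The csv module handles quoting but only single-character delimiters
        row = next(csv.reader(io.StringIO(log[field]), delimiter=delim, quotechar=quote))
    else:
        # Multi-character delimiter: plain split, no quote handling
        row = log[field].split(delim)
    return {**log, **dict(zip(names, row))}

# Usage with invented sample records
r1 = parse_regexp({"data": "2024-01-02 ERROR disk full"}, "data",
                  r"(\S+)\s+(\w+)", ["time", "level"])
r2 = parse_json({"data": '{"x":{"y":{"z":{"a":1,"b":"two"}}}}'}, "data", ["x", "y", "z"])
r3 = parse_csv({"data": '10:00,"1.2.3.4",alice'}, "data", ["time", "addr", "user"])
r4 = parse_csv({"data": "10:00^_^1.2.3.4^_^bob"}, "data",
               ["time", "addr", "user"], delim="^_^")
```

All four helpers return a new record and leave the source field in place, so the original text stays available for later steps.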
Regex extraction:
New SPL: | parse-regexp data, '(\S+)\s+(\w+)' as time, level
Old DSL: e_regex("data", r"(\S+)\s+(\w+)", ["time","level"])
JSON extraction:
New SPL: | parse-json -path='$.x.y.z' data
Old DSL: e_json("data", depth=1, jmes="x.y.z")
CSV extraction:
New SPL (single‑character delimiter): | parse-csv -delim='\0' -quote='"' data as time, addr, user
New SPL (multi‑character delimiter): | parse-csv -delim='^_^' data as time, addr, user
Old DSL: e_csv("data", ["time","addr","user"], sep='\0', quote='"')
These examples illustrate how SPL simplifies log data processing compared with the older DSL, offering clearer syntax and direct SQL‑like capabilities.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
