How SPL Transforms Log Processing in iLogtail 2.0: From Pipelines to Unified Stream Language
This article traces the evolution of stream‑processing languages, compares iLogtail's original pipeline model with the new SPL syntax, and provides a step‑by‑step practical example showing how SPL simplifies log parsing, improves performance, and unifies configuration across Alibaba Cloud services.
Evolution of Stream‑Processing Languages
Early concepts: APL, first implemented in the 1960s, offered array‑oriented operations over whole data sets; in the 1970s UNIX introduced pipes, which chain one command's output to the next command's input.
Java Stream API: Java 8 (2014) added a Stream API with chainable calls, lazy evaluation, and internal iteration, tightly integrating streaming with the language.
Distributed frameworks: Apache Storm, Samza, Flink, and Beam provide APIs for distributed streaming, supporting event‑time semantics and windowed calculations.
Stream‑Batch unification: Apache Beam proposes a unified model that lets the same code handle batch and streaming workloads across runtimes such as Flink and Google Cloud Dataflow.
Streaming SQL: Projects like Flink SQL and KSQL enable complex stream logic using SQL‑like syntax.
Log and time‑series data are typically semi‑structured. Query languages built for them, such as Microsoft's Kusto Query Language (KQL) and Splunk's Search Processing Language, combine intuitive search with flexible processing and analysis over both real‑time and historical data, while remaining extensible and easy to use.
iLogtail Pipeline vs. SPL
Before SPL, iLogtail used a Pipeline model with two main modes:
Native Plugin Mode
Implemented in C++, this mode offers the highest performance but supports only fixed formats (regex, JSON, delimiter).
Extended Plugin Mode
Log lines are split in C++ then passed to Golang plugins for flexible processing; this adds serialization overhead and reduces performance compared to native C++.
Both modes require separate pipelines for different log formats, making configuration cumbersome.
Introducing SPL
iLogtail 2.0 adds SPL (SLS Processing Language) as a parallel execution path. SPL is built on the SLS SPL library, written in C++ for near‑native performance while providing a unified, expressive syntax.
Key advantages:
Unified syntax across iLogtail and real‑time consumption.
C++‑level performance comparable to native plugins.
Full alignment with SLS SQL functions.
Interactive debugging with real‑time preview in the console.
SPL Syntax Overview
Core Structure
Commands are chained with the pipe symbol |. A typical SPL statement looks like:
* | parse-regexp content, '\[([^]]+)]\s+([^}]+})\s+(.*)' as time,json,stack | parse-json json | project-away garbage,json,content
SQL‑style commands
extend – create new fields using SQL expressions.
where – filter rows based on expressions.
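As a rough mental model of these two commands, the Python sketch below treats each log as a dictionary of string fields and mirrors the statement `* | extend latency=cast(latency as BIGINT) | where status='200' AND latency>100`. It is an analogy only, not how iLogtail implements SPL; the function names and sample records are made up for illustration.

```python
from typing import Iterable, Iterator


def extend_latency(records: Iterable[dict]) -> Iterator[dict]:
    """Analogy for: extend latency=cast(latency as BIGINT)."""
    for record in records:
        # Overwrite the string field with its integer value.
        yield {**record, "latency": int(record["latency"])}


def where_ok_and_slow(records: Iterable[dict]) -> Iterator[dict]:
    """Analogy for: where status='200' AND latency>100."""
    for record in records:
        if record["status"] == "200" and record["latency"] > 100:
            yield record


# Hypothetical input: log fields arrive as strings.
logs = [
    {"status": "200", "latency": "250"},
    {"status": "200", "latency": "40"},
    {"status": "500", "latency": "900"},
]

# Stages are chained like SPL commands joined by the pipe symbol.
result = list(where_ok_and_slow(extend_latency(logs)))
print(result)
```

The cast has to happen before the filter: comparing the raw string field against a number would not behave like a numeric comparison.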
* | extend latency=cast(latency as BIGINT) | where status='200' AND latency>100
Field manipulation
project – keep or rename matching fields.
project-away – drop matching fields.
project-rename – rename fields while keeping others.
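The effect of dropping and renaming fields can be illustrated with a small Python analogy of `project-away -wildcard "__tag__:*"` followed by `project-rename __source__=remote_addr`. This sketch assumes that the wildcard behaves like a shell-style glob on field names and that the left-hand side of `project-rename` is the new name; the helper functions and the sample record are hypothetical.

```python
from fnmatch import fnmatchcase


def project_away_wildcard(record: dict, pattern: str) -> dict:
    """Drop every field whose name matches the glob pattern."""
    return {k: v for k, v in record.items() if not fnmatchcase(k, pattern)}


def project_rename(record: dict, new_name: str, old_name: str) -> dict:
    """Rename one field and keep all other fields unchanged."""
    return {(new_name if k == old_name else k): v for k, v in record.items()}


# Hypothetical record with two tag fields and two regular fields.
record = {
    "__tag__:__path__": "/var/log/app.log",
    "__tag__:__hostname__": "host-1",
    "remote_addr": "10.0.0.8",
    "status": "200",
}

cleaned = project_rename(
    project_away_wildcard(record, "__tag__:*"),
    "__source__",
    "remote_addr",
)
print(cleaned)
```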
* | project-away -wildcard "__tag__:*" | project-rename __source__=remote_addr
Unstructured data extraction
parse-regexp – extract groups via regular expressions.
parse-json – parse top‑level JSON.
parse-csv – parse CSV‑formatted fields.
* | parse-regexp content, '\[([^]]+)]\s+([^}]+})\s+(.*)' as time,json,stack
Practical Example
Given a log line mixing JSON and a Java stack trace, the SPL configuration proceeds in three steps:
Extract time, json, and stack using parse-regexp.
Parse the json field with parse-json to obtain level, message, and garbage.
Remove unwanted fields (garbage, json, content) with project-away.
The final SPL pipeline is:
* | parse-regexp content, '\[([^]]+)]\s+([^}]+})\s+(.*)' as time,json,stack | parse-json json | project-away garbage,json,content
When applied in the iLogtail console, the preview shows the parsed fields and confirms that the unwanted columns are dropped, demonstrating a more concise and performant configuration compared with the multi‑plugin pipeline.
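The three steps can also be traced outside iLogtail with a short Python sketch that reuses the same regular expression. The sample log line is hypothetical, since the article does not show its input, and `re.DOTALL` is an assumption made here so that the last group captures a multi-line stack trace. Note that the pattern stops the JSON group at the first closing brace, so it only suits flat JSON objects.

```python
import json
import re

# Same pattern as the parse-regexp step: [time] {json} stack
PATTERN = re.compile(r"\[([^]]+)]\s+([^}]+})\s+(.*)", re.DOTALL)

# Hypothetical sample line mixing JSON and a Java stack trace.
SAMPLE = (
    '[2024-01-01 12:00:00] '
    '{"level": "ERROR", "message": "request failed", "garbage": "x"} '
    'java.lang.IllegalStateException: request failed\n'
    '\tat com.example.Handler.handle(Handler.java:42)'
)


def process(content: str) -> dict:
    record = {"content": content}

    # Step 1: parse-regexp content, '...' as time,json,stack
    match = PATTERN.match(record["content"])
    if match is None:
        return record  # assumption: unparsed lines pass through unchanged
    record["time"], record["json"], record["stack"] = match.groups()

    # Step 2: parse-json json (top-level keys become fields)
    record.update(json.loads(record["json"]))

    # Step 3: project-away garbage,json,content
    for name in ("garbage", "json", "content"):
        record.pop(name, None)
    return record


parsed = process(SAMPLE)
print(sorted(parsed))
```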
Open‑Source Configuration Example
enable: true
inputs:
  - Type: input_file
    FilePaths:
      - /home/test-log/test.log
    Multiline:
      StartPattern: \[\d+.*
processors:
  - Type: processor_spl
    Script: '* | parse-regexp content, "\[([^]]+)]\s+([^}]+})\s+(.*)" as time,json,stack | parse-json json | project-away garbage,json,content'
flushers:
  - Type: flusher_stdout
    OnlyStdout: true
Running this configuration against sample data yields correctly parsed fields, confirming SPL’s effectiveness.
Conclusion
iLogtail’s original pipeline model can handle complex logs but forces a trade‑off between performance and flexibility. SPL unifies syntax, delivers near‑native C++ performance, and simplifies configuration and debugging, making it a compelling choice for modern log‑processing workloads.
Alibaba Cloud Native
