
How SPL Boosts iLogtail 2.0: Combining Performance and Flexibility in Log Processing

This article traces the evolution of streaming processing languages, compares iLogtail's native and extended pipeline modes, and demonstrates how the new SPL syntax in iLogtail 2.0 delivers high‑performance, flexible log and time‑series data processing with unified, SQL‑like commands and interactive debugging tools.

Alibaba Cloud Observability

Evolution of Stream Processing Languages

Early concepts of stream processing date back to the 1960s and 1970s, with array‑oriented languages such as APL and the introduction of UNIX pipes, which let the output of one command feed the input of the next. Java added a Stream API in 2014 (Java 8), offering chainable, lazily evaluated operations with internal iteration over collections. Distributed frameworks such as Apache Storm, Samza, Flink, and Beam later provided more sophisticated stream processing features such as event‑time handling and windowed computation, and Beam introduced a unified model for batch and stream processing. SQL‑style streaming query languages (Flink SQL, KSQL) let users express complex logic in familiar syntax.

Log and time‑series data are typical semi‑structured sources, and they inspired languages such as KQL (Kusto Query Language) and Splunk's SPL (Search Processing Language). These languages emphasize intuitive search, powerful data manipulation, flexible analysis, real‑time and historical processing, scalability, and ease of use.

iLogtail Pipeline Modes

Native plugin mode parses logs using C++ splitter and parser components, supporting fixed formats (regex, JSON, delimiter) with high performance but limited flexibility.

Extended plugin mode forwards split logs to Golang plugins, allowing arbitrary plugin composition for complex scenarios at the cost of additional serialization overhead and reduced performance.

These modes force a trade‑off between flexibility and performance, and configuring multiple pipelines becomes cumbersome for diverse log formats.

Introducing SPL in iLogtail 2.0

iLogtail 2.0 adds SPL (here the SLS Processing Language, not Splunk's SPL) as a third processing mode alongside the native and extended plugin modes. It is built on the SLS SPL library and provides a unified operator set implemented in C++, so it approaches native plugin performance while offering the expressive power of a SQL‑like streaming language.

SPL Syntax Overview

Command‑style statements with a pipe (|) for pipeline composition.

Structured data commands: extend creates new fields; where filters rows.

Field operations: project, project-away, project-rename.

Unstructured extraction: parse-regexp, parse-json, parse-csv.

* | <data-source> | <spl-cmd> -option=<option> ... as <output>, ... | <spl-cmd> ...
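To make the pipe semantics concrete, here is a minimal Python sketch of how such commands compose. It is not how the SLS SPL library is implemented; the function names (extend, where, project, project_away, project_rename, pipeline) and the sample records are invented for illustration and only mirror the command names listed above.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def extend(name: str, fn: Callable[[Record], str]) -> Stage:
    """Add (or overwrite) one field computed from each record."""
    def stage(rows):
        for row in rows:
            yield {**row, name: fn(row)}
    return stage


def where(pred: Callable[[Record], bool]) -> Stage:
    """Keep only the records for which the predicate holds."""
    def stage(rows):
        return (row for row in rows if pred(row))
    return stage


def project(*names: str) -> Stage:
    """Keep only the listed fields."""
    def stage(rows):
        for row in rows:
            yield {k: row[k] for k in names if k in row}
    return stage


def project_away(*names: str) -> Stage:
    """Drop the listed fields and keep everything else."""
    def stage(rows):
        for row in rows:
            yield {k: v for k, v in row.items() if k not in names}
    return stage


def project_rename(mapping: Dict[str, str]) -> Stage:
    """Rename fields; mapping is {old_name: new_name}."""
    def stage(rows):
        for row in rows:
            yield {mapping.get(k, k): v for k, v in row.items()}
    return stage


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right, playing the role of the | operator."""
    def run(rows):
        return reduce(lambda data, stage: stage(data), stages, rows)
    return run


# Hypothetical records, not taken from the article.
logs = [
    {"level": "INFO", "latency_ms": "12"},
    {"level": "ERROR", "latency_ms": "950"},
]
run = pipeline(
    where(lambda r: r["level"] == "ERROR"),
    extend("slow", lambda r: str(int(r["latency_ms"]) > 500)),
    project_rename({"latency_ms": "latency"}),
    project_away("level"),
)
result = list(run(logs))
print(result)
```

Each stage consumes the records produced by the previous one, which is the same left-to-right reading order as an SPL statement.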

Example of field extraction:

* | parse-regexp content, '\[([^]]+)]\s+([^}]+})\s+(.*)' as time,json,stack | parse-json json | project-away garbage,json,content

Advantages of iLogtail 2.0 + SPL

Unified syntax across iLogtail and real‑time consumption reduces configuration duplication.

C++ native operators deliver performance close to native plugins.

Full alignment with SLS SQL functions provides a rich function set.

Interactive SPL preview and intelligent suggestions simplify debugging.

Practical Example: Parsing Mixed JSON and Java Stack Logs

Original log line:

[2024-01-05T12:07:00.123456] {"message": "this is a msg", "level": "INFO", "garbage": "xxx"} java.lang.Exception: exception发生
  at com.aliyun.sls.devops.logGenerator.type.RegexMultiLog.f3(RegexMultiLog.java:130)
  ...

With a plugin‑based pipeline, handling this log means chaining a multiline splitter, a regex parser, a JSON parser, and a field‑discard plugin, each configured in its own step through the console UI.

SPL configuration achieves the same result with a concise statement:

* | parse-regexp content, '\[([^]]+)]\s+([^}]+})\s+(.*)' as time,json,stack | parse-json json | project-away garbage,json,content

The SPL preview in the console shows the extracted fields (time, stack, message, and level) with the unwanted garbage, json, and content fields removed, which makes for a simpler and faster workflow than the plugin‑based setup.
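For readers who want to check the regular expression outside the console, the three SPL steps can be approximated in plain Python. This is only a sketch: the process function is a made-up helper, the re.DOTALL flag is an assumption so that the last group also captures the continuation line of the stack trace, and leaving non-matching events untouched is likewise assumed rather than documented behavior. The elided part of the stack trace is not reproduced.

```python
import json
import re

# Same pattern as in the SPL statement. DOTALL is an assumption made here
# so that the trailing (.*) spans the multi-line stack trace.
PATTERN = re.compile(r"\[([^]]+)]\s+([^}]+})\s+(.*)", re.DOTALL)


def process(event):
    """Approximate: parse-regexp | parse-json | project-away."""
    match = PATTERN.match(event["content"])
    if match is None:
        return event  # assumed: events that do not match pass through as-is
    out = dict(event)
    # parse-regexp content, '...' as time,json,stack
    out["time"], out["json"], out["stack"] = match.groups()
    # parse-json json: expand the JSON object into top-level fields
    out.update(json.loads(out["json"]))
    # project-away garbage,json,content
    for name in ("garbage", "json", "content"):
        out.pop(name, None)
    return out


sample = {
    "content": (
        "[2024-01-05T12:07:00.123456] "
        '{"message": "this is a msg", "level": "INFO", "garbage": "xxx"} '
        "java.lang.Exception: exception发生\n"
        "  at com.aliyun.sls.devops.logGenerator.type.RegexMultiLog.f3"
        "(RegexMultiLog.java:130)"
    )
}
parsed = process(sample)
for key, value in parsed.items():
    print(f"{key}: {value!r}")
```

The result holds exactly four fields (time, stack, message, level), with garbage, json, and content dropped.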

Open‑Source iLogtail SPL Configuration

enable: true
inputs:
  - Type: input_file
    FilePaths:
      - /home/test-log/test.log
    Multiline:
      StartPattern: \[\d+.*
processors:
  - Type: processor_spl
    Script: '* | parse-regexp content, ''\[([^]]+)]\s+([^}]+})\s+(.*)'' as time,json,stack | parse-json json | project-away garbage,json,content'
flushers:
  - Type: flusher_stdout
    OnlyStdout: true

Running this configuration parses the sample logs correctly; because the flusher is flusher_stdout, the parsed events are written to standard output.
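The Multiline.StartPattern setting is what keeps a stack trace attached to its first line before the SPL script runs. The Python sketch below illustrates that grouping idea under one stated assumption: a line starts a new event only when the whole line matches the pattern, and any other line is appended to the current event. The group_events helper and the sample lines (including com.example.Foo) are hypothetical, and this is not iLogtail's actual splitter code.

```python
import re
from typing import Iterable, Iterator, List

# Pattern copied from Multiline.StartPattern in the configuration above.
START_PATTERN = re.compile(r"\[\d+.*")


def group_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield one event per start line together with its continuation lines."""
    buffer: List[str] = []
    for line in lines:
        # Assumption: the entire line must match to begin a new event.
        if START_PATTERN.fullmatch(line) and buffer:
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        yield "\n".join(buffer)


# Made-up input lines for illustration only.
raw_lines = [
    '[2024-01-05T12:07:00.123456] {"level": "INFO"} java.lang.Exception: boom',
    "  at com.example.Foo.bar(Foo.java:1)",
    '[2024-01-05T12:07:01.000000] {"level": "INFO"} java.lang.Exception: boom',
]
events = list(group_events(raw_lines))
for event in events:
    print(repr(event))
```

Each grouped event would then arrive at processor_spl as a single content field, which is why the regular expression in the script has to cope with embedded newlines.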

Conclusion

SPL brings together high performance and flexible data manipulation for log processing in iLogtail 2.0, offering a unified, SQL‑like syntax, rich function support, and interactive debugging that significantly improve both configuration simplicity and runtime efficiency.
