How SPL’s High‑Performance Mode Transforms Log Queries at Scale
This article explains how the SLS Processing Language (SPL) combines pipeline syntax with SQL‑like operators. It introduces a high‑performance mode that pushes computation down to storage nodes and applies vectorized processing, achieving sub‑second query times on billions of log entries while supporting rich filtering, histogram visualization, and random paging.
Introduction
Observability relies heavily on log data; collecting logs to the cloud and extracting valuable information from massive, unstructured logs is increasingly challenging.
Characteristics of Log Data
Immutable: logs are write‑once records.
Random: events such as errors or user actions are unpredictable.
Diverse sources: logs come from many systems with varying schemas.
Complex business logic: logging cannot anticipate all future analysis needs.
Because of these traits, logs are usually stored in a “schema‑on‑read” fashion (the “Sushi Principle”). This requires dynamic, real‑time processing and exploratory analysis.
Query Scenarios
Two main scenarios exist: pure filtering queries that return raw log lines, and analytical queries that perform aggregation or joins. The article focuses on the filtering scenario.
Limitations of Traditional Search Syntax and SQL
Search syntax is simple but cannot express complex logic; SQL is expressive but returns tabular results and struggles with raw log fields.
Proposed Solution: SPL (SLS Processing Language)
SPL combines a pipeline‑style syntax with the rich operators of SQL, allowing direct processing of raw logs:

<data-source> | <spl-expr> | <spl-expr> | ...

Each <spl-expr> can perform regex extraction, field projection, calculations, and more.
Key SPL Capabilities
Field projection: project or project-away to keep or drop fields.
Real‑time calculations: extend to create new fields, e.g. Status:200 | extend urlParam=split_part(Uri, '/', 3).
Complex expressions: casting, arithmetic, and subsequent where filtering, e.g.
Status:200 | extend timeRange = cast(BeginTime as bigint) - cast(EndTime as bigint).
Parsing semi‑structured data: parse-json, parse-csv to expand nested fields.
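To make the pipeline semantics concrete, here is a minimal Python sketch that simulates the behavior of `where`, `extend`, and `project` over in-memory log records. This models the semantics only; it is not the SLS implementation, and the sample field names are illustrative.

```python
# Minimal sketch of SPL-style pipeline stages over raw log records.
# Simulates semantics only; not the SLS engine.

def where(records, pred):
    """Like SPL `where`: keep records matching a predicate."""
    return (r for r in records if pred(r))

def extend(records, field, fn):
    """Like SPL `extend`: add a computed field to each record."""
    for r in records:
        r = dict(r)          # do not mutate the input record
        r[field] = fn(r)
        yield r

def project(records, fields):
    """Like SPL `project`: keep only the listed fields."""
    for r in records:
        yield {k: r[k] for k in fields if k in r}

logs = [
    {"Status": "200", "Uri": "/api/v1/orders", "Latency": "120"},
    {"Status": "500", "Uri": "/api/v1/users",  "Latency": "340"},
]

# Equivalent of: Status:200 | extend urlParam=split_part(Uri, '/', 3) | project Status, urlParam
result = list(project(
    extend(
        where(logs, lambda r: r["Status"] == "200"),
        "urlParam", lambda r: r["Uri"].split("/")[2]),  # split_part is 1-based
    ["Status", "urlParam"]))
# → [{"Status": "200", "urlParam": "v1"}]
```

Each stage consumes and produces a stream of records, which is what lets SPL chain arbitrary pipeline expressions without a fixed schema.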
High‑Performance SPL Mode
To overcome the performance limits of the scan‑based mode, several optimizations were introduced:
Computation push‑down: where‑clauses are evaluated on storage shards using a C++ vectorized engine, reducing data transfer.
Vectorized processing and SIMD acceleration on each shard.
Early termination when enough results are found.
These changes enable horizontal scaling and dramatically lower latency.
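The interaction of push‑down and early termination can be sketched as follows. In this hedged Python sketch, each shard evaluates the filter locally (standing in for the C++ vectorized engine on storage nodes), and the coordinator stops as soon as enough hits are collected; the real engine scans shards in parallel.

```python
# Illustrative sketch of computation push-down plus early termination.
# Not the SLS implementation; shards here are plain Python lists.

def scan_shard(shard, pred):
    """Each shard evaluates the filter locally and yields only hits,
    so only matching rows cross the network."""
    for row in shard:
        if pred(row):
            yield row

def query(shards, pred, limit):
    hits = []
    for shard in shards:              # real engines scan shards in parallel
        for row in scan_shard(shard, pred):
            hits.append(row)
            if len(hits) >= limit:    # early termination: enough results found
                return hits
    return hits

# 10 shards of 100 rows each; find the first 5 rows divisible by 7.
shards = [list(range(i * 100, (i + 1) * 100)) for i in range(10)]
top = query(shards, lambda x: x % 7 == 0, limit=5)
# → [0, 7, 14, 21, 28]
```

With a high hit rate, the limit is reached after scanning only a small prefix of the data, which is why such queries approach keyword‑search latency.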
Performance Evaluation
Tests on a logstore with 10 shards (≈1 billion rows) show that high‑performance SPL processes queries in tens to a few thousand milliseconds depending on hit rate.
Hit rate 1%: Scenario 1 = 52 ms, Scenario 2 = 73 ms, Scenario 3 = 89 ms.
Hit rate 0.1%: Scenario 1 = 65 ms, Scenario 2 = 94 ms, Scenario 3 = 126 ms.
Hit rate 0.01%: Scenario 1 = 160 ms, Scenario 2 = 206 ms, Scenario 3 = 586 ms.
Hit rate 0.001%: Scenario 1 = 1301 ms, Scenario 2 = 2185 ms, Scenario 3 = 3074 ms.
Hit rate 0.0001%: Scenario 1 = 2826 ms, Scenario 2 = 3963 ms, Scenario 3 = 6783 ms.
Higher hit rates yield near‑keyword performance; very low hit rates require more computation but still finish within seconds for billions of rows.
Interactive Improvements
The console now shows a histogram of the final filtered results, supports random paging based on filtered offsets, and provides a unified API where offset refers to the filtered result set.
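The paging contract can be sketched in a few lines of Python. This is a conceptual model only, assuming (per the text) that `offset` counts rows in the filtered result set rather than raw rows; parameter names are illustrative, not the actual SLS API.

```python
# Conceptual model of offset-based paging over the *filtered* result set.
# Hypothetical signature; not the actual SLS API.

def page(rows, pred, offset, size):
    matched = [r for r in rows if pred(r)]   # conceptually done server-side
    return matched[offset:offset + size]     # offset indexes into matches

rows = [{"id": i, "status": 200 if i % 3 == 0 else 500} for i in range(20)]
# Matching ids: 0, 3, 6, 9, 12, 15, 18 — page 2..4 of the matches:
p = page(rows, lambda r: r["status"] == 200, offset=2, size=3)
# → rows with ids 6, 9, 12
```

Because offsets are stable positions within the filtered set, the console can jump to an arbitrary page without re-reading earlier raw rows on the client side.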
Best Practices
Use indexed keyword filters as the first pipeline stage.
Prefer SPL over SQL for fuzzy, phrase, regex, or JSON extraction scenarios.
Ensure fields used in where have indexes and statistics to trigger high‑performance mode.
Future Directions
Upcoming SPL features will add sorting and aggregation, outputting results in tabular form while retaining the powerful pipeline model.
Conclusion
SPL provides a flexible, high‑performance query language for cloud‑native log analysis, combining the simplicity of search syntax with the power of SQL operators, and delivering sub‑second query times on massive log datasets.
Alibaba Cloud Observability
Driving continuous progress in observability technology!