Big Data 12 min read

Boosting SLS SQL: 3× Faster Queries on Trillion‑Row Logs

Alibaba Cloud’s Serverless Log Service (SLS) has overhauled its SQL engine with a C++‑based compute engine, SIMD acceleration, storage‑compute fusion, and optimized scheduling, delivering up to three‑fold speed gains, 50% latency reduction, and significant improvements across high‑cardinality, JSON, IP, and join queries.

Alibaba Cloud Native

Dec 19, 2024

Boosting SLS SQL: 3× Faster Queries on Trillion‑Row Logs

Background

SQL is the core query capability of Alibaba Cloud Serverless Log Service (SLS), handling workloads from quick alerts to trillion‑row report‑level queries. As a serverless service, SLS must balance performance, isolation, and stability.

Engine Upgrade Highlights

Compute engine rewritten in C++ to fully exploit CPU SIMD instructions.

Compute‑storage fusion merges read‑only storage into the same process, cutting data conversion and copy overhead.

Pipeline model supports fine‑grained parallelism, unlocking multi‑core CPU potential.

Scheduler upgraded for balanced, stable task distribution, reducing data skew and leveraging affinity and multi‑level caching.

Improved distributed execution plans optimize COUNT(DISTINCT) and other high‑cardinality aggregations.

Incremental computation reuses historic partial results, processing only new data.

Introduced an Alibaba‑developed cache component that adaptively caches columnar data, reducing direct I/O.

High‑frequency functions (e.g., ip, json) see several‑ to tens‑fold speedups.

Cross‑project, cross‑region logstore queries are now supported (StoreView).

New Architecture

The upgraded architecture places QueryClient as the query proxy handling request intake, load balancing, and result caching. Coordinator manages overall SQL concurrency control and plan scheduling. The system adopts a compute‑storage‑separated design, while read‑only workers run compute and storage in the same process to minimize data movement. The new engine delivers roughly a 3× improvement in compute performance.

Overall Performance Gains

After the upgrade, average query latency dropped by about 50%, overall throughput roughly doubled, and query spikes were dramatically reduced.

Typical Scenario Improvements

1. Trillion‑row single‑column aggregation in seconds

100 billion rows aggregated in 1.46 s. For a trillion rows with enhanced SQL mode, the same aggregation completes in 15 s, and with optimal parallelism under 10 s.

2. Incremental computation reuse

Initial query over a time window finishes in ~1.5 s. Expanding the time range by 10 minutes reuses previous results and completes in 400 ms.

3. JSON function performance boost

Processing json_extract_scalar on 170 million rows drops from 34.9 s (old engine) to 5.8 s (new engine), a >6× improvement.

4. IP function speedup

Subnet calculation on 1 billion rows improves from 20 s to under 1 s (≈20×). IP‑database lookup on 8.7 billion rows improves from 24 s to 2 s (≈12×).

5. High‑cardinality aggregation

Numeric high‑cardinality aggregation on 200 billion rows drops from 17.7 s to 1.8 s. Character‑based high‑cardinality aggregation on 2 billion distinct values improves from ~ 40 s to 12 s (further reduced to 6.2 s with optimal parallelism).

6. Multi‑column aggregation

Aggregating two columns on 100 billion rows falls from 27.5 s (old) to 6.5 s (new), a ~4× speedup with accurate results.

7. Multi‑table join performance

Join‑based comparison function execution drops from 3 s to 560 ms, a ~5× improvement.

Game Operations Use Cases

1. Business monitoring and alerts

event:register | select __time__ - __time__ % 60 as time,
  serverId as "区服Id",
  count(*) as "注册数"
  group by time,serverId
  having "注册数" > 5000
  order by "注册数" desc

This query detects sudden spikes in registrations per server, enabling rapid scaling or traffic diversion.

2. PV/UV same‑period comparison

* | select diff[1] as today, round((diff[3]-1.0)*100, 2) as growth
from (
  select compare(pv, 86400) as diff
  from (select count(distinct remote_addr) as pv from log)
)

3. PV/UV year‑over‑year comparison

* | select t, diff[1] as today, diff[2] as yestoday, diff[3] as percentage
from (
  select t, compare(pv, 86400) as diff
  from (
    select count(1) as pv, date_format(from_unixtime(__time__), '%H:%i') as t
    from log group by t
  )
  group by t order by t
)

4. Federated queries with MySQL external tables

-- sls_join_meta_store is an external table (MySQL or OSS)
* | select case gender when 1 then '男性' else '女性' end as gender,
  count(1) as pv
from log l join sls_join_meta_store u on l.userid = u.uid
group by gender order by pv desc

Future Features

Filter push‑down: complex predicates (LIKE, regex) are pushed to the storage layer, reducing unnecessary I/O.

Fully precise mode: enhanced isolation and flow‑control guarantee exact results over the selected time range.

References

StoreView documentation: https://help.aliyun.com/zh/sls/user-guide/dataset-storeview-overview

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance Big Data SQL Cloud Log Analytics

Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.