How the New SLS SQL Engine Boosts Big Data Queries by Up to 10×

Alibaba Cloud’s SLS SQL engine has been rebuilt from the ground up around C++ SIMD vectorization, compute‑storage fusion, fine‑grained parallel pipelines, and adaptive caching. The result is up to three times the raw compute performance, roughly half the average query latency, and much faster high‑cardinality, incremental, and join queries on trillion‑row log datasets.

Alibaba Cloud Observability

Dec 24, 2024

A Major Upgrade to the SQL Engine

SQL is a core feature of SLS. It handles massive log‑analysis workloads ranging from quick alerts to trillion‑row reports. Over the past year, the SLS SQL team rebuilt the engine to improve execution speed, isolation, and stability.

Key Improvements

Engine rewritten in C++ to fully exploit CPU SIMD instructions.

Compute‑storage fusion merges read‑only storage and computation into a single process, reducing data conversion and copy overhead.

Pipeline model supports fine‑grained parallelism, unlocking multi‑core CPU potential.

Scheduler upgraded for balanced, stable task distribution, reducing data skew and leveraging affinity and multi‑level caches.

Optimized distributed execution plans for high‑cardinality aggregations and for queries with multiple COUNT(DISTINCT) expressions.

Incremental computation reuses previous partial results, processing only new data.

Custom cache component adaptively caches columnar data, cutting direct I/O.

Frequently used functions (e.g., the IP and JSON functions) run several times to tens of times faster.

Cross‑project and cross‑region logstore queries supported (see StoreView).
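The incremental computation item can be illustrated with a minimal sketch. It assumes, hypothetically, that partial results are cached per 60‑second bucket, so repeating a sliding‑window query only scans buckets that have not been computed before. `IncrementalCounter`, `scan_bucket`, and the bucket size are illustrative names, not SLS internals.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

Event = Tuple[int, str]  # (unix timestamp in seconds, group key)


class IncrementalCounter:
    """Hypothetical cache of per-bucket partial counts: repeated window
    queries only scan buckets that were never computed before."""

    def __init__(self, scan_bucket: Callable[[int], Iterable[Event]],
                 bucket_seconds: int = 60) -> None:
        self.scan_bucket = scan_bucket            # reads raw events for one bucket
        self.bucket_seconds = bucket_seconds
        self.partials: Dict[int, Counter] = {}    # bucket start -> partial counts
        self.scans = 0                            # buckets actually read from storage

    def query(self, start: int, end: int) -> Counter:
        """Count events per key in [start, end); both bounds bucket-aligned."""
        total: Counter = Counter()
        for bucket in range(start, end, self.bucket_seconds):
            if bucket not in self.partials:
                # Only unseen buckets touch storage; the rest reuse cached partials.
                self.partials[bucket] = Counter(
                    key for _, key in self.scan_bucket(bucket))
                self.scans += 1
            total += self.partials[bucket]  # merge partial aggregates
        return total


def fake_scan(bucket: int) -> Iterable[Event]:
    # Hypothetical storage: one "a" and one "b" event per minute.
    return [(bucket, "a"), (bucket + 1, "b")]


counter = IncrementalCounter(fake_scan)
first = counter.query(0, 600)     # 10-minute window: scans 10 buckets
second = counter.query(60, 660)   # window slides by 1 minute: scans 1 new bucket
print(first, second, counter.scans)
```

The sketch treats every cached bucket as final. A real engine also has to recompute the newest bucket while data is still arriving in it.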

New Architecture

QueryClient acts as the query proxy, handling request entry, load balancing, and result caching. The Coordinator manages SQL concurrency and query planning. The overall system keeps compute and storage separated, but read‑only workers run computation and storage in the same process to minimize data movement. With these changes, the upgraded engine delivers roughly three times the compute performance of its predecessor.
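The coordinator/worker split matters most for aggregations whose partial results cannot simply be added up. A plain count can be summed across shards. A COUNT(DISTINCT) cannot, because the same value may appear on several shards. The sketch below shows one common textbook plan for this, assuming each worker owns one shard:

1. Deduplicate locally on each worker.
2. Hash‑repartition the (group, value) pairs so that duplicates meet in one partition.
3. Sum the per‑partition counts.

The function names are hypothetical, and this is not presented as the actual SLS execution plan.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str]  # (group key, value counted distinctly)


def local_distinct(shard: List[Row]) -> Set[Row]:
    """Phase 1 on each worker: drop duplicate (group, value) pairs locally."""
    return set(shard)


def repartition(pair_sets: List[Set[Row]], partitions: int) -> List[Set[Row]]:
    """Shuffle: route each pair by hash so identical pairs from different
    shards always land in the same partition."""
    out: List[Set[Row]] = [set() for _ in range(partitions)]
    for pairs in pair_sets:
        for pair in pairs:
            out[hash(pair) % partitions].add(pair)
    return out


def count_distinct(shards: List[List[Row]], partitions: int = 4) -> Dict[str, int]:
    """Models SELECT g, COUNT(DISTINCT v) ... GROUP BY g in two phases."""
    shuffled = repartition([local_distinct(s) for s in shards], partitions)
    # Phase 2: partitions hold disjoint sets of distinct pairs, so the
    # coordinator can simply add up per-partition counts.
    totals: Counter = Counter()
    for part in shuffled:
        totals.update(group for group, _ in part)
    return dict(totals)


shards = [
    [("cn", "ip1"), ("cn", "ip2"), ("us", "ip3")],
    [("cn", "ip1"), ("us", "ip3"), ("us", "ip4")],
]
print(count_distinct(shards))  # cn and us each have 2 distinct IPs
```

Adding per‑shard distinct counts instead would report 3 for both groups, because ip1 and ip3 appear on both shards. With several COUNT(DISTINCT) columns in one query, one plausible approach is to run this deduplication step once per column. That multiplies the shuffle volume, which is why such queries benefit from dedicated plan optimizations.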

Overall Performance Gains

In production clusters, average query latency dropped by about 50%, which roughly doubles overall query performance. Latency spikes also became significantly less frequent.

Typical Scenario Improvements

Single‑column aggregation on 100 billion rows finishes in 1.46 s. On 1 trillion rows, it takes 15 s in enhanced mode and can drop to 10 s with appropriate parallelism.

Incremental computation reuses previously computed results: a 10‑minute window query completes in about 1.5 s, and a 20‑minute window in 400 ms.

JSON function performance improved over six‑fold: processing 170 million rows drops from 34.9 s to 5.8 s.

IP function speedup exceeds 10×: 1 billion rows processed in 20 s (old) vs 1 s (new).

High‑cardinality aggregation: a query over 200 billion rows with 7.68 million distinct values dropped from 17.7 s to 1.8 s. A query over 20 billion distinct strings dropped from about 40 s to 12 s, and further to 6.2 s with optimal parallelism.

Multi‑column aggregation on 1 trillion rows dropped from 27.5 s to 6.5 s.

Multi‑table join performance improved: a compare‑based join dropped from 3 s to 560 ms.
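Each speedup factor in these figures is simply the old time divided by the new time. The snippet below recomputes them from the numbers in the list, using 40 s for the "~40 s" case. Most land between roughly 3× and 10×, and the IP function case is about 20×.

```python
# (scenario, old seconds, new seconds) as reported above; "~40 s" taken as 40
timings = [
    ("JSON functions, 170M rows", 34.9, 5.8),
    ("IP functions, 1B rows", 20.0, 1.0),
    ("High-cardinality aggregation, 200B rows", 17.7, 1.8),
    ("20B distinct strings", 40.0, 12.0),
    ("20B distinct strings, tuned parallelism", 40.0, 6.2),
    ("Multi-column aggregation, 1T rows", 27.5, 6.5),
    ("Compare-based join", 3.0, 0.56),
]

# Speedup = old latency / new latency, rounded to one decimal place.
speedups = {name: round(old / new, 1) for name, old, new in timings}
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```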

Game Operations Use Cases

The upgraded engine enables fast, serverless analytics for game logs without additional data‑warehouse components.

1. Business monitoring and alerting

event:register | select __time__ - __time__ % 60 as time, serverId as server_id, count(*) as registrations
  group by time, serverId
  having registrations > 5000
  order by registrations desc
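The alert query works in three steps:

1. It floors each event's `__time__` (Unix seconds) to the start of its minute with `__time__ - __time__ % 60`.
2. It counts registrations per server per minute.
3. It keeps only the buckets above 5000.

A minimal Python model of that logic, run on hypothetical in‑memory events rather than a real logstore, looks like this:

```python
from collections import Counter
from typing import Dict, List, Tuple

THRESHOLD = 5000  # mirrors the HAVING clause


def registration_alerts(events: List[Dict]) -> List[Tuple[int, str, int]]:
    """Model of the alert query: per-minute registrations per server,
    kept only above THRESHOLD, sorted by count descending."""
    counts: Counter = Counter()
    for e in events:
        if e.get("event") != "register":              # search clause event:register
            continue
        minute = e["__time__"] - e["__time__"] % 60   # floor to the minute
        counts[(minute, e["serverId"])] += 1          # GROUP BY time, serverId
    rows = [(m, s, n) for (m, s), n in counts.items() if n > THRESHOLD]
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Hypothetical sample data: server s1 spikes within one minute, s2 does not.
events = (
    [{"event": "register", "__time__": 120 + i % 60, "serverId": "s1"} for i in range(6000)]
    + [{"event": "register", "__time__": 130, "serverId": "s2"} for _ in range(100)]
    + [{"event": "login", "__time__": 125, "serverId": "s1"} for _ in range(9000)]
)
print(registration_alerts(events))  # [(120, 's1', 6000)]
```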

2. PV/UV day‑over‑day monitoring with the compare function

* | select diff[1] as today, round((diff[3] - 1.0) * 100, 2) as growth
  from (select compare(uv, 86400) as diff
        from (select count(distinct remote_addr) as uv from log))
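This model rests on one assumption, taken from the SLS function reference: `compare(x, n)` returns an array holding the current value, the value from n seconds earlier (86400 s is one day), and the ratio of the two. On that basis, the query reads element 1 as today's figure and turns element 3 into a percentage change. The Python below mirrors that arithmetic. SQL arrays are 1‑indexed, so `diff[3]` corresponds to index 2 here.

```python
from typing import List


def compare(current: float, previous: float) -> List[float]:
    """Model of compare(x, n): [current, value n seconds earlier, current / previous].
    Returning NaN when previous is 0 is an assumption of this sketch."""
    ratio = current / previous if previous else float("nan")
    return [current, previous, ratio]


def growth_percent(diff: List[float]) -> float:
    """Mirrors round((diff[3] - 1.0) * 100, 2); SQL index 3 is Python index 2."""
    return round((diff[2] - 1.0) * 100, 2)


diff = compare(current=1200, previous=1000)  # today's vs yesterday's distinct visitors
print(diff[0], growth_percent(diff))  # 1200 20.0
```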

3. Building dashboards and reports

SQL can drive daily business dashboards and operational reports directly from log data.

4. Federated queries with MySQL external tables

-- sls_join_meta_store is an external MySQL/OSS table
* | select case gender when 1 then 'male' else 'female' end as gender, count(1) as pv
  from log l join sls_join_meta_store u on l.userid = u.uid
  group by gender
  order by pv desc
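Conceptually, the federated query matches each log row to a row of the external table on `userid = uid`, maps gender code 1 to male and anything else to female, and counts page views per label. The sketch below models this as a classic hash join: it builds a hash table on the small external table and probes it with log rows. Whether SLS uses exactly this join strategy is an assumption, and the sample rows are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def gender_pv(logs: List[Dict], users: List[Dict]) -> List[Tuple[str, int]]:
    """Inner hash join of log rows with a dimension table, then GROUP BY."""
    # Build side: index the (typically small) external table by its join key.
    gender_by_uid = {u["uid"]: u["gender"] for u in users}
    pv: Counter = Counter()
    # Probe side: stream log rows; unmatched rows are dropped (inner join).
    for row in logs:
        if row["userid"] not in gender_by_uid:
            continue
        gender = gender_by_uid[row["userid"]]
        pv["male" if gender == 1 else "female"] += 1  # CASE gender WHEN 1 ...
    return sorted(pv.items(), key=lambda kv: kv[1], reverse=True)  # ORDER BY pv DESC


users = [{"uid": 1, "gender": 1}, {"uid": 2, "gender": 2}]  # hypothetical external rows
logs = [{"userid": 1}, {"userid": 1}, {"userid": 2}, {"userid": 3}]
print(gender_pv(logs, users))  # [('male', 2), ('female', 1)]
```

Building on the smaller side keeps the hash table in memory while the much larger log side is streamed through once.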

All tests were conducted in a production environment using simulated data; actual results may vary based on data distribution, shard count, and cluster size.

Future enhancements will include filter push‑down to the storage layer and a fully precise mode with stronger isolation and QoS guarantees.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
