Boosting SLS SQL: 3× Faster Queries on Trillion‑Row Logs
Alibaba Cloud’s Serverless Log Service (SLS) has overhauled its SQL engine with a C++‑based compute engine, SIMD acceleration, storage‑compute fusion, and optimized scheduling, delivering up to three‑fold speed gains, 50% latency reduction, and significant improvements across high‑cardinality, JSON, IP, and join queries.
Background
SQL is the core query capability of Alibaba Cloud Serverless Log Service (SLS), handling workloads from quick alerts to trillion‑row report‑level queries. As a serverless service, SLS must balance performance, isolation, and stability.
Engine Upgrade Highlights
Compute engine rewritten in C++ to fully exploit CPU SIMD instructions.
Compute‑storage fusion merges read‑only storage into the same process, cutting data conversion and copy overhead.
Pipeline model supports fine‑grained parallelism, unlocking multi‑core CPU potential.
Scheduler upgraded for balanced, stable task distribution, reducing data skew and leveraging affinity and multi‑level caching.
Improved distributed execution plans optimize COUNT(DISTINCT) and other high‑cardinality aggregations.
Incremental computation reuses historic partial results, processing only new data.
Introduced an Alibaba‑developed cache component that adaptively caches columnar data, reducing direct I/O.
High‑frequency functions (e.g., ip, json) see several‑ to tens‑fold speedups.
Cross‑project, cross‑region logstore queries are now supported (StoreView).
New Architecture
The upgraded architecture places QueryClient as the query proxy handling request intake, load balancing, and result caching. Coordinator manages overall SQL concurrency control and plan scheduling. The system adopts a compute‑storage‑separated design, while read‑only workers run compute and storage in the same process to minimize data movement. The new engine delivers roughly a 3× improvement in compute performance.
Overall Performance Gains
After the upgrade, average query latency dropped by about 50%, overall throughput roughly doubled, and query spikes were dramatically reduced.
Typical Scenario Improvements
1. Trillion‑row single‑column aggregation in seconds
100 billion rows aggregated in 1.46 s. For a trillion rows with enhanced SQL mode, the same aggregation completes in 15 s, and with optimal parallelism under 10 s.
2. Incremental computation reuse
Initial query over a time window finishes in ~1.5 s. Expanding the time range by 10 minutes reuses previous results and completes in 400 ms.
3. JSON function performance boost
Processing json_extract_scalar on 170 million rows drops from 34.9 s (old engine) to 5.8 s (new engine), a >6× improvement.
4. IP function speedup
Subnet calculation on 1 billion rows improves from 20 s to under 1 s (≈20×). IP‑database lookup on 8.7 billion rows improves from 24 s to 2 s (≈12×).
5. High‑cardinality aggregation
Numeric high‑cardinality aggregation on 200 billion rows drops from 17.7 s to 1.8 s. Character‑based high‑cardinality aggregation on 2 billion distinct values improves from ~ 40 s to 12 s (further reduced to 6.2 s with optimal parallelism).
6. Multi‑column aggregation
Aggregating two columns on 100 billion rows falls from 27.5 s (old) to 6.5 s (new), a ~4× speedup with accurate results.
7. Multi‑table join performance
Join‑based comparison function execution drops from 3 s to 560 ms, a ~5× improvement.
Game Operations Use Cases
1. Business monitoring and alerts
event:register | select __time__ - __time__ % 60 as time,
serverId as "区服Id",
count(*) as "注册数"
group by time,serverId
having "注册数" > 5000
order by "注册数" descThis query detects sudden spikes in registrations per server, enabling rapid scaling or traffic diversion.
2. PV/UV same‑period comparison
* | select diff[1] as today, round((diff[3]-1.0)*100, 2) as growth
from (
select compare(pv, 86400) as diff
from (select count(distinct remote_addr) as pv from log)
)3. PV/UV year‑over‑year comparison
* | select t, diff[1] as today, diff[2] as yestoday, diff[3] as percentage
from (
select t, compare(pv, 86400) as diff
from (
select count(1) as pv, date_format(from_unixtime(__time__), '%H:%i') as t
from log group by t
)
group by t order by t
)4. Federated queries with MySQL external tables
-- sls_join_meta_store is an external table (MySQL or OSS)
* | select case gender when 1 then '男性' else '女性' end as gender,
count(1) as pv
from log l join sls_join_meta_store u on l.userid = u.uid
group by gender order by pv descFuture Features
Filter push‑down: complex predicates (LIKE, regex) are pushed to the storage layer, reducing unnecessary I/O.
Fully precise mode: enhanced isolation and flow‑control guarantee exact results over the selected time range.
References
StoreView documentation: https://help.aliyun.com/zh/sls/user-guide/dataset-storeview-overview
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
