How StarRocks Achieves Fine-Grained Resource Isolation for Multi‑Tenant Workloads
StarRocks implements user‑space scheduling with resource groups and classifiers, providing hard memory isolation, soft CPU/IO isolation, short‑query groups, concurrency limits, and large‑query circuit breaking. The design balances isolation against utilization while supporting multi‑tenant workloads and future serverless scenarios.
Background and User Demands
Resource isolation has been a frequent discussion topic among StarRocks users. The main demands are: (1) guarantee core‑business query response time by limiting other tasks; (2) promptly break large queries before they waste resources; (3) isolate CPU/IO/memory among tenants and task types; (4) maintain high resource utilization and allow idle resource groups to be used by others.
Key Challenge
Balancing isolation and utilization is difficult: without isolation, clusters have good utilization but no protection; with full physical isolation, protection is strong but elasticity suffers.
Design Overview
We implement scheduling in user space, using logical resource groups and classifiers to achieve isolation while keeping elasticity.
Resource Groups and Classifiers
A resource group defines quotas for CPU, IO, memory, concurrency, etc., and can bind multiple classifiers. Classifiers match queries based on user, role, IP, SQL type, and database, then select the most suitable group using a three‑step priority algorithm (DB condition first, then the classifier with the most conditions, and finally the most specific condition when tied).
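The quota fields of a resource group can be pictured as a small record. The sketch below is illustrative only: the field names mirror the quotas described above but are not the exact StarRocks property names.

```python
# Illustrative sketch of a resource group's quota fields.
# Field names are assumptions, not StarRocks internals.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResourceGroup:
    name: str
    cpu_core_limit: int                         # relative CPU weight (soft isolation)
    mem_limit_bytes: int                        # hard memory cap for the group
    concurrency_limit: int                      # max queries running at once
    big_query_cpu_secs: Optional[int] = None    # circuit-breaker thresholds
    big_query_scan_rows: Optional[int] = None
    big_query_mem_bytes: Optional[int] = None
    short_query: bool = False                   # reserved-resource short-query group

rg_core = ResourceGroup("core_biz", cpu_core_limit=4,
                        mem_limit_bytes=8 << 30, concurrency_limit=20)
```

Multiple classifiers can then be bound to such a group, so that queries from different users, IPs, or databases all land on the same quota set.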
Isolation Capabilities
Hard memory isolation: a group cannot exceed its memory limit; queries that exceed the limit fail.
Soft CPU isolation: CPU time slices are allocated proportionally to cpu_core_limit (e.g., 1:3:4 across groups).
Soft IO isolation: IO time slices are allocated similarly to CPU.
Short‑query resource groups: provide hard isolation for high‑priority point queries, reserving CPU/IO resources.
Concurrency limit: each group caps the number of queries running concurrently via concurrency_limit; excess concurrent requests are rejected.
Large‑query circuit breaker: queries exceeding CPU time, scanned rows, or memory thresholds are aborted.
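The circuit-breaker rule above is a simple threshold check against a query's runtime statistics. A minimal sketch, with illustrative names rather than StarRocks internals:

```python
# Hedged sketch of the large-query circuit breaker: a query is aborted
# once any runtime statistic crosses its group's configured threshold.
def should_break(stats, limits):
    """stats/limits: dicts keyed by 'cpu_secs', 'scan_rows', 'mem_bytes'."""
    for key, limit in limits.items():
        if limit is not None and stats.get(key, 0) > limit:
            return True, key          # abort, reporting which threshold tripped
    return False, None

limits = {"cpu_secs": 600, "scan_rows": 1_000_000_000, "mem_bytes": 4 << 30}
print(should_break({"cpu_secs": 12, "scan_rows": 2_000_000_000}, limits))
# -> (True, 'scan_rows')
```

In practice such a check runs periodically during execution, so a runaway query is stopped early instead of after it has already consumed its full quota.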
Classification Logic
When a query is submitted, classifiers are evaluated in order: first those with a DB condition, then the classifier with the most matched conditions, and, if still tied, the one whose condition is most specific.
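The three-step selection can be sketched as a lexicographic comparison. The field names below ('db', 'specificity') are assumptions for illustration, not StarRocks identifiers:

```python
# Sketch of the three-step classifier priority: (1) prefer a DB condition,
# (2) then the most conditions, (3) then the most specific condition.
def pick_classifier(matching):
    """matching: classifiers whose conditions all match the query.
    Each is a dict like {'db': 'sales', 'user': 'bob', 'specificity': 2}."""
    return max(
        matching,
        key=lambda c: (
            1 if c.get("db") is not None else 0,            # step 1: DB condition first
            sum(1 for k, v in c.items()
                if k != "specificity" and v is not None),   # step 2: condition count
            c.get("specificity", 0),                        # step 3: most specific wins
        ),
    )

cands = [{"db": None, "user": "bob", "specificity": 1},
         {"db": "sales", "user": None, "specificity": 1}]
print(pick_classifier(cands)["db"])  # -> sales
```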
Test Scenarios and Results
Three test suites were executed:
1. Two resource groups with a 2:1 CPU limit ran identical workloads; QPS matched the 2:1 ratio.
2. Mixed small and large queries with groups at 2:1; small‑query QPS remained proportional (2:3) despite concurrent large queries.
3. A short‑query resource group versus a regular group (3:1); point‑query QPS matched the 3:4 ratio, confirming reserved resources.
Architecture
The overall architecture routes a query to the appropriate resource group, splits it into execution units, and dispatches them to BE nodes. Each BE node schedules tasks according to the group’s quotas, applying the soft/hard isolation mechanisms and circuit‑breaker checks.
Implementation Details
Memory tracking reuses StarRocks’ TCmalloc hook; each thread holds a MemoryTracker in thread‑local storage, forming a hierarchical tree that includes per‑group trackers.
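A minimal sketch of such a hierarchical tracker, with assumed names: every allocation is charged up the tree, and the per-group node enforces the hard limit by rejecting allocations that would exceed it.

```python
# Illustrative hierarchical memory tracker (assumed names, not StarRocks code).
class MemoryTracker:
    def __init__(self, name, limit=None, parent=None):
        self.name, self.limit, self.parent = name, limit, parent
        self.consumed = 0

    def try_consume(self, nbytes):
        node = self
        while node:                   # check every ancestor up to the process root
            if node.limit is not None and node.consumed + nbytes > node.limit:
                return False          # hard isolation: the allocation (query) fails
            node = node.parent
        node = self
        while node:                   # commit the charge on every ancestor
            node.consumed += nbytes
            node = node.parent
        return True

process = MemoryTracker("process", limit=16 << 30)
group = MemoryTracker("rg_core", limit=1 << 30, parent=process)
print(group.try_consume(2 << 30))  # -> False (exceeds the group's 1 GiB cap)
```

In StarRocks the per-thread tracker in thread-local storage plays the role of the leaf here, so the allocation hook never needs a global lookup to find its group.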
CPU and IO soft isolation rely on the pipeline execution engine’s user‑space scheduler. A two‑level queue (group‑level priority queue and per‑group multi‑level feedback queue) approximates Linux CFS, using a Vruntime metric (actual time divided by cpu_core_limit) to ensure proportional sharing, avoid starvation, and provide a reward for idle groups.
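The proportional-sharing core of that scheduler can be sketched in a few lines: each group accrues vruntime at a rate inversely proportional to its cpu_core_limit, and the group with the smallest vruntime runs next. This is a toy model of the mechanism, not the StarRocks implementation.

```python
# CFS-style proportional sharing: vruntime = actual time / cpu_core_limit;
# the group with the lowest vruntime always runs next.
import heapq

def schedule(groups, slices, quantum=1.0):
    """groups: {name: cpu_core_limit}; run `slices` quanta, return run counts."""
    heap = [(0.0, name) for name in groups]        # (vruntime, group)
    heapq.heapify(heap)
    runs = {name: 0 for name in groups}
    for _ in range(slices):
        vr, name = heapq.heappop(heap)             # lowest vruntime runs next
        runs[name] += 1
        heapq.heappush(heap, (vr + quantum / groups[name], name))
    return runs

print(schedule({"a": 1, "b": 3}, 8))  # -> {'a': 2, 'b': 6}, a 1:3 split
```

Because a starved group's vruntime stops growing, it is guaranteed to be picked eventually; the "reward" for idle groups mentioned above amounts to adjusting a returning group's vruntime so it is scheduled promptly without monopolizing the CPU.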
Future Work
We are developing a task‑queuing and spill mechanism to prevent query failures when concurrency or memory limits are hit, adding support for more workload types (e.g., import, compaction), and planning multiple short‑query resource groups to cover additional user scenarios.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.