
How Mixed Workloads Boost Kubernetes CPU Utilization by Over 40%

This article explains how Youzan transformed its Kubernetes clusters from static over‑commit scheduling to load‑balanced mixed workloads using Koordinator and the Longxi kernel, achieving higher CPU utilization, lower costs, and better resource management for both online and offline services.

Youzan Coder

Background

As Youzan's business rapidly grows, demand for compute resources increases, putting pressure on cost control and supply. Meanwhile overall cluster resource utilization remains low, indicating room for improvement.

Main reasons:

Online service traffic concentrates in the daytime, leaving CPUs largely idle at night.

Offline jobs run at night, so CPU usage is high then but idle during the day.

Capacity is over-provisioned for burst traffic, reserving extra resources and lowering average CPU usage.

Static scheduling produces uneven node water levels, blocking further CPU-utilization gains.

With cloud‑native adoption, the Kubernetes community has introduced many mixed‑workload projects, and resource management has shifted from static over‑commit to load‑balanced scheduling plus online/offline mixing. Since 2022, Youzan's mixed‑workload clusters have sustained roughly 40% average CPU utilization.

Solution Design

The solution is built on Koordinator and the Longxi Linux kernel's mixed‑workload architecture:

[Figure: Mixed‑workload architecture]

Scheduling layer:

Load‑balanced scheduling ensures new Pods land on idle nodes.

Rescheduling continuously evens node water‑levels as traffic changes.

Big‑data tasks (Spark ThriftServer) actively sense the size of the offline resource pool to avoid over‑use.

QoS guarantees:

Separate resource pools for the online (LS) and offline (BE) QoS levels.

Online services auto‑scale; a timed eviction controller releases offline resources.

Offline CPU‑satisfaction eviction preserves task quality; memory‑pressure eviction protects nodes.

CPU throttling of offline tasks safeguards online service quality.

Asynchronous container memory reclamation with protection thresholds.
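To make the LS/BE split concrete, the following is an illustrative sketch (not Koordinator's actual formula) of how a colocation agent might size the offline (BE) CPU pool on a node: offline tasks get the CPU that online (LS) Pods are not actually using, minus a safety reservation. The function name, inputs, and the 10% reserve ratio are assumptions for illustration.

```python
# Illustrative sketch: size the offline (BE) CPU pool from the CPU that
# online (LS) pods are not actually using, minus a safety reservation.
# The 10% reserve ratio and all names are assumptions, not Koordinator's API.
def be_allocatable_cpu(node_capacity: float, ls_cpu_usage: float,
                       reserve_ratio: float = 0.1) -> float:
    """CPU cores the offline pool may use on one node."""
    reserved = node_capacity * reserve_ratio
    return max(0.0, node_capacity - reserved - ls_cpu_usage)
```

Because the pool shrinks as online usage rises, offline tasks are naturally squeezed out during online peaks, which is exactly when the eviction policies above take over.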

Cluster resource assurance:

Load‑based node scaling.

Longxi kernel asynchronous memory reclamation.

Strategy Evolution

Youzan's mixing of online and offline services has progressed through three stages: node time‑sharing, load‑balanced scheduling, and steady‑state mixing.

Node Time‑Sharing

Cluster A mounts nodes from Cluster B via Virtual Kubelet (VK); the time‑sharing policy is driven by toggling the scheduling state of the nodes and the VK.

[Figure: Node time‑sharing]

Periodically evicting Pods on the shared nodes and adjusting scheduling labels lets both clusters reuse the same hardware, improving offline resource utilization.
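A hypothetical reconcile step for this time‑sharing loop might look like the sketch below: pick the pool that should own the node for the current hour, and list Pods of the other pool for eviction. The field names and the 08:00–22:00 online window are assumptions, not the actual controller.

```python
# Hypothetical time-sharing reconcile: decide which pool owns the node
# for the current hour and flag Pods of the outgoing pool for eviction.
def reconcile(hour: int, pods: list,
              online_window: tuple = (8, 22)) -> tuple:
    start, end = online_window
    desired = "online" if start <= hour < end else "offline"
    # evict every Pod that belongs to the pool losing the node
    to_evict = [p["name"] for p in pods if p["pool"] != desired]
    return desired, to_evict
```

After eviction, relabeling the node to the desired pool lets the corresponding cluster's scheduler start placing Pods on it.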

Load‑Balanced Scheduling

Using Koordinator’s load‑aware scheduling and precise application profiling, the online cluster shifts from static to dynamic water‑level‑based scheduling, raising average CPU water‑level from ~10% to ~25%.

Scheduling

Koordinator's load‑aware scheduling and rescheduling move the cluster from static allocation to scheduling on real node load, whereas native Kubernetes bases placement decisions on allocated (requested) resources. By considering historical node load and estimating the load of incoming Pods, the scheduler places Pods on less‑loaded nodes, balancing node water levels and avoiding bottleneck nodes.
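The core idea can be sketched as a scoring function over real usage rather than requests. This is a minimal illustration under assumed inputs (per‑node CPU usage ratios and a rough CPU estimate for the incoming Pod); Koordinator's actual plugin is considerably more elaborate.

```python
# Minimal load-aware scoring sketch: score nodes by projected real usage
# after placement, not by allocated requests. Inputs are assumptions.
def score_node(usage_ratio: float, pod_estimate: float) -> float:
    """Higher score for nodes whose projected usage would be lower."""
    projected = usage_ratio + pod_estimate
    return max(0.0, (1.0 - projected) * 100.0)

def pick_node(node_usage: dict, pod_estimate: float) -> str:
    """node_usage maps node name -> current real CPU usage ratio (0.0-1.0)."""
    return max(node_usage, key=lambda n: score_node(node_usage[n], pod_estimate))
```

With request‑based scheduling, a node full of over‑provisioned but idle Pods looks "busy"; with this scoring, it correctly attracts new Pods.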

Rescheduling

Kubernetes may need to move running Pods due to uneven workload distribution, low overall utilization, or resource fragmentation.

Hotspot nodes become overloaded, degrading performance.

Desire to shut down under‑utilized nodes to cut costs.

Fragmentation prevents large Pods from scheduling despite sufficient total resources.
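A descheduling pass for the first case can be sketched as follows, under assumed thresholds: flag nodes above a high‑water mark as hotspots, then pick a victim Pod to move elsewhere. The thresholds and Pod fields are illustrative, not Koordinator defaults.

```python
# Descheduling sketch under assumed thresholds: find overloaded nodes,
# then choose a victim Pod to relocate. Fields are illustrative.
def find_hotspots(node_usage: dict, high_water: float = 0.8) -> list:
    return [n for n, u in node_usage.items() if u > high_water]

def pick_victim(pods: list) -> dict:
    # evict the lowest-priority Pod first; break ties by highest CPU use
    return min(pods, key=lambda p: (p["priority"], -p["cpu"]))
```

Evicting low‑priority, high‑CPU Pods first relieves the hotspot with the least disruption to online services.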

Application Profiling

After load‑balanced scheduling and rescheduling, gaps between node water levels narrow, but peak loads can still create hotspots. Profiling identifies high‑load applications and pre‑emptively spreads them across nodes, mitigating hotspot formation.

[Figure: Application profiling]
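Profile‑driven spreading can be sketched as a greedy placement: handle the highest‑peak applications first, always assigning to the node with the lowest accumulated peak load. The peak ratios and node names below are illustrative assumptions.

```python
# Greedy profile-driven spreading sketch: place apps in descending order
# of profiled peak load onto the currently least-loaded node.
def spread_by_peak(app_peaks: dict, node_count: int) -> dict:
    load = {f"node-{i}": 0.0 for i in range(node_count)}
    placement = {n: [] for n in load}
    for app in sorted(app_peaks, key=app_peaks.get, reverse=True):
        target = min(load, key=load.get)
        load[target] += app_peaks[app]
        placement[target].append(app)
    return placement
```

Because the heaviest applications are separated first, two applications that peak together are unlikely to land on the same node.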

Steady‑State Mixing

To reduce resource fragmentation and holding costs, workloads from dedicated Kubernetes clusters are merged into a large mixed‑workload cluster.

[Figure: Steady‑state mixing]

Offline Resource Pool Awareness

The big‑data task scheduler was refactored to dynamically sense the Koordinator BE resource pool and control the scale at which offline Pods are created.

[Figure: Offline scheduler sensing the offline resource pool]
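The cap that such a scheduler applies can be sketched as below: given the sensed BE pool, bound the number of executors by whichever resource runs out first. Executor sizing and the pool‑query inputs are assumptions, not a real Koordinator API.

```python
# Sketch: cap offline submission scale by the sensed BE pool. Executor
# sizes (2 cores / 8 GiB) and inputs are illustrative assumptions.
def max_offline_executors(be_cpu_available: float, be_mem_available_gb: float,
                          executor_cpu: float = 2.0,
                          executor_mem_gb: float = 8.0) -> int:
    by_cpu = int(be_cpu_available // executor_cpu)
    by_mem = int(be_mem_available_gb // executor_mem_gb)
    return max(0, min(by_cpu, by_mem))
```

Re‑evaluating this bound on each scheduling cycle lets offline throughput expand at night and contract automatically as online usage reclaims the pool.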

Eviction

Active and passive eviction manage resource levels across time, ensuring service quality during peak periods.

[Figure: Eviction]

Active Eviction

Online services auto‑scale; during low traffic, memory is released based on usage profiles.

Offline services use timed eviction controllers to remove idle Pods, freeing resources for online workloads.
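The timed‑eviction rule can be reduced to a small predicate. The sketch below assumes an online‑peak window of 08:00–22:00 and an externally supplied idleness signal; both are illustrative, not the actual controller's logic.

```python
# Timed-eviction sketch: evict idle offline Pods during the assumed
# online-peak window (08:00-22:00). Window and idle signal are assumptions.
from datetime import time

def should_evict_offline(now: time, is_idle: bool,
                         window: tuple = (time(8, 0), time(22, 0))) -> bool:
    start, end = window
    return is_idle and start <= now < end
```

Outside the window, idle offline Pods are left alone so they can pick up new work when night‑time capacity opens up.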

Passive Eviction

In extreme cases (high memory usage with OOM risk, or sustained offline CPU shortage), passive eviction ranks offline Pods by declared priority, resource consumption, and runtime, and evicts them to keep nodes healthy while optimizing utilization.

Offline CPU satisfaction eviction moves tasks to nodes with sufficient resources.

Memory pressure eviction protects nodes from OOM.
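The ordering described above can be sketched as a single sort key: lower declared priority first, then higher memory consumption, then shorter runtime (cheaper to restart). The Pod fields are illustrative assumptions.

```python
# Passive-eviction ordering sketch: sort candidates by (priority asc,
# memory use desc, runtime asc). Field names are illustrative.
def eviction_order(pods: list) -> list:
    ranked = sorted(pods, key=lambda p: (p["priority"], -p["mem_gb"], p["runtime_s"]))
    return [p["name"] for p in ranked]
```

Evicting in this order frees the most memory per eviction while wasting the least completed offline work.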

Results

To date, Youzan’s mixed‑workload capability covers major business clusters, handling both online and offline scenarios. Large‑scale container mixing has yielded significant gains:

CPU utilization: online mixed clusters achieve daily average CPU utilization above 40% while maintaining service quality.

Resource cost: mixed clusters reduce costs by roughly 20% without compromising offline stability.

Event cost savings: during shopping festivals, online workloads temporarily use offline resources, avoiding extra scaling expenses.

Tags: cloud-native, big data, Kubernetes, resource scheduling, CPU utilization, mixed workloads, Koordinator
Written by Youzan Coder

Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.