
Improving Kubernetes Cluster Utilization: Practices and Optimization Strategies

The session detailed how Tencent’s container experts boost Kubernetes cluster utilization by correcting pod resource requests, employing two‑level auto‑scaling, dynamic over‑commit, adaptive scheduling and eviction, and using HPA/EHPA/VPA, achieving up to 38.7% node usage and roughly 60% cost savings in real‑world workloads.

Tencent Cloud Developer

The third session of the "Power Source x Cloud‑Native Voice: Cost‑Reduction and Efficiency‑Boosting Lecture" was jointly organized by the China Academy of Information and Communications Technology, Tencent Cloud, and the FinOps industry standards working group. Tencent Cloud container technology expert Song Xiang presented the topic "Kubernetes Cluster Utilization Improvement Practice".

Song Xiang, who leads the design of the large‑scale self‑developed container platform TKEx at Tencent, introduced how Docker, Kubernetes, Istio and other cloud‑native technologies have been applied to massive services such as QQ, online education, and Tencent Meeting.

Background

Low cluster utilization leads to high costs. Inefficient configuration of clusters and applications creates a vicious cycle that prevents continuous improvement of resource usage.

Root Causes of Low Utilization

Cluster administrators cannot accurately assess the required cluster size, resulting in oversized node buffers and resource redundancy.

Users set unreasonable Pod resource requests and limits, causing low actual usage while the scheduler believes the cluster is full.

Two typical scenarios are examined:

Scenario 1 – Pod Resource Setting

Pods fall into three QoS classes:

BestEffort: no requests or limits are set; rarely used in production.

Guaranteed: requests equal limits for every container; this gives strong stability but leaves no room to raise utilization.

Burstable: request < limit; tuning these two values is the main lever for raising actual utilization.
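For concreteness, a Burstable pod is simply one that declares requests below limits (the pod and image names here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example   # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:            # what the scheduler reserves on the node
          cpu: "250m"
          memory: "256Mi"
        limits:              # hard ceiling enforced at runtime
          cpu: "1"
          memory: "512Mi"
```

The gap between request and limit is the headroom that over-commit and compression techniques later exploit.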

Scenario 2 – Node Allocation of Pod Resources

Nodes reserve part of their capacity for the OS and Kubelet, leaving only a fraction for Pods. Fragmented resources that cannot satisfy pending Pods cause waste.
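The reservation is configured on the kubelet itself; a minimal sketch using the standard KubeletConfiguration fields (the values are illustrative, not Tencent's actual settings):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:          # held back for the OS and system daemons
  cpu: "500m"
  memory: "1Gi"
kubeReserved:            # held back for kubelet and the container runtime
  cpu: "500m"
  memory: "1Gi"
evictionHard:            # safety margin before the node evicts pods
  memory.available: "500Mi"
# Allocatable = Capacity - systemReserved - kubeReserved - evictionHard
```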

Common Cluster Load‑Optimization Ideas

User‑Side Optimizations

Adjust Pod requests/limits based on historical data or Crane recommendations.

Use appropriate HPA settings; apply Crane‑enhanced EHPA algorithms.

Configure CronHPA for predictable traffic spikes (e.g., weekly vaccine‑booking peaks).

Apply VPA according to user policies.
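As a sketch of the VPA step above, a standard VerticalPodAutoscaler manifest lets the autoscaler rewrite pod requests from observed usage (the workload name is hypothetical; Crane's recommendation-based variant differs in detail):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical workload
  updatePolicy:
    updateMode: "Auto"     # VPA may evict pods to apply the new requests
```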

Platform‑Side Optimizations

Two-Level Auto-Scaling: HNA (node-level horizontal auto-scaling) adds nodes when cluster packing reaches a threshold (e.g., 85%), while Super-Node (Virtual Kubelet) absorbs sudden burst workloads.

Two-Level Dynamic Over-commit: Pod compression (a dynamic request/limit ratio) is combined with node over-commit ratios that adapt to real-time load.

Dynamic Scheduling: an extended NodeScorer evaluates nodes against time-varying metrics, letting the scheduler place Pods on nodes with lower actual memory usage.

Dynamic Eviction: when node load is uneven or too high, Pods are evicted based on multi-dimensional metrics (CPU, memory, file descriptors, inodes, etc.) to protect service stability.

Case Study – Tencent Medical Service

The medical service experienced rapid traffic growth on weekday mornings and weekly vaccine‑booking events. By setting HPA ranges (2‑30 instances) and triggering scaling when CPU usage reaches 50% of the limit, cost was reduced by about 60% compared with a static configuration.
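Under the numbers quoted above, a standard autoscaling/v2 HPA would look roughly like this (the workload name is hypothetical; note that upstream HPA computes averageUtilization against the pod's requests rather than its limits):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: medical-service-hpa    # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: medical-service      # hypothetical workload
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # scale out past 50% average CPU
```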

Results

After applying pod compression, node over-commit, dynamic scheduling, and dynamic eviction, average node utilization rose to 38.7% and the variance of node load decreased significantly, leading to more balanced resource consumption and higher overall efficiency.

In summary, the talk shared practical experiences from Tencent’s large‑scale production environment, covering two‑level scaling, dynamic over‑commit, dynamic scheduling, and dynamic eviction as effective ways to improve Kubernetes cluster utilization.

Tags: Cloud Native, Kubernetes, resource optimization, auto scaling, cluster utilization, Pod scheduling
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
