
How Tencent Cut Kubernetes CPU Costs by 70%: A Full‑Scale Cloud‑Native Optimization Journey

This article presents a comprehensive, data‑driven case study of how Tencent’s internal Kubernetes/TKE platform reduced monthly CPU usage by up to 70% and memory usage by 50% through systematic cost data collection, VPA/HPA enhancements, custom scheduling, node‑level over‑commit, and safe node decommissioning, while maintaining zero‑incident reliability.

Cloud Native Technology Community

Background

Tencent’s internal Kubernetes/TKE platform runs millions of pods and incurs tens of millions of RMB in monthly costs. The cost‑optimization effort described in this case study reduced CPU usage by up to 70% and memory usage by 50%. The implementation was open‑sourced as the Crane project (https://github.com/gocrane/crane).

Data Collection & Analysis

Cost‑related metrics were gathered at multiple levels:

Cost bills – total cost per product/module and trend.

Resource‑level – CVM node counts, CPU/Memory/Extended‑Resource totals, utilization, and request‑allocation ratios per region and cluster.

Pod‑level – requested vs. actual CPU/Memory, request‑usage efficiency, OOM occurrences.

HPA effectiveness – coverage, min/max replica settings, trigger history.

Business analysis – workload patterns, service types (stateless vs. stateful), and workload kinds (Deployment, StatefulSet, custom Operator).
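
As a rough illustration of the pod‑level and resource‑level metrics listed above, the following Python sketch computes request‑usage efficiency, node allocation rate, and node utilization. The `PodSample` type, its field names, and the sample numbers are hypothetical; they were chosen only to resemble the magnitudes reported in this article and are not Tencent's data model or data.

```python
from dataclasses import dataclass


@dataclass
class PodSample:
    name: str
    cpu_request_cores: float  # CPU requested in the pod spec
    cpu_usage_cores: float    # observed CPU usage over the sampling window


def request_usage_efficiency(pod: PodSample) -> float:
    """Fraction of the requested CPU that the pod actually uses."""
    if pod.cpu_request_cores <= 0:
        return 0.0  # no request set, efficiency is undefined; report 0
    return pod.cpu_usage_cores / pod.cpu_request_cores


def node_allocation_rate(pods: list[PodSample], node_cpu_cores: float) -> float:
    """Share of the node's CPU capacity claimed by pod requests."""
    return sum(p.cpu_request_cores for p in pods) / node_cpu_cores


def node_utilization(pods: list[PodSample], node_cpu_cores: float) -> float:
    """Share of the node's CPU capacity actually consumed by pods."""
    return sum(p.cpu_usage_cores for p in pods) / node_cpu_cores


# Hypothetical 8-core node with two over-provisioned pods.
pods = [
    PodSample("web", cpu_request_cores=4.0, cpu_usage_cores=0.3),
    PodSample("worker", cpu_request_cores=0.4, cpu_usage_cores=0.1),
]

for pod in pods:
    print(pod.name, round(request_usage_efficiency(pod), 3))
print("allocation:", round(node_allocation_rate(pods, 8.0), 3))
print("utilization:", round(node_utilization(pods, 8.0), 3))
```

A large gap between allocation rate and utilization, as in this sample, is the signal that requests are oversized relative to real usage.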

Key findings:

≈80% of cost is from CVM nodes used by three major business groups.

Node CPU utilization averages 5% (peak 15%); node allocation rate ~55% with uneven load.

Pod request values far exceed actual usage; some pods OOM without auto‑scaling.

HPA coverage is low and replica settings are sub‑optimal.

Optimization Measures

Pod resource‑usage improvement: Deploy Vertical Pod Autoscaler (VPA) to align requests with real usage, extend HPA to all components, and use CronHPA for periodic workloads.

Node allocation rate improvement: Choose instance types matching the observed 1:4 CPU‑to‑Memory ratio, switch scheduler priority from LeastRequestedPriority to MostRequestedPriority, enlarge pod CIDR range, and use dynamic scheduler + Descheduler for load balancing.

Node load improvement: Apply an Admission Webhook to lower node‑level requests, enable over‑commit of extended resources for BestEffort pods, apply VPA‑driven right‑sizing, and allow burstable QoS with safe over‑commit thresholds.

Billing optimization: Select the most cost‑effective billing mode (spot, reserved, pay‑as‑you‑go) per workload and choose the best‑price instance types.
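
The article does not show how the admission webhook rewrites requests. One possible reading is a mutating step that divides each container's CPU request by an over‑commit ratio while leaving limits untouched. The Python sketch below illustrates only that idea on a plain dictionary shaped like a pod spec. The function names and the `overcommit_ratio` parameter are hypothetical, and a real webhook would return a JSON patch through the Kubernetes admission API rather than a mutated copy.

```python
import copy


def parse_cpu_millis(quantity: str) -> int:
    """Parse a Kubernetes CPU quantity such as "500m" or "2" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def scale_cpu_requests(pod_spec: dict, overcommit_ratio: float) -> dict:
    """Return a copy of pod_spec with CPU requests divided by overcommit_ratio.

    Limits and non-CPU requests are left unchanged (assumed policy).
    """
    if overcommit_ratio < 1:
        raise ValueError("overcommit_ratio must be >= 1")
    mutated = copy.deepcopy(pod_spec)
    for container in mutated.get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        if "cpu" in requests:
            millis = parse_cpu_millis(str(requests["cpu"]))
            # Never go below 1 millicore so the request stays a valid quantity.
            requests["cpu"] = f"{max(1, int(millis / overcommit_ratio))}m"
    return mutated


# Hypothetical pod spec fragment.
spec = {
    "containers": [
        {
            "name": "app",
            "resources": {
                "requests": {"cpu": "2", "memory": "4Gi"},
                "limits": {"cpu": "2"},
            },
        }
    ]
}
print(scale_cpu_requests(spec, overcommit_ratio=2.0))
```

Lowering a request below its limit moves a pod from the Guaranteed to the Burstable QoS class, which is why this kind of change needs a conservative ratio.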

Industry Landscape & Solution Selection

The primary levers identified were VPA and HPA. The open‑source VPA architecture consists of Metrics Server, History Storage (usually Prometheus), VPA Controller (Recommender + Updater), and VPA Admission Controller. Limitations include performance at large scale, lack of custom metrics, slow response to spikes, and weak observability.

HPA is built into Kubernetes and supports multiple metric sources (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io). A typical HPA manifest (autoscaling/v2) looks like:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
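
To make the manifest's behavior concrete, here is a minimal Python sketch of the replica calculation documented for the Kubernetes HPA: the desired count is the current count scaled by the ratio of observed to target metric, rounded up, and skipped when the ratio is within a tolerance of 1. The function name and parameters are illustrative, and the 0.1 tolerance is the documented default. When several metrics are configured, as above, HPA evaluates each one and takes the largest result, which this single‑metric sketch does not model.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Single-metric HPA-style replica calculation."""
    ratio = current_value / target_value
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against the 50% target in the manifest.
print(desired_replicas(4, current_value=90, target_value=50))
```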

HPA drawbacks include latency in reaction, limited observability, and lack of dry‑run support. Google Autopilot combines VPA and HPA concepts and demonstrates effective vertical scaling in Borg.

Design & Implementation

Design goals were extensibility, observability, and stability:

Support multi‑business namespaces via a pluggable ComponentProvider and custom portrait algorithms.

Expose detailed metrics (scaling counts, queue length, OOM events, component readiness, etc.).

Enable safe staged rollout: dry‑run → gray‑release → adaptive throttling → node decommission.

Portrait Module

The Portrait module consists of a Workload‑Controller, which watches Deployments, StatefulSets, Jobs, and CronJobs to generate Portrait custom resources, and a Workload‑Recommender, which merges real‑time metrics (metrics‑server, OOM events) with historical data (Prometheus, Elasticsearch) using algorithms such as an exponential‑decay histogram, XGBoost, and a simple moving average (SMA).
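
The article names the exponential‑decay histogram but does not show it. The Python sketch below shows the general idea under stated assumptions: samples are bucketed by value, each sample's weight doubles for every half‑life of elapsed time so recent usage dominates, and a recommendation is read off as a weighted percentile. The class name, bucket layout, and half‑life are hypothetical, and this is not the Workload‑Recommender's actual code.

```python
class DecayingHistogram:
    """Fixed-width histogram whose sample weights grow with recency."""

    def __init__(self, bucket_size: float, num_buckets: int,
                 half_life_s: float, reference_s: float = 0.0):
        self.bucket_size = bucket_size
        self.weights = [0.0] * num_buckets
        self.half_life_s = half_life_s
        # Timestamps are taken relative to this point to keep weights finite;
        # a long-running service would need to renormalize periodically.
        self.reference_s = reference_s

    def add_sample(self, value: float, timestamp_s: float) -> None:
        index = min(int(value / self.bucket_size), len(self.weights) - 1)
        age_in_half_lives = (timestamp_s - self.reference_s) / self.half_life_s
        # A sample one half-life newer counts twice as much.
        self.weights[index] += 2.0 ** age_in_half_lives

    def percentile(self, p: float) -> float:
        """Upper bound of the bucket where cumulative weight reaches p."""
        total = sum(self.weights)
        if total == 0:
            return 0.0
        threshold = p * total
        cumulative = 0.0
        for i, weight in enumerate(self.weights):
            cumulative += weight
            if cumulative >= threshold:
                return (i + 1) * self.bucket_size
        return len(self.weights) * self.bucket_size


DAY = 86400.0
hist = DecayingHistogram(bucket_size=0.25, num_buckets=20, half_life_s=DAY)
hist.add_sample(2.0, timestamp_s=0.0)          # old spike, weight 1
for _ in range(3):
    hist.add_sample(0.3, timestamp_s=3 * DAY)  # recent samples, weight 8 each
print(hist.percentile(0.9))
```

With decay, the three recent low samples outweigh the old spike at the 90th percentile; with all samples at the same timestamp the spike would set the recommendation instead.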

KMetis Module

KMetis provides a unified VPA/HPA/EHPA service and node‑scale capability. Core API resources are CSetScaler (per‑namespace scaling policies) and NodeScaler (node decommission tasks). The scaling workflow:

Periodically inspect workloads against CSetScaler expectations.

Coordinate VPA first via ScalerProvider, ResourceEstimator, UpdaterProvider, and RecordProvider.

Coordinate HPA afterwards using ReplicasEstimator with conflict‑avoidance logic.

Perform root‑cause analysis on high‑load nodes before scaling.
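
The conflict‑avoidance logic itself is not described in the article. The Python sketch below shows one simple rule that fits the "VPA first, HPA afterwards" ordering: within a single inspection cycle, a workload receives at most one kind of scaling action, and a vertical change suppresses horizontal scaling until the next cycle. The `reconcile` function and its callback signatures are hypothetical and do not correspond to KMetis's ScalerProvider or ReplicasEstimator interfaces.

```python
from typing import Callable, Optional

# An estimator returns a description of the action to take, or None for no-op.
Estimator = Callable[[str], Optional[str]]


def reconcile(workload: str, vertical: Estimator, horizontal: Estimator) -> list[str]:
    """Run one inspection cycle for a workload, vertical scaling first."""
    actions: list[str] = []
    vertical_action = vertical(workload)
    if vertical_action is not None:
        actions.append(vertical_action)
        # Assumed rule: skip horizontal scaling in a cycle that resized pods,
        # so replica decisions are not based on stale per-pod requests.
        return actions
    horizontal_action = horizontal(workload)
    if horizontal_action is not None:
        actions.append(horizontal_action)
    return actions


# Hypothetical estimators for demonstration.
def resize(name: str) -> Optional[str]:
    return f"set {name} cpu request to 500m"


def no_resize(name: str) -> Optional[str]:
    return None


def add_replicas(name: str) -> Optional[str]:
    return f"scale {name} to 6 replicas"


print(reconcile("checkout", resize, add_replicas))
print(reconcile("checkout", no_resize, add_replicas))
```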

KMetis also supports custom horizontal scalers such as Crane’s EHPA, which uses predictive DSP algorithms.

Deployment, Release Strategy & Results

Controlled release process:

Dry‑run mode collects prediction data without mutating workloads.

Gray‑release uncovers hidden issues at scale.

Adaptive throttling limits concurrent scaling actions (e.g., max 20 simultaneous updates).

Safe node shutdown uses custom affinity tags and the NodeScaler workflow.
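
The dry‑run and throttling steps above can be sketched in Python as follows. This is a minimal illustration that assumes a fixed cap of 20 concurrent updates, matching the example figure in the text; an adaptive policy would adjust that cap from observed health signals, which is not shown. The `run_throttled` function and the `apply_update` callback are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_CONCURRENT_UPDATES = 20  # example cap taken from the text


def run_throttled(
    workloads: list[str],
    apply_update: Callable[[str], str],
    max_concurrent: int = MAX_CONCURRENT_UPDATES,
    dry_run: bool = False,
) -> list[str]:
    """Apply updates with at most max_concurrent in flight.

    In dry-run mode nothing is mutated; the planned actions are returned.
    """
    if dry_run:
        return [f"would update {name}" for name in workloads]
    # The pool size bounds how many updates run at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(apply_update, workloads))


# Demonstration with a fake update that records peak concurrency.
state = {"in_flight": 0, "peak": 0}
lock = threading.Lock()


def fake_update(name: str) -> str:
    with lock:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
    with lock:
        state["in_flight"] -= 1
    return f"updated {name}"


names = [f"workload-{i}" for i in range(100)]
print(run_throttled(names, fake_update, dry_run=True)[:2])
results = run_throttled(names, fake_update)
print(len(results), "updates applied")
```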

During dry‑run the system identified a 1:4 CPU‑to‑Memory ratio, prompting migration from 8c16g to 4c16g instances for a core service. Subsequent gray‑release and throttling enabled tens of thousands of safe scaling actions.
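
The arithmetic behind the instance change can be worked through in a few lines of Python. The sketch assumes a workload that needs 4 GiB of memory per CPU core (the 1:4 ratio above) and compares the two instance shapes named in the text; the function names and the aggregate demand figures are illustrative.

```python
def stranded_cpu_fraction(cpu_cores: float, memory_gib: float,
                          workload_gib_per_core: float) -> float:
    """Fraction of an instance's CPU left unusable once its memory is full."""
    usable_cores = min(cpu_cores, memory_gib / workload_gib_per_core)
    return 1.0 - usable_cores / cpu_cores


def pick_instance(demand_cores: float, demand_gib: float,
                  candidates: list[tuple[str, float, float]]) -> str:
    """Pick the candidate whose memory-per-core is closest to the workload's."""
    target = demand_gib / demand_cores
    return min(candidates, key=lambda c: abs(c[2] / c[1] - target))[0]


candidates = [("8c16g", 8.0, 16.0), ("4c16g", 4.0, 16.0)]

for name, cores, gib in candidates:
    print(name, stranded_cpu_fraction(cores, gib, workload_gib_per_core=4.0))

# Hypothetical aggregate demand of 100 cores and 400 GiB (a 1:4 ratio).
print(pick_instance(100.0, 400.0, candidates))
```

On an 8c16g instance a 1:4 workload exhausts memory after using 4 cores, so half the CPU is paid for but cannot be scheduled; the 4c16g shape has no such remainder.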

Safe node draining relies on an affinity configuration such as:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - preference:
        matchExpressions:
        - key: level
          operator: In
          values:
          - small
        - key: x.y.z/component
          operator: In
          values:
          - normal
      weight: 10

Combined with MostRequestedPriority, dynamic scheduler, and Descheduler, node allocation rose from ~50% to 99% CPU and 88% Memory, while average CPU utilization increased from 5% to 21.4%.
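
To show why the scoring switch raises allocation, the Python sketch below compares the two scoring ideas on a single resource. It follows the commonly documented formulas, where least‑requested favors the node with the most free capacity and most‑requested favors the fullest node that still fits the pod. It is simplified to CPU only (the real scheduler also weighs memory and other plugins), and the node names and numbers are hypothetical.

```python
from typing import Callable


def least_requested_score(requested: float, capacity: float) -> float:
    """Higher score for emptier nodes (spreads pods out)."""
    return (capacity - requested) * 100.0 / capacity


def most_requested_score(requested: float, capacity: float) -> float:
    """Higher score for fuller nodes (packs pods together)."""
    return requested * 100.0 / capacity


def pick_node(nodes: dict[str, tuple[float, float]], pod_cpu: float,
              score_fn: Callable[[float, float], float]) -> str:
    """nodes maps name -> (requested_cores, capacity_cores).

    Scores each node as it would look after placing the pod and returns
    the best one. Raises ValueError if the pod fits nowhere.
    """
    feasible = {
        name: (requested + pod_cpu, capacity)
        for name, (requested, capacity) in nodes.items()
        if requested + pod_cpu <= capacity
    }
    return max(feasible, key=lambda name: score_fn(*feasible[name]))


nodes = {"node-a": (6.0, 8.0), "node-b": (1.0, 8.0)}
print(pick_node(nodes, 1.0, least_requested_score))
print(pick_node(nodes, 1.0, most_requested_score))
```

Packing new pods onto already busy nodes leaves lightly used nodes free to be drained and decommissioned, which is the behavior the NodeScaler workflow relies on.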

Effectiveness

Business A – 70% CPU reduction.

Business B – 45% CPU reduction.

Business C – 50% CPU reduction.

Overall cost dropped dramatically with zero incidents throughout the rollout.

Conclusion

Stability

Raising pod density required careful handling of kernel/Docker/Kubelet bugs and Service‑LB unbinding delays. Issues were mitigated with NodeProblemDetectorPlus, graceful termination scripts, and rolling‑update strategies.

Future Direction

Further CPU utilization gains are expected by extending node‑level over‑commit techniques.

This end‑to‑end case study demonstrates a reproducible methodology for large‑scale Kubernetes cost optimization, from data collection and algorithmic portrait generation to safe, observable, and automated scaling.

Written by

Cloud Native Technology Community

The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
