Tagged articles

35 articles

Apr 18, 2026 · Operations

How to Build a Resilient GPU Inference Autoscaling System on Kubernetes

This article explains why scaling GPU inference services on Kubernetes is challenging and presents a multi‑layer control architecture, metric upgrades, and production‑ready implementations using HPA, KEDA, KServe, and Karpenter to achieve stable, cost‑effective autoscaling.

GPU · HPA · Inference

0 likes · 29 min read

Raymond Ops

Jan 23, 2026 · Cloud Native

How to Triple Kubernetes Performance: End‑to‑End Node‑to‑Pod Tuning Guide

This article walks through a systematic, bottom‑up performance tuning process for Kubernetes clusters—covering kernel parameters, container runtime, kubelet, scheduler, and pod resource settings—backed by a real‑world e‑commerce case study that reduced latency by over 80% and cut OOM events by 97.5%.

HPA · Kubernetes · Node Optimization

0 likes · 12 min read

Ray's Galactic Tech

Nov 21, 2025 · Cloud Native

Mastering Kubernetes HPA: How It Works, Real‑World Setup, and Troubleshooting

Horizontal Pod Autoscaler (HPA) in Kubernetes automatically scales pod replicas based on metrics like CPU, memory, or custom indicators, and this guide explains its core principles, configuration pitfalls, step‑by‑step troubleshooting commands, and advanced considerations such as API versions, stabilization windows, and integration with Cluster Autoscaler.

HPA · Kubernetes · autoscaling

0 likes · 9 min read

Ray's Galactic Tech

Nov 21, 2025 · Cloud Native

Mastering Kubernetes Deployments for High‑Availability Online Services

This guide explains why Deployments are essential in Kubernetes, walks through a full production‑grade YAML, and covers replica control, rolling updates, health probes, anti‑affinity, scaling, and rollback best practices for resilient cloud‑native applications.

Deployment · HPA · Kubernetes

0 likes · 7 min read

MaGe Linux Operations

Nov 6, 2025 · Cloud Native

Master Kubernetes Node Autoscaling with Custom Prometheus Metrics in 30 Minutes

This guide walks you through a complete, 30‑minute implementation of Kubernetes node autoscaling using Horizontal Pod Autoscaler (HPA) with custom Prometheus metrics, covering prerequisites, anti‑pattern warnings, environment matrix, step‑by‑step deployment, core principles, observability, troubleshooting, best practices, and FAQ.

HPA · Kubernetes · Prometheus

0 likes · 50 min read

Alibaba Cloud Infrastructure

Nov 3, 2025 · Cloud Computing

How ACK One Fleet Enables Scalable AI Workloads with Multi‑Cluster GPU Scheduling

ACK One Fleet, Alibaba Cloud's enterprise multi‑cluster solution, provides inventory‑aware elastic GPU scheduling, cross‑region resource sharing, multi‑cluster HPA and model distribution, allowing AI inference and training workloads to scale efficiently, reduce costs, and maximize GPU utilization.

AI · GPU scheduling · HPA

0 likes · 12 min read

Ops Development Stories

Sep 4, 2025 · Cloud Native

Why Kubernetes HPA Ignores High CPU Usage and How Tolerance Affects Scaling

This article explains the internal architecture and source‑code flow of Kubernetes Horizontal Pod Autoscaler, detailing how components like HorizontalController and ReplicaCalculator compute desired replicas, why a default 10% tolerance can prevent scaling even when CPU exceeds the target, and how behavior policies and scaling limits influence HPA decisions.

Cloud Native · HPA · Horizontal Pod Autoscaler

0 likes · 16 min read
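
The tolerance effect described in the entry above can be illustrated with a short sketch. It assumes the replica rule given in the Kubernetes HPA documentation, desired = ceil(current replicas × current metric / target metric), with scaling skipped when that ratio is within the default tolerance of 0.1 of 1.0. The function name `desired_replicas` and the sample numbers are illustrative only. The real controller also handles unready pods, missing metrics, min/max bounds, and behavior policies, none of which are modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (illustrative, not the controller code)."""
    ratio = current_value / target_value
    # Inside the tolerance band the HPA takes no scaling action.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# CPU at 54% against a 50% target: ratio 1.08 is inside the band, so no scale-up.
print(desired_replicas(4, 54.0, 50.0))  # 4
# CPU at 60% against a 50% target: ratio 1.2 is outside the band, so scale up.
print(desired_replicas(4, 60.0, 50.0))  # 5
```

In this simplified model, a workload running slightly above its target stays at the same replica count until the ratio leaves the tolerance band.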

Full-Stack DevOps & Kubernetes

Aug 26, 2025 · Cloud Native

Mastering Kubernetes Resource Quotas and Pod Limits to Prevent Cluster Overload

This guide explains why resource limits are essential in Kubernetes, how to configure Namespace‑level ResourceQuota and Pod‑level Requests/Limits, and provides a practical case study with YAML examples to prevent a single service from exhausting cluster CPU and memory.

ClusterManagement · HPA · Kubernetes

0 likes · 6 min read

Full-Stack DevOps & Kubernetes

Jul 16, 2025 · Cloud Native

Mastering Kubernetes Service Deployment: From Docker Build to HPA

This guide walks you through the complete Kubernetes service deployment workflow, covering Docker image creation with multi‑stage builds, pushing to a registry, defining Deployment and Service resources, applying and monitoring them, managing configuration, implementing horizontal pod autoscaling, and integrating logging and monitoring solutions.

ConfigMap · HPA · Kubernetes

0 likes · 8 min read

Raymond Ops

Dec 19, 2024 · Operations

How to Auto‑Scale Non‑CPU Apps with cAdvisor Network Metrics in Kubernetes

This guide explains how to use cAdvisor‑provided container network traffic counters as custom metrics for Kubernetes HPA, covering metric collection, Prometheus‑adapter configuration, verification, and a complete HPA testing workflow for elastic scaling of non‑CPU‑intensive workloads.

HPA · Kubernetes · Prometheus

0 likes · 7 min read

Linux Ops Smart Journey

Oct 11, 2024 · Cloud Native

Master Kubernetes HPA: Auto-Scale Pods Efficiently with Real-World Examples

This guide explains what Kubernetes Horizontal Pod Autoscaler (HPA) is, how it works, its key features, and provides step‑by‑step configuration, verification, and scaling policy details with practical code examples for cloud‑native applications.

DevOps · HPA · Kubernetes

0 likes · 10 min read

Open Source Linux

Sep 10, 2024 · Cloud Native

Mastering Kubernetes Deployments: From YAML Generation to Rolling Updates and HPA

This guide walks through Kubernetes Deployment controllers, showing how to generate YAML templates, manage replica counts, apply dynamic scaling with Horizontal Pod Autoscaler, and perform rolling image upgrades and rollbacks, all with practical command‑line examples.

Deployment · DevOps · HPA

0 likes · 16 min read

MaGe Linux Operations

Mar 16, 2024 · Cloud Native

Scaling Non‑CPU‑Bound Apps with HPA Using cAdvisor Network Metrics

This guide shows how to enable Horizontal Pod Autoscaling for traffic‑driven workloads by leveraging cAdvisor's container network receive and transmit byte counters, converting them to per‑second rates with Prometheus‑adapter, and validating the custom metric through Kubernetes commands and console views.

Cloud Native · HPA · Kubernetes

0 likes · 7 min read

Full-Stack DevOps & Kubernetes

Mar 6, 2024 · Cloud Native

Master Kubernetes HPA: Automatic Pod Scaling with Real‑World Examples

This article explains how to configure Kubernetes Horizontal Pod Autoscaler (HPA) for automatic pod scaling, covering core concepts, metric selection, and two detailed YAML examples that demonstrate scaling based on CPU utilization and custom data‑processing rates.

Auto Scaling · Cloud Native · DevOps

0 likes · 6 min read

vivo Internet Technology

Dec 20, 2023 · Cloud Native

Resource Overcommit Strategies in Vivo Container Platform: Static and Dynamic Approaches

Vivo’s container platform combats oversized resource requests by first applying static coefficient‑based overcommit at deployment and then using a dynamic recommender that continuously gathers usage metrics, builds exponential histograms with a half‑life sliding‑window model, and adjusts CPU (and optionally memory) requests, improving packing efficiency, reducing billing, and boosting CPU utilization by up to eight percent while maintaining HPA accuracy.

HPA · Kubernetes · Resource Overcommit

0 likes · 15 min read

Tencent Cloud Developer

Aug 16, 2023 · Cloud Native

Migrating QQ Image Service to Tencent Cloud Native (TKE): Architecture, Optimization, and Lessons Learned

The QQ image storage platform was fully migrated from VM‑based servers to Tencent Cloud’s Kubernetes Engine. The migration consolidated services into containers and added health checks, anti‑affinity, and autoscaling, which cut costs by 26%, reduced ops effort by 30%, and improved scalability and reliability.

AVIF · HPA · Image Processing

0 likes · 15 min read

dbaplus Community

Jun 24, 2023 · Operations

How Bilibili Scales Capacity: VPA, HPA, and Cost‑Saving Strategies

This article summarizes Zhang He’s Bilibili SRE talk on building a capacity‑management system that visualizes resource usage, reduces costs, improves stability, and leverages Kubernetes VPA, HPA, pooling, and quota management to support massive live‑stream events and rapid feature releases.

Cost Optimization · HPA · Kubernetes

0 likes · 21 min read

政采云技术

Mar 2, 2023 · Cloud Computing

Kubernetes Horizontal Pod Autoscaler (HPA) and KEDA: Principles, Limitations, and Implementation

This article explores Kubernetes horizontal pod autoscaling mechanisms, comparing HPA and KEDA, their implementation principles, limitations, and practical deployment scenarios for cloud-native applications.

Cloud Native · HPA · KEDA

0 likes · 14 min read

政采云技术

Feb 28, 2023 · Cloud Native

Understanding Horizontal Pod Autoscaler (HPA) and KEDA for Elastic Scaling in Kubernetes

This article explains pod‑level elasticity in Kubernetes by detailing the principles, metric types, and limitations of the Horizontal Pod Autoscaler (HPA) and then introduces KEDA as an event‑driven extension that adds true scale‑to‑zero capabilities, complete with configuration examples and code snippets.

HPA · KEDA · Kubernetes

0 likes · 17 min read

HelloTech

Dec 23, 2022 · Cloud Native

Design Principles and Implementation Details of Kubernetes Horizontal Pod Autoscaler and Custom Water Pod Autoscaler

The article explains Kubernetes’ built‑in Horizontal Pod Autoscaler, then details the custom Water Pod Autoscaler (WPA) that extends HPA with dual‑signal (load and SOA registration) detection, dual‑threshold scaling, noise filtering, configurable cooldown, frequency limits, tolerance buffers, and integrated alerting for reliable elastic scaling.

Cloud Native · HPA · Kubernetes

0 likes · 13 min read

Alibaba Cloud Developer

Dec 23, 2022 · Cloud Native

What Happens When You Deploy an App on Kubernetes? A Deep Dive

This article walks through the entire lifecycle of deploying an application on Kubernetes, explaining how Docker containers differ from virtual machines, the role of Pods, ReplicationControllers, Deployments, and how automatic scaling with HPA and VPA keeps services reliable and efficient.

Cloud Native · Deployment · HPA

0 likes · 21 min read

Tencent Cloud Developer

Nov 24, 2022 · Cloud Native

Large‑Scale Cost Optimization for Kubernetes/TKE: Data Collection, Measures, and Implementation

The article details a Tencent‑led, end‑to‑end cost‑optimization project for large‑scale Kubernetes/TKE clusters that collected extensive workload metrics, applied VPA/HPA enhancements, custom scheduling and node‑downscaling via the open‑source Crane platform, ultimately delivering up to 70% CPU and 50% memory savings with zero‑fault deployments.

HPA · Kubernetes · Resource Management

0 likes · 29 min read

Efficient Ops

Nov 2, 2022 · Cloud Native

Why Your HPA Isn’t Scaling: 3 Common Misconceptions and How to Fix Them

This article explains three frequent misunderstandings about Kubernetes Horizontal Pod Autoscaler—dead zones, misuse of utilization calculations, and perceived lag in scaling—while detailing HPA’s inner workings, metric sources, calculation methods, and behavior configuration to help you avoid scaling pitfalls.

HPA · Kubernetes · autoscaling

0 likes · 12 min read

Huolala Tech

Oct 20, 2022 · Cloud Native

How Huolala Cuts Cloud Costs with Kubernetes: Spot Instances, Smart Autoscaling, and Predictive Scaling

This presentation details Huolala's end‑to‑end cloud‑native cost‑optimization strategy, covering the company's infrastructure basics, Kubernetes‑based server cost‑saving techniques, a tailored optimization roadmap, practical Spot Instance usage, and a custom CronHPA‑driven scheduled scaling solution to boost resource utilization.

Cloud Native · Cost Optimization · HPA

0 likes · 23 min read

Cloud Native Technology Community

Jul 12, 2022 · Cloud Native

How Tencent Cut Kubernetes CPU Costs by 70%: A Full‑Scale Cloud‑Native Optimization Journey

This article presents a comprehensive, data‑driven case study of how Tencent’s internal Kubernetes/TKE platform reduced monthly CPU usage by up to 70% and memory usage by 50% through systematic cost data collection, VPA/HPA enhancements, custom scheduling, node‑level over‑commit, and safe node decommissioning, while maintaining zero‑incident reliability.

Cloud Native · Cost Optimization · HPA

0 likes · 28 min read

Qunar Tech Salon

Jun 22, 2022 · Operations

Design and Implementation of Multi‑Cluster HPA Metrics Collection, Analysis, and Reporting in Kubernetes

This article explains the background, benefits, and measurement criteria of Kubernetes Horizontal‑Pod‑Autoscaler (HPA), describes the creation of metric tables and SQL queries for collecting scaling events and CPU usage, and presents a Python‑based workflow that aggregates the data, stores daily reports, validates results, and sends automated email summaries.

HPA · Kubernetes · Operations

0 likes · 19 min read

Alibaba Cloud Native

May 5, 2022 · Cloud Native

Achieving Low‑Cost, High‑Elastic Kubernetes Deployments with ACK, ECI, and OpenKruise

This article explains how to use Kubernetes native autoscaling components—HPA, VPA, Cluster Autoscaler—and cloud‑native extensions such as Alibaba Cloud's Virtual Node, Elastic Container Instance, Elastic Workload, and the open‑source OpenKruise to build a cost‑effective, highly elastic architecture on ACK clusters.

Cluster Autoscaler · Elastic Workload · HPA

0 likes · 28 min read

Cloud Native Technology Community

Dec 8, 2021 · Cloud Native

What’s New in Kubernetes 1.23? Top Features You Can Use Today

Kubernetes 1.23 GA adds over 45 enhancements—including dual‑stack networking, CronJobs, new ephemeral volume types, an updated HPA API, several deprecated APIs, beta‑graduated features, and new alpha capabilities—each described with configuration details and example manifests for immediate production use.

1.23 · Beta Features · CronJobs

0 likes · 6 min read

Qingyun Technology Community

Sep 8, 2021 · Cloud Native

How Knative Autoscaler Powers Serverless Scaling: KPA vs HPA Explained

This article explains the principles behind Knative Autoscaler, compares Knative Pod Autoscaler (KPA) with Kubernetes Horizontal Pod Autoscaler (HPA), and provides step‑by‑step configuration and demo instructions for achieving true serverless scaling on Kubernetes.

Autoscaler · HPA · KPA

0 likes · 7 min read

Node Underground

Aug 21, 2021 · Cloud Native

Mastering Autoscaling: HPA, VPA, and KNative KPA in Cloud‑Native Environments

This article reviews the current state of Kubernetes horizontal and vertical autoscaling, compares HPA, VPA, and KNative KPA, discusses their limitations, and proposes short‑ and long‑term ideas for a more dynamic, low‑ops scheduling system.

Cloud Native · HPA · Knative

0 likes · 6 min read

MaGe Linux Operations

Jul 11, 2021 · Cloud Computing

Master Kubernetes Autoscaling: HPA, VPA, and Cluster Autoscaler for Cost Savings

This article explains how Kubernetes' built‑in autoscaling mechanisms—Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler—work, when to use each, and best‑practice tips to reduce cloud costs while maintaining application performance.

Cluster Autoscaler · Cost Optimization · HPA

0 likes · 9 min read

Alibaba Cloud Native

Apr 5, 2021 · Cloud Native

How Knative Enables Traffic‑Based Autoscaling and Gray Deployments

This article explains Knative’s traffic‑driven autoscaling and gray‑release capabilities, detailing the request flow architecture, the roles of Service, Configuration, Route and Revision, and walks through built‑in scaling strategies such as KPA, HPA, scheduled‑HPA, event‑gateway and custom plugins, with practical examples.

Cloud Native · Gray Deployment · HPA

0 likes · 10 min read

Alibaba Cloud Native

Jul 31, 2019 · Cloud Native

Master Kubernetes HPA: Hands‑On autoscaling/v1 and autoscaling/v2beta1 Practices

This guide walks you through configuring Kubernetes Horizontal Pod Autoscaler using both autoscaling/v1 (CPU‑only) and autoscaling/v2beta1 (custom metrics), covering template creation, deployment, Metrics Server migration, custom metrics adapter setup, load testing, and verification of scaling behavior.

HPA · Kubernetes · cloud-native

0 likes · 15 min read

Alibaba Cloud Native

Jul 24, 2019 · Cloud Native

How Does Kubernetes HPA Really Scale Pods? Deep Dive into Principles and Evolution

This article explains the core principles of Kubernetes Horizontal Pod Autoscaler, walks through a concrete scaling example, discusses noise handling, cooldown periods, boundary calculations, and traces the evolution of HPA across API versions with practical YAML snippets.

HPA · Horizontal Pod Autoscaler · Kubernetes

0 likes · 10 min read

Alibaba Cloud Native

Jul 17, 2019 · Cloud Native

Why Traditional Autoscaling Fails in Kubernetes and How Cloud‑Native Solutions Evolve

The article examines the limitations of traditional threshold‑based autoscaling in Kubernetes, explains three core challenges—percentage fragmentation, capacity‑planning pitfalls, and resource‑utilization dilemmas—then expands the autoscaling concept across four workload types and outlines the cloud‑native components that address them.

Cloud Native · HPA · Kubernetes

0 likes · 10 min read
