Tagged articles

autoscaling

81 articles · Page 1 of 1

Jul 1, 2026 · Operations

When One Timeout Triggers a Platform‑Wide Outage

The article explains how unbounded retries, replication fan‑out, and naïve autoscaling can amplify a single timeout into a cascade of failures, and it proposes bounded retry policies, load‑aware scaling, and layered persistence as safeguards for reliable API‑centric systems.

autoscaling · bounded retries · distributed systems

0 likes · 12 min read

Raymond Ops

Jun 14, 2026 · Cloud Native

How to Handle Traffic Spikes and Optimize Resources with Kubernetes HPA + VPA

This guide walks through the problem of fluctuating traffic in Kubernetes, explains the differences between Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), and provides step‑by‑step commands, YAML examples, best‑practice recommendations, troubleshooting tips, and monitoring alerts for deploying a production‑grade HPA + VPA solution.

Cloud Native · HPA · Kubernetes

0 likes · 41 min read

Java Architect Essentials

Jun 9, 2026 · Cloud Native

Boost Spring Boot Service Availability to 99.9% with Smart K8s Probe Configurations

The article walks through common Kubernetes health‑probe pitfalls for Spring Boot services and presents a concrete set of liveness, readiness, graceful‑shutdown, autoscaling, and configuration‑separation techniques that together raise production availability to 99.9%, backed by real‑world incidents and code snippets.

Config Management · Graceful Shutdown · Health Probes

0 likes · 8 min read

Full-Stack DevOps & Kubernetes

Jun 1, 2026 · Cloud Native

Beyond Traditional HPA: AI‑Agent‑Driven Intelligent Autoscaling for Kubernetes Pods

The article analyzes the shortcomings of Kubernetes' native HPA and presents a comprehensive AI‑Agent architecture that predicts load, makes autonomous scaling decisions, and integrates with the K8s API to achieve proactive, adaptive, and globally coordinated pod autoscaling.

AI Agent · Cloud Native · HPA

0 likes · 16 min read

Alibaba Cloud Infrastructure

May 29, 2026 · Cloud Native

Alibaba Cloud Knative Gets a Major Upgrade to Fully Support AI Agents

Alibaba Cloud's Knative now integrates a dedicated Agent Sandbox workload type, enabling stateful AI agents to run in a serverless Kubernetes environment with per‑user isolation, automatic scaling, instant pause/resume, and warm‑pool pre‑warming for zero‑cost idle periods.

AI Agent · Agent Sandbox · Cloud Native

0 likes · 13 min read

Huawei Cloud Developer Alliance

May 13, 2026 · Cloud Native

Why HPA Falls Short for LLMs and How Kthena Autoscaler Redefines Elastic Scaling

The article explains why traditional Kubernetes HPA cannot meet the unique demands of large‑language‑model inference, introduces Kthena Autoscaler’s model‑aware architecture, its dual stable/panic scaling modes, cost‑aware algorithms, flexible policy bindings, and provides practical configuration and observability guidance.

Kthena Autoscaler · Kubernetes · LLM Inference

0 likes · 10 min read

Ray's Galactic Tech

Apr 18, 2026 · Operations

How to Build a Resilient GPU Inference Autoscaling System on Kubernetes

This article explains why scaling GPU inference services on Kubernetes is challenging and presents a multi‑layer control architecture, metric upgrades, and production‑ready implementations using HPA, KEDA, KServe, and Karpenter to achieve stable, cost‑effective autoscaling.

GPU · HPA · KEDA

0 likes · 29 min read

Amazon Cloud Developers

Mar 4, 2026 · Cloud Native

Boosting Resource Utilization with Event‑Driven Autoscaling for Caribbean Panda’s Game System

By integrating a Flask‑based custom metric service with KEDA on Amazon EKS, Caribbean Panda reduced its pod count from 360 to 36 during idle periods while still scaling rapidly during traffic spikes, achieving significantly higher resource efficiency and lower costs.

Amazon EKS · Cloud Native · KEDA

0 likes · 11 min read

MaGe Linux Operations

Jan 18, 2026 · Artificial Intelligence

How to Deploy Scalable LLM Inference on Kubernetes with GPU Autoscaling

This guide walks through building a production‑grade Kubernetes GPU cluster for large language model inference, covering hardware sizing, GPU resource scheduling, model storage options, automated scaling with HPA, health checks, monitoring, troubleshooting, and multi‑model deployment strategies.

Docker · GPU · Kubernetes

0 likes · 49 min read

Alibaba Cloud Infrastructure

Dec 23, 2025 · Cloud Native

How Knative Serverless Cuts AI Inference Costs in Half and Doubles Efficiency

This article explains how the cloud‑native Knative serverless framework reduces GPU waste, enables request‑driven autoscaling to zero, accelerates AI model versioning and startup with Fluid, and integrates protocols like MCP and A2A to deliver cost‑effective, high‑performance AI inference services.

AI inference · Cloud Native · GPU

0 likes · 17 min read

dbaplus Community

Dec 22, 2025 · Cloud Computing

How We Cut Kubernetes Costs by 40% Without Switching Platforms

By rethinking resource requests, eliminating unused workloads, downsizing node types, fine‑tuning autoscaling, and trimming log storage, a team reduced their Kubernetes bill by 40% while keeping the same cloud provider, demonstrating that most cost overruns stem from misconfiguration rather than the platform itself.

Cloud Computing · Kubernetes · Prometheus

0 likes · 6 min read

Baidu Intelligent Cloud Tech Hub

Dec 15, 2025 · Artificial Intelligence

Baidu Baige’s Breakthrough: Orchestrating Giant LLM Inference with Silent Instances

The article details Baidu Baige’s next‑generation distributed inference platform for trillion‑parameter LLMs, explaining how automated orchestration, the FedDeployment abstraction, the SplitService unified view, Adaptive HPA predictive scaling, Silent Instances that activate within seconds, and the Staggered Batched Scheduler eliminate scaling limits, reduce TTFT by 30‑40%, boost throughput by up to 20%, and achieve cost‑effective, elastic AI compute.

Distributed Inference · Kubernetes · LLM

0 likes · 23 min read

Ray's Galactic Tech

Nov 21, 2025 · Cloud Native

Mastering Kubernetes HPA: How It Works, Real‑World Setup, and Troubleshooting

Horizontal Pod Autoscaler (HPA) in Kubernetes automatically scales pod replicas based on metrics like CPU, memory, or custom indicators, and this guide explains its core principles, configuration pitfalls, step‑by‑step troubleshooting commands, and advanced considerations such as API versions, stabilization windows, and integration with Cluster Autoscaler.

HPA · Kubernetes · Metrics Server

0 likes · 9 min read

MaGe Linux Operations

Nov 6, 2025 · Cloud Native

Master Kubernetes Node Autoscaling with Custom Prometheus Metrics in 30 Minutes

This guide walks you through a complete, 30‑minute implementation of Kubernetes node autoscaling using Horizontal Pod Autoscaler (HPA) with custom Prometheus metrics, covering prerequisites, anti‑pattern warnings, environment matrix, step‑by‑step deployment, core principles, observability, troubleshooting, best practices, and FAQ.

HPA · Kubernetes · Prometheus

0 likes · 50 min read

IT Architects Alliance

Oct 19, 2025 · Cloud Native

Mastering Cloud‑Native Autoscaling: HPA, VPA, CA, and Cost‑Aware Strategies

This article explores the challenges and best practices of cloud‑native scaling, covering Horizontal and Vertical Pod Autoscalers, Cluster Autoscaler cost optimization, event‑driven scaling with KEDA, traffic‑aware scaling in service meshes, and intelligent cost‑aware strategies backed by monitoring and future AI‑driven trends.

Kubernetes · Service Mesh · autoscaling

0 likes · 11 min read

Ops Community

Oct 8, 2025 · Cloud Native

How I Cut My Kubernetes Cloud Bill by 60% in 3 Months – Proven Strategies

Facing a 35‑million‑yuan monthly Kubernetes bill, the author analyzed hidden cost components, implemented five optimization campaigns—including resource request tuning, autoscaling, spot instances, storage tiering, and network consolidation—and reduced monthly expenses by 60% while boosting performance, delivering a detailed, reproducible methodology.

Cloud Native · FinOps · Kubernetes

0 likes · 33 min read

IT Architects Alliance

Sep 7, 2025 · Cloud Native

Mastering Elastic Scaling on Kubernetes: Cut Costs While Handling Traffic Peaks

This article explains how to design elastic scaling architectures on cloud platforms—combining horizontal, vertical, and functional scaling, leveraging Kubernetes autoscaling features, predictive scaling, mixed instance strategies, and cost‑monitoring practices—to handle traffic spikes while minimizing expenses.

Cloud Cost Optimization · Elastic Scaling · autoscaling

0 likes · 9 min read

Ops Development & AI Practice

Aug 30, 2025 · Cloud Native

Unlocking Karpenter NodePool: Fine‑Grained Autoscaling for Kubernetes

This article explains how Karpenter's NodePool CRD replaces traditional provisioners, details its core configuration fields, illustrates the autoscaling workflow from a pending pod to a ready node, and shows how to achieve cost‑effective, on‑demand resource provisioning in Kubernetes clusters.

AWS · Karpenter · Kubernetes

0 likes · 9 min read

MaGe Linux Operations

Aug 16, 2025 · Cloud Native

Master Container Deployment: Docker & Kubernetes Best Practices for Production

This comprehensive guide walks you through containerizing applications, optimizing Docker images, securing containers, designing Kubernetes high‑availability clusters, implementing observability with Prometheus and ELK, automating CI/CD pipelines, applying RBAC and network policies, and cutting costs with autoscaling and resource tuning, all backed by real‑world code examples.

CI/CD · Docker · Kubernetes

0 likes · 20 min read

Cloud Native Technology Community

Jul 31, 2025 · Cloud Native

Cut Kubernetes Costs by 30%: Six Proven Automation Strategies

An analysis of recent Kubernetes cost benchmarks reveals chronic over‑provisioning, with up to 40% idle CPU and 57% idle memory, and offers six community‑validated, actionable automation techniques—including flexible instance selection, Arm migration, custom autoscaling, bin‑packing, VPA, and safe Spot usage—to dramatically reduce cloud spend.

Kubernetes · autoscaling · cost optimization

0 likes · 8 min read

Rare Earth Juejin Tech Community

May 23, 2025 · Cloud Native

Master Kubernetes: From Core Concepts to Advanced Deployments and Autoscaling

This comprehensive guide walks you through Kubernetes fundamentals, cluster architecture, key components, resource objects, installation steps, and advanced features such as Ingress, RBAC, CronJobs, and Horizontal Pod Autoscaling, providing practical commands and examples for real‑world deployments.

Cloud Native · Ingress · Kubernetes

0 likes · 29 min read

Ops Development & AI Practice

Mar 7, 2025 · Cloud Native

Mastering Kubernetes Vertical Pod Autoscaling: How VPA Optimizes Resources

This article explains the fundamentals, components, workflow, configuration, and best practices of the Kubernetes Vertical Pod Autoscaler (VPA) and compares it with HPA, helping readers efficiently tune pod resources and improve cluster utilization.

Cloud Native · Kubernetes · VPA

0 likes · 10 min read

Linux Ops Smart Journey

Oct 11, 2024 · Cloud Native

Master Kubernetes HPA: Auto-Scale Pods Efficiently with Real-World Examples

This guide explains what Kubernetes Horizontal Pod Autoscaler (HPA) is, how it works, its key features, and provides step‑by‑step configuration, verification, and scaling policy details with practical code examples for cloud‑native applications.

HPA · Kubernetes · autoscaling

0 likes · 10 min read

Efficient Ops

Oct 9, 2024 · Cloud Computing

How One Engineer Runs a Full SaaS on Kubernetes with Minimal Effort

This article details how a solo engineer built and operated a SaaS platform on AWS using Kubernetes, covering infrastructure overview, automatic DNS, TLS, load balancing, CI/CD rollouts, autoscaling, caching, secret management, monitoring, logging, error tracking, and cost‑effective operations.

AWS · CI/CD · Kubernetes

0 likes · 21 min read

Alibaba Cloud Infrastructure

Sep 5, 2024 · Artificial Intelligence

Deploying NVIDIA NIM on Alibaba Cloud ACK with Cloud‑Native AI Suite: A Step‑by‑Step Guide

This guide explains how to quickly build a high‑performance, observable, and elastically scalable LLM inference service by deploying NVIDIA NIM on an Alibaba Cloud ACK cluster using the Cloud‑Native AI Suite, KServe, Prometheus, Grafana, and custom autoscaling based on request‑queue metrics.

Alibaba Cloud ACK · Grafana · KServe

0 likes · 15 min read

Alibaba Cloud Native

Sep 4, 2024 · Cloud Native

Deploy NVIDIA NIM LLM Inference on Alibaba Cloud ACK with Auto‑Scaling and Monitoring

This guide walks you through deploying NVIDIA NIM for LLM inference on Alibaba Cloud ACK, integrating the Cloud Native AI Suite, configuring KServe, setting up Prometheus and Grafana monitoring, and implementing custom autoscaling based on request queue metrics.

ACK · Grafana · KServe

0 likes · 15 min read

Liangxu Linux

Jul 28, 2024 · Cloud Native

Avoid These 10 Common Kubernetes Mistakes to Boost Reliability

This article outlines the most frequent Kubernetes pitfalls—such as missing resource requests, omitted health checks, using the :latest tag, over‑privileged containers, insufficient monitoring, default namespace misuse, weak security settings, absent PodDisruptionBudgets, lack of pod anti‑affinity, and improper load‑balancing—and provides concrete commands, YAML examples, and best‑practice recommendations to prevent them.

Kubernetes · Resource Management · autoscaling

0 likes · 13 min read

dbaplus Community

Jul 2, 2024 · Cloud Native

How Xiaohongshu Cut Kafka Storage Costs by 60% with a Cloud‑Native Tiered Architecture

Facing exploding Kafka scale, Xiaohongshu’s data‑storage team adopted a cloud‑native design that introduces tiered hot‑cold storage, containerization, and a custom load‑balancing service, achieving dramatic storage‑cost reductions, minute‑level cluster migrations, high‑performance data access, and automated resource scheduling.

Tiered Storage · autoscaling · cloud-native

0 likes · 20 min read

Alibaba Cloud Infrastructure

May 31, 2024 · Cloud Native

Best Practices for Deploying AI Model Inference on Knative

This guide explains how to efficiently deploy AI model inference services on Knative by externalizing model data, using Fluid for accelerated loading, configuring secrets, ImageCache, graceful shutdown, probes, autoscaling parameters, mixed ECS/ECI resources, shared GPU scheduling, and observability features to achieve fast scaling, low cost, and high elasticity.

AI Model Inference · Cloud Native · GPU

0 likes · 19 min read

DeWu Technology

Dec 27, 2023 · Cloud Native

DeWu's Cloud-Native Container Management Practices

Since August 2021, DeWu App has built a cloud‑native, multi‑cluster Kubernetes platform that uses an OAM‑style CloneSet model, Helm‑generated resources, Karmada‑based federation, custom scheduler plugins for reservation and node‑balance, offline mixing for Flink, a unified KubeAutoScaler, and a self‑built KubeAI stack, achieving significant cost cuts and improved stability while planning further middleware containerization and multi‑cloud expansion.

AI · Kubernetes · Multi-Cluster

0 likes · 22 min read

Alibaba Cloud Native

Dec 13, 2023 · Cloud Native

Mastering Traffic Management in Knative: Blue‑Green Deployments, Autoscaling, and Monitoring

This article explains how Knative leverages request‑driven traffic management to simplify blue‑green releases, configure multi‑gateway ingress, apply revision garbage‑collection policies, enable custom domains, support multiple protocols, and provide automatic scaling and observability through Prometheus and Grafana.

Blue-Green Deployment · Knative · Traffic Management

0 likes · 15 min read

Alibaba Cloud Native

Sep 3, 2023 · Cloud Native

Master Knative’s Request‑Based Autoscaling: KPA, Scale‑to‑Zero, and Advanced Strategies

This article explains how Knative implements request‑based autoscaling with KPA, details the scale‑to‑zero mechanism, shows how to handle burst traffic using stable and panic windows, and demonstrates advanced extensions such as resource pools, precise MPA scaling, and predictive AHPA configurations with concrete YAML examples.

Cloud Native · KPA · Knative

0 likes · 18 min read

MaGe Linux Operations

Aug 31, 2023 · Cloud Native

How to Achieve Zero‑Downtime Deployments with Kubernetes

Learn how to configure Kubernetes for zero‑downtime applications by syncing container images, ensuring multiple pod replicas, using PodDisruptionBudgets, selecting appropriate deployment strategies, setting up liveness/readiness probes, handling graceful termination, applying pod anti‑affinity, and enabling autoscaling and proper resource limits.

Kubernetes · Probes · Zero Downtime

0 likes · 12 min read

DevOps Cloud Academy

Aug 29, 2023 · Cloud Native

Achieving Zero‑Downtime Applications with Kubernetes

This article explains why and how to use Kubernetes features such as multiple pod replicas, PodDisruptionBudgets, deployment strategies, health probes, graceful termination, anti‑affinity, resource limits, and autoscaling to build zero‑downtime, highly available applications.

Deployment Strategies · Health Probes · Kubernetes

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Aug 29, 2023 · Cloud Computing

MagicScaler: Achieving High QoS and Low Cost with Uncertainty‑Aware Autoscaling

The MagicScaler framework, introduced by Alibaba Cloud’s big‑data engineering team and collaborators, combines a multi‑scale attention Gaussian process predictor with an uncertainty‑aware elastic scaling decision engine, delivering significantly higher quality‑of‑service and lower operational costs than traditional autoscaling methods, as demonstrated on real MaxCompute workloads.

Gaussian Process · Predictive Modeling · Resource Management

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Aug 28, 2023 · Cloud Computing

How MagicScaler Achieves Cost‑Effective, High‑QoS Autoscaling with Uncertainty‑Aware Predictions

A VLDB 2023 Industrial Track paper by Alibaba Cloud introduces MagicScaler, a predictive autoscaling framework that combines multi‑scale attention Gaussian processes with uncertainty‑aware optimization to deliver high QoS at reduced cost for cloud resources.

Cloud Resources · Gaussian Process · Predictive Modeling

0 likes · 4 min read

AntTech

Jul 14, 2023 · Cloud Native

KapacityStack: Open‑Source Cloud‑Native Intelligent Capacity Management and IHPA

KapacityStack is an open‑source, cloud‑native capacity platform from Ant Group that introduces the Intelligent Horizontal Pod Autoscaler (IHPA) to provide predictive, multi‑level, and stable autoscaling, reducing resource waste, carbon emissions, and operational costs while supporting extensible, modular integration with Kubernetes workloads.

autoscaling · capacity management · cloud-native

0 likes · 11 min read

Java Architect Essentials

Jun 13, 2023 · Cloud Native

Zero‑Downtime Deployment with K8s and SpringBoot: Health Checks, Rolling Updates, Graceful Shutdown, Autoscaling, Prometheus Integration, and Config Separation

This article demonstrates how to achieve zero‑downtime releases for Spring Boot applications on Kubernetes by configuring readiness/liveness probes, rolling update strategies, graceful shutdown hooks, horizontal pod autoscaling, Prometheus monitoring, and externalized configuration via ConfigMaps.

ConfigMap · Healthcheck · Kubernetes

0 likes · 13 min read

Full-Stack DevOps & Kubernetes

May 28, 2023 · Cloud Native

Master Kubernetes: Proven Best‑Practice Patterns for Scaling, Service Discovery, and Self‑Healing

This article presents practical Kubernetes best‑practice cases—including horizontal autoscaling, service discovery with load‑balancing, and health‑check probes—to help engineers build more reliable, scalable, and self‑healing container deployments.

Cloud Native · Kubernetes · autoscaling

0 likes · 8 min read

Programmer DD

May 23, 2023 · Cloud Native

Achieve Zero‑Downtime Deployments with K8s and Spring Boot: Health Checks, Rolling Updates, and Autoscaling

This guide explains how to combine Kubernetes and Spring Boot to implement zero‑downtime releases by configuring readiness and liveness probes, defining graceful shutdown, applying rolling update strategies, setting up horizontal pod autoscaling, integrating Prometheus monitoring, and separating configuration via ConfigMaps for reusable images.

Prometheus · Spring Boot · Zero Downtime

0 likes · 13 min read

Open Source Linux

May 11, 2023 · Cloud Native

When Kubernetes CPU Limits Fail: Better Alternatives and Best Practices

This article explains how Kubernetes CPU requests and limits work, why limits can throttle performance, compares language‑specific behaviors, and presents alternative strategies such as relying on requests with Horizontal Pod Autoscaling for more efficient and cost‑effective scaling.

Container Orchestration · Kubernetes · autoscaling

0 likes · 12 min read

MaGe Linux Operations

Apr 30, 2023 · Cloud Native

Master Kubernetes Essentials: kube-proxy, Pods, Deployments, Services & More

This article explains key Kubernetes concepts—including the kube-proxy IPVS mode versus iptables, static Pods, Pod lifecycle states, Pod creation workflow, restart policies, health probes, scheduling strategies, init containers, Deployment upgrade processes and strategies, DaemonSet characteristics, Horizontal Pod Autoscaling, Service types and load‑balancing, headless Services, external access methods, Ingress routing, image pull policies, and load‑balancer options—providing a comprehensive overview for cloud‑native practitioners.

Cloud Native · Deployment · Ingress

0 likes · 16 min read

政采云技术

Mar 2, 2023 · Cloud Computing

Kubernetes Horizontal Pod Autoscaler (HPA) and KEDA: Principles, Limitations, and Implementation

This article explores Kubernetes horizontal pod autoscaling mechanisms, comparing HPA and KEDA, their implementation principles, limitations, and practical deployment scenarios for cloud-native applications.

Cloud Computing · Cloud Native · Container Orchestration

0 likes · 14 min read

Cloud Native Technology Community

Feb 7, 2023 · Cloud Native

Machine Learning‑Based Optimization of Kubernetes Resources

This article explains how machine learning can be applied to automatically optimize CPU and memory settings in Kubernetes clusters, covering both experiment‑driven and observation‑driven approaches, step‑by‑step procedures, best‑practice recommendations, and the benefits of combining both methods for efficient, scalable cloud‑native operations.

Kubernetes · Performance · autoscaling

0 likes · 11 min read

ITPUB

Dec 26, 2022 · Cloud Native

What Really Happens When You Deploy an App on Kubernetes?

This article walks through the complete lifecycle of a Kubernetes deployment, explaining how a manual upgrade request triggers API calls and creates Deployments, ReplicaSets, and Pods, and how the scheduler, kubelet, and Docker work together. It also covers concepts such as containers, labels, replication controllers, deployments, and autoscaling mechanisms.

Cloud Native · Containers · Deployment

0 likes · 23 min read

HelloTech

Dec 23, 2022 · Cloud Native

Design Principles and Implementation Details of Kubernetes Horizontal Pod Autoscaler and Custom Water Pod Autoscaler

The article explains Kubernetes’ built‑in Horizontal Pod Autoscaler, then details the custom Water Pod Autoscaler (WPA) that extends HPA with dual‑signal (load and SOA registration) detection, dual‑threshold scaling, noise filtering, configurable cooldown, frequency limits, tolerance buffers, and integrated alerting for reliable elastic scaling.

Cloud Native · HPA · Kubernetes

0 likes · 13 min read

Tencent Cloud Developer

Nov 24, 2022 · Cloud Native

Large‑Scale Cost Optimization for Kubernetes/TKE: Data Collection, Measures, and Implementation

The article details a Tencent‑led, end‑to‑end cost‑optimization project for large‑scale Kubernetes/TKE clusters that collected extensive workload metrics, applied VPA/HPA enhancements, custom scheduling and node‑downscaling via the open‑source Crane platform, ultimately delivering up to 70% CPU and 50% memory savings with zero‑fault deployments.

HPA · Kubernetes · Resource Management

0 likes · 29 min read

AntTech

Nov 10, 2022 · Cloud Computing

DeepScaling: An Automated Capacity Evaluation System for Stable CPU Utilization in Large‑Scale Cloud Services

DeepScaling is a deep‑learning‑driven autoscaling framework that predicts workload, estimates CPU usage, and makes reinforcement‑learning‑based scaling decisions to keep microservice CPU utilization at a target level, thereby reducing resource waste while meeting SLOs in large‑scale cloud environments.

Cloud Computing · Resource Management · autoscaling

0 likes · 21 min read

Efficient Ops

Nov 2, 2022 · Cloud Native

Why Your HPA Isn’t Scaling: 3 Common Misconceptions and How to Fix Them

This article explains three frequent misunderstandings about Kubernetes Horizontal Pod Autoscaler—dead zones, misuse of utilization calculations, and perceived lag in scaling—while detailing HPA’s inner workings, metric sources, calculation methods, and behavior configuration to help you avoid scaling pitfalls.

HPA · Kubernetes · autoscaling

0 likes · 12 min read

Huolala Tech

Oct 20, 2022 · Cloud Native

How Huolala Cuts Cloud Costs with Kubernetes: Spot Instances, Smart Autoscaling, and Predictive Scaling

This presentation details Huolala's end‑to‑end cloud‑native cost‑optimization strategy, covering the company's infrastructure basics, Kubernetes‑based server cost‑saving techniques, a tailored optimization roadmap, practical Spot Instance usage, and a custom CronHPA‑driven scheduled scaling solution to boost resource utilization.

Cloud Native · HPA · Kubernetes

0 likes · 23 min read

Alibaba Cloud Native

Oct 3, 2022 · Cloud Native

Can AHPA Predict Kubernetes Scaling Before Load Spikes?

This article introduces the Advanced Horizontal Pod Autoscaler (AHPA), explains its three‑stage architecture of data collection, prediction, and scaling, details the RobustScaler forecasting algorithm and CRD‑based deployment, and evaluates its ability to proactively and reactively adjust pod counts with high robustness.

CRD · Cloud Native · Kubernetes

0 likes · 13 min read

Practical DevOps Architecture

Sep 20, 2022 · Cloud Native

Kubernetes Advantages, Use Cases, Features, Drawbacks, and Core Concepts

This article outlines Kubernetes' main advantages such as container orchestration, lightweight design, open‑source nature, elastic scaling and load balancing, describes typical deployment scenarios, highlights its portability, extensibility and automation, lists current drawbacks, and explains fundamental components like master, node, pod, labels, controllers, services, volumes, and namespaces.

autoscaling

0 likes · 5 min read

Xingsheng Youxuan Technology Community

Aug 18, 2022 · Cloud Native

Unlocking 800% Node Overselling: Xingdou Cloud’s Smart Resource Strategies

This article details how Xingdou Cloud leverages cloud‑native techniques such as massive node overselling, custom HPA (SophonHPA), priority‑based QoS, intelligent cleanup, and quota management to achieve dramatic cost reduction and efficiency gains across its multi‑cloud platform.

Resource Management · autoscaling · cloud-native

0 likes · 18 min read

AntTech

Jun 22, 2022 · Cloud Computing

Meta Reinforcement Learning Framework for Predictive Autoscaling in Cloud Environments

This article presents a cloud-native, end‑to‑end autoscaling solution that integrates traffic forecasting, CPU utilization meta‑prediction, and a reinforcement‑learning‑based scaling decision module into a fully differentiable system, achieving higher resource utilization and cost efficiency as demonstrated by ACM SIGKDD 2022 research.

Cloud Computing · Meta Learning · Predictive Modeling

0 likes · 10 min read

DataFunTalk

May 21, 2022 · Big Data

Exploring and Implementing Elastic Scheduling for Xiaomi Hadoop YARN

This talk presents Xiaomi's design and deployment of an elastic scheduling system for Hadoop YARN, covering background analysis, resource‑pool strategy, auto‑scaling architecture, stability challenges, label‑based resource isolation, Spark shuffle handling, cost‑saving results and future plans.

Big Data · Hadoop · Resource Management

0 likes · 16 min read

Alibaba Cloud Native

May 5, 2022 · Cloud Native

Achieving Low‑Cost, High‑Elastic Kubernetes Deployments with ACK, ECI, and OpenKruise

This article explains how to use Kubernetes native autoscaling components—HPA, VPA, Cluster Autoscaler—and cloud‑native extensions such as Alibaba Cloud's Virtual Node, Elastic Container Instance, Elastic Workload, and the open‑source OpenKruise to build a cost‑effective, highly elastic architecture on ACK clusters.

Cluster Autoscaler · Elastic Workload · HPA

0 likes · 28 min read

dbaplus Community

Apr 6, 2022 · Cloud Native

How Huolala Cuts Cloud Costs: Real‑World Kubernetes Optimization Strategies

Huolala’s architecture team shares a detailed walkthrough of their cloud‑native cost‑optimization journey, covering public‑cloud server pricing models, Kubernetes request/limit tuning, HPA and CronHPA scheduling, spot instance integration, intelligent pod placement, and practical lessons learned from scaling a global on‑demand logistics platform.

Cloud Native · Huolala · Kubernetes

0 likes · 25 min read

Alibaba Cloud Developer

Mar 17, 2022 · Big Data

How AutoStream Scales Real‑Time Data Processing with Flink, Iceberg, and PyFlink

This article details AutoStream's evolution from a Java‑only Storm platform to a Flink‑based, Kubernetes‑native streaming system that integrates budgeting controls, automatic scaling, lakehouse architecture with Iceberg, and PyFlink support, highlighting the technical challenges, solutions, and future roadmap for real‑time analytics.

Flink · Iceberg · Lakehouse

0 likes · 23 min read

HomeTech

Mar 16, 2022 · Cloud Native

Understanding Kubernetes Horizontal Pod Autoscaler (HPA): Mechanism, Core Source Code, and Practical Insights

This article explains how Kubernetes Horizontal Pod Autoscaler (HPA) balances resource demand and workload by automatically scaling pod replicas, describes the different metric types it supports, walks through the core controller code (Run, worker, reconcile, and replica calculation), highlights current limitations, and shares practical observations from real‑world usage.

Horizontal Pod Autoscaler · Kubernetes · autoscaling

0 likes · 11 min read

Ctrip Technology

Dec 30, 2021 · Cloud Computing

Ctrip’s Practice of Using AWS Spot Instances for Cost Reduction and High Availability

This article details Ctrip’s large‑scale use of AWS Spot instances on Kubernetes, explaining the cost benefits, the challenges of spot interruptions, and the architectural and operational strategies—including multi‑AZ deployment, scheduling policies, autoscaling group design, and observability—that enable a 50% reduction in container costs while maintaining system stability and reliability.

AWS Spot · Cloud Computing · Kubernetes

0 likes · 13 min read

Node Underground

Aug 21, 2021 · Cloud Native

Mastering Autoscaling: HPA, VPA, and KNative KPA in Cloud‑Native Environments

This article reviews the current state of Kubernetes horizontal and vertical autoscaling, compares HPA, VPA, and KNative KPA, discusses their limitations, and proposes short‑ and long‑term ideas for a more dynamic, low‑ops scheduling system.

Cloud Native · HPA · Knative

0 likes · 6 min read

dbaplus Community

Jul 19, 2021 · Cloud Native

Avoid These 10 Common Kubernetes Mistakes to Boost Reliability and Cost Efficiency

This article shares a practical guide to the most frequent Kubernetes pitfalls—from misconfigured resource requests and limits to improper liveness/readiness probes, load‑balancer settings, IAM misuse, pod anti‑affinity, and disruption budgets—offering concrete YAML examples and remediation steps to help operators run more reliable and cost‑effective clusters.

Cloud Native · Kubernetes · Probes

0 likes · 18 min read

MaGe Linux Operations

Jul 11, 2021 · Cloud Computing

Master Kubernetes Autoscaling: HPA, VPA, and Cluster Autoscaler for Cost Savings

This article explains how Kubernetes' built‑in autoscaling mechanisms—Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler—work, when to use each, and best‑practice tips to reduce cloud costs while maintaining application performance.

Cluster Autoscaler · HPA · Kubernetes

0 likes · 9 min read

Tencent Cloud Developer

Jun 21, 2021 · Industry Insights

How Hadoop YARN on Kubernetes Pods Supercharge Resource Utilization and Cut Costs

This article explains how Tencent Cloud EMR integrated Hadoop YARN with Kubernetes Pods to create a hybrid online‑offline deployment, implement elastic autoscaling and multi‑label resource allocation, and achieve several‑hundred‑percent improvements in CPU utilization while preserving cluster stability.

Big Data · Cloud Native · Hadoop

0 likes · 11 min read

Alibaba Cloud Native

May 12, 2021 · Cloud Native

Simplify Kubernetes Autoscaling with KEDA and Alibaba Cloud EDAS

This article explains how to scale Kubernetes clusters and applications using tools like Cluster Autoscaler, Virtual Kubelet, HPA, VPA, and especially KEDA, while showcasing Alibaba Cloud EDAS's integration and future AI‑driven autoscaling enhancements.

Cloud Native · EDAS · KEDA

0 likes · 11 min read

ITFLY8 Architecture Home

Apr 17, 2021 · Cloud Native

How Knative Handles Cold‑Start Traffic: From Activator to Pod

This article explores Knative’s traffic routing and autoscaling mechanisms, detailing how requests are initially directed through the Activator during cold‑start, how VirtualService configurations evolve, and how newer versions shift traffic handling to Kubernetes Service/Endpoint layers, improving performance and decoupling gateway logic.

Istio · Knative · Kubernetes

0 likes · 14 min read

Alibaba Cloud Native

Apr 5, 2021 · Cloud Native

How Knative Enables Traffic‑Based Autoscaling and Gray Deployments

This article explains Knative’s traffic‑driven autoscaling and gray‑release capabilities, detailing the request flow architecture, the roles of Service, Configuration, Route and Revision, and walks through built‑in scaling strategies such as KPA, HPA, scheduled‑HPA, event‑gateway and custom plugins, with practical examples.

Cloud Native · Gray Deployment · HPA

0 likes · 10 min read

Liulishuo Tech Team

Feb 4, 2021 · Cloud Computing

Improving Cloud Cost Allocation and Resource Utilization through Catalog, Tags, and Automated Monitoring

This article describes how a tech team built a catalog‑based cost‑allocation system, leveraged cloud tags and Kubernetes labels, used Prometheus data for scaling decisions, and combined reserved, spot, and on‑demand instances to boost cloud resource utilization while keeping services stable.

Cloud Cost · autoscaling · cloud-tagging

0 likes · 8 min read

Open Source Linux

Jan 29, 2021 · Operations

Essential Kubernetes Production Best Practices for Secure, Scalable Ops

This article outlines comprehensive production‑grade Kubernetes best practices—including health probes, RBAC, resource management, network policies, monitoring, autoscaling, image security, and zero‑downtime strategies—to help teams run secure, efficient, and highly available workloads.

Kubernetes · Operations · Production

0 likes · 11 min read

MaGe Linux Operations

Oct 8, 2020 · Cloud Native

How to Auto‑Scale Nginx on Kubernetes Using Prometheus Adapter and Custom Metrics

This guide walks through deploying an Nginx sample app on Kubernetes, exposing Prometheus‑collected custom metrics, configuring a Prometheus adapter, and creating a Horizontal Pod Autoscaler that scales the deployment based on request‑per‑second metrics.

Horizontal Pod Autoscaler · Kubernetes · Prometheus

0 likes · 16 min read

Java Architect Essentials

Aug 12, 2020 · Operations

Common Kubernetes Pitfalls and How to Fix Them

This article outlines frequent Kubernetes operational mistakes—such as misconfigured resource requests, missing probes, improper load‑balancer exposure, naïve autoscaling, IAM/RBAC misuse, lack of anti‑affinity, absent PodDisruptionBudgets, multi‑tenant pitfalls, and suboptimal externalTrafficPolicy—providing concrete remediation steps and best‑practice code examples.

Kubernetes · Probes · Resource Management

0 likes · 15 min read

Efficient Ops

Jun 25, 2020 · Cloud Native

How Xiaomi Scaled Redis with Kubernetes: Deploying Redis Cluster on K8s

This article explains how Xiaomi migrated tens of thousands of Redis instances from bare‑metal servers to Kubernetes, using Redis Proxy, StatefulSets, and Ceph storage to achieve resource isolation, automated deployment, dynamic scaling, and improved reliability while addressing latency, IP‑change, and security challenges.

Ceph · Kubernetes · Redis

0 likes · 20 min read

Bitu Technology

Jun 5, 2020 · Cloud Native

Building Tubi Data Runtime on JupyterHub: Architecture, Authentication, Storage, GPU Support, and Autoscaling

This article details how Tubi built the Tubi Data Runtime platform on JupyterHub using Kubernetes, covering authentication with Okta SSO, custom Docker images, shared EFS storage, multi‑service support, GPU enablement, node affinity, cluster autoscaling, and monitoring with Prometheus.

AWS · Cloud Native · Docker

0 likes · 17 min read

Node Underground

Feb 23, 2020 · Cloud Native

Mastering Serverless Scaling: Docker, Kubernetes & Elastic Autoscaling Explained

This article introduces the fundamentals of serverless architecture, explains how Docker containers and Kubernetes orchestration enable dynamic scaling, and outlines various autoscaling mechanisms and scenarios for elastic resource management in modern cloud-native applications.

Cloud Native · Docker · Kubernetes

0 likes · 6 min read

Tencent Cloud Developer

Sep 12, 2019 · Cloud Native

Optimizing Kubernetes Cluster Load: From Static Scheduling to Advanced Resource Management

The article explains how Kubernetes’ static scheduling leaves clusters fragmented and under‑utilized, then proposes dynamic techniques: pod resource compression, node resource overselling via admission webhooks, and an enhanced per‑HPA autoscaling controller. It also outlines future scheduler extensions and monitoring integration with Tencent Cloud, and closes with a recruitment call for senior cloud‑native engineers.

Kubernetes · Resource Compression · Static Scheduling

0 likes · 12 min read

Alibaba Cloud Native

Sep 4, 2019 · Cloud Native

How Serverless and Autoscaling Transform Kubernetes: Principles, Challenges, and Solutions

This article explains how serverless and autoscaling complement Kubernetes by detailing resource‑capacity curves, stakeholder needs, core autoscaling components, key challenges, design philosophy, classic use cases, limitations of traditional scaling, and the emerging virtual‑kubelet‑autoscaler solution.

Cluster Autoscaler · Kubernetes · Serverless

0 likes · 15 min read

NetEase Game Operations Platform

Jul 19, 2019 · Cloud Computing

Dynamic Scaling Practices for NetEase Game Operations on AWS

The article details NetEase's experience designing and implementing dynamic server scaling for overseas games on AWS, comparing Auto Scaling and GameLift, describing a custom scaling platform, the challenges faced, lessons learned, and future directions for cloud‑based game operations.

AWS · Cloud Computing · Dynamic Scaling

0 likes · 19 min read

Alibaba Cloud Native

Jul 1, 2019 · Cloud Native

How Alibaba Cloud’s Kubernetes Service Enables Seamless Monitoring and Autoscaling

Alibaba Cloud’s Kubernetes service integrates four native monitoring services—SLS, ARMS, AHAS, and Cloud Monitor—while offering enhanced open‑source components and autoscaling mechanisms such as HPA, VPA, cronHPA, Resizer, Cluster‑Autoscaler, and virtual‑kubelet‑autoscaler, enabling cloud‑native apps to achieve robust observability and elastic scaling.

Alibaba Cloud · Kubernetes · autoscaling

0 likes · 10 min read

Meitu Technology

Jan 30, 2019 · Cloud Native

Meitu's Container Platform: Architecture, Network, Load Balancing, Logging, Scheduling, and Autoscaling

Meitu’s container platform, built on Kubernetes with Calico networking, a custom Nginx load‑balancer, unified logging, refined scheduling, autoscaling, and comprehensive monitoring, enables seamless multi‑cluster hybrid‑cloud operations for its hundreds‑of‑millions‑user services while providing CI/CD tooling and future‑ready extensions such as service mesh and edge computing.

Cloud Native · Kubernetes · Logging

0 likes · 23 min read

Architecture Digest

Feb 2, 2018 · Cloud Computing

Design and Implementation of an Elastic Scaling Service on Alibaba ECS

This article explains why elastic scaling is needed for variable web traffic, describes how to build a cost‑effective, automatically adjustable service on Alibaba ECS using message queues, service refactoring, Docker deployment, logging, and a real‑time allocation algorithm, and shares practical lessons learned.

Alibaba ECS · Allocation Algorithm · Cloud Computing

0 likes · 9 min read

Liulishuo Tech Team

Dec 31, 2016 · Cloud Native

Designing Scalable and Reliable Backend Services at English Fluently: Architecture, Service Discovery, Monitoring, and Autoscaling

This article shares the engineering team’s experience of building a high‑growth, reliable backend for English Fluently, covering inter‑service communication with gRPC, service discovery, Docker‑based deployment, health‑checking, monitoring, autoscaling, Kubernetes orchestration, and multi‑cell availability strategies.

Docker · Kubernetes · Microservices

0 likes · 10 min read
