
KapacityStack: Open‑Source Cloud‑Native Intelligent Capacity Management and IHPA

KapacityStack is an open‑source, cloud‑native capacity platform from Ant Group. Its Intelligent Horizontal Pod Autoscaler (IHPA) provides predictive, multi‑level, and stable autoscaling for Kubernetes workloads, with the aim of reducing resource waste, carbon emissions, and operational costs. Its modular design supports extension and integration.

AntTech

KapacityStack builds on Ant Group's large‑scale production experience to offer a cloud‑native capacity management stack. It aims to improve cost efficiency, reduce carbon emissions, and address capacity challenges while keeping the risk of capacity changes under control.

The project’s source code is hosted at https://github.com/traas-stack/kapacity .

In the digital economy, rapid growth in data and compute demand leads to high resource consumption and carbon emissions. Ant Group has pursued "green computing" since 2019, developing technologies such as hybrid deployment, AI‑elastic capacity, cloud‑native time‑slice scheduling, and green AI.

During the 2022 Double‑11 event, Ant Group saved 1.538 M kWh of electricity and reduced 947 t of CO₂, equivalent to the annual carbon sequestration of 79 000 trees.

Building on cloud‑native architecture, Ant has developed AI‑elastic capacity capabilities that span elastic capacity, intelligent capacity data, stability, and operations. The algorithms and risk‑mitigation practices accumulated along the way now save roughly 100 k cores of compute annually.

KapacityStack open‑sources this technology, providing an extensible, intelligent capacity system for the community.

Key Technical Features

The native Kubernetes Horizontal Pod Autoscaler (HPA) has limitations: reactive scaling, linear metric assumptions, lack of risk controls, and tight coupling to specific K8s versions.
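To make the "reactive" and "linear" limitations concrete, here is a minimal sketch of the scaling rule documented for the native HPA: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name is illustrative, and the 0.1 tolerance mirrors the controller's documented default. The rule only reacts to a metric that has already moved, and it assumes the per‑pod metric falls in proportion as replicas are added.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Replica count suggested by the documented HPA ratio rule."""
    ratio = current_metric / target_metric
    # Within the tolerance band the controller leaves the workload alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 pods at 90% CPU against a 60% target.
    print(hpa_desired_replicas(4, 90.0, 60.0))
```

If the relationship between traffic and the metric is not proportional, this rule will overshoot or undershoot.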

IHPA, Kapacity’s first core open‑source capability, is designed to address each of these limitations.

▌ Intelligent Elasticity

IHPA treats elasticity as a data‑driven decision process, supporting multiple algorithms (timed, reactive, predictive, burst‑type) and allowing custom strategy composition for precise scaling.
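One way to picture strategy composition is as several independent recommenders whose outputs are merged into a single decision. The sketch below is hypothetical and is not IHPA's API: the names `timed_strategy`, `reactive_strategy`, and `compose` are invented for illustration, and merging by taking the largest recommendation is an assumed, conservative policy.

```python
import math
from typing import Callable, Optional, Sequence

# A strategy looks at the current context and returns a replica count,
# or None when it has no opinion. This interface is hypothetical.
Strategy = Callable[[dict], Optional[int]]


def timed_strategy(start_hour: int, end_hour: int, replicas: int) -> Strategy:
    """Recommend a fixed replica count inside a daily time window."""
    def recommend(ctx: dict) -> Optional[int]:
        return replicas if start_hour <= ctx["hour"] < end_hour else None
    return recommend


def reactive_strategy(target_cpu: float) -> Strategy:
    """Recommend replicas from the current CPU utilization ratio."""
    def recommend(ctx: dict) -> Optional[int]:
        return math.ceil(ctx["replicas"] * ctx["cpu"] / target_cpu)
    return recommend


def compose(strategies: Sequence[Strategy], ctx: dict,
            min_replicas: int, max_replicas: int) -> int:
    """Merge recommendations: take the largest, then clamp to the bounds."""
    votes = [r for s in strategies if (r := s(ctx)) is not None]
    desired = max(votes) if votes else ctx["replicas"]
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    strategies = [timed_strategy(19, 23, 10), reactive_strategy(60.0)]
    evening = {"hour": 20, "replicas": 4, "cpu": 30.0}
    print(compose(strategies, evening, min_replicas=1, max_replicas=20))
```

Taking the maximum favors availability over cost. A real system could instead rank strategies by priority.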

For predictive scaling, IHPA uses a machine‑learning pipeline: Swish Net for Time Series Forecasting (SNTSF) predicts influencing traffic streams, then a Linear‑Residual Model combines these forecasts with capacity metrics to recommend replica counts, handling non‑linear relationships and multi‑period traffic.
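The two‑stage idea (forecast traffic first, then map traffic to capacity) can be illustrated with a toy stand‑in. This is not Kapacity's SNTSF or its Linear‑Residual Model: the traffic forecast is taken as given, the linear part is an ordinary least‑squares fit of total CPU against one traffic series, and the "residual" part is assumed to be the mean leftover error per hour of day. All function and parameter names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("traffic samples must not all be identical")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


def fit_residuals(xs, ys, hours, slope: float, intercept: float) -> Dict[int, float]:
    """Average what the linear part misses, bucketed by hour of day."""
    buckets: Dict[int, list] = {}
    for x, y, h in zip(xs, ys, hours):
        buckets.setdefault(h, []).append(y - (slope * x + intercept))
    return {h: mean(errs) for h, errs in buckets.items()}


def recommend_replicas(traffic_forecast: float, hour: int, slope: float,
                       intercept: float, residual_by_hour: Dict[int, float],
                       cores_per_pod: float, target_utilization: float) -> int:
    """Turn forecast traffic into a replica count via predicted CPU cores."""
    cores = slope * traffic_forecast + intercept + residual_by_hour.get(hour, 0.0)
    return max(1, math.ceil(cores / (cores_per_pod * target_utilization)))


if __name__ == "__main__":
    traffic = [100.0, 200.0, 300.0, 400.0]   # requests per second (made up)
    cpu = [12.0, 20.0, 32.0, 40.0]           # total cores used (made up)
    hours = [9, 21, 9, 21]
    slope, intercept = fit_linear(traffic, cpu)
    residuals = fit_residuals(traffic, cpu, hours, slope, intercept)
    print(recommend_replicas(500.0, 9, slope, intercept, residuals, 2.0, 0.5))
```

The residual term is what lets the recommendation deviate from a purely proportional rule at times of day when the linear fit is known to be off.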

▌ Multi‑Level Elasticity

IHPA defines four pod states to enable fine‑grained control:

Online – running and ready (default for new pods).

Cutoff – running but not ready; used for rapid scaling‑down with a stability observation period.

Standby – resources swapped out and released; the pod can be restored to Online within minutes.

Deleted – pod fully removed.

Combining these states enables advanced techniques such as large‑scale time‑slice scheduling and hot‑pool management.
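The four states can be modeled as a small state machine. The sketch below is illustrative only: the state names come from the list above, but the set of allowed transitions is an assumption (for example, that Cutoff and Standby can both roll back to Online and that Deleted is terminal), not something taken from the Kapacity source.

```python
from enum import Enum


class PodState(Enum):
    ONLINE = "Online"
    CUTOFF = "Cutoff"
    STANDBY = "Standby"
    DELETED = "Deleted"


# Assumed transition table: each state maps to the states it may move to.
ALLOWED_TRANSITIONS = {
    PodState.ONLINE: {PodState.CUTOFF, PodState.STANDBY, PodState.DELETED},
    PodState.CUTOFF: {PodState.ONLINE, PodState.STANDBY, PodState.DELETED},
    PodState.STANDBY: {PodState.ONLINE, PodState.DELETED},
    PodState.DELETED: set(),  # terminal: a deleted pod cannot come back
}


def transition(current: PodState, target: PodState) -> PodState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move pod from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    state = PodState.ONLINE
    state = transition(state, PodState.CUTOFF)   # scale down, keep pod warm
    state = transition(state, PodState.ONLINE)   # roll back after a bad check
    print(state.value)
```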

▌ Stability Assurance

IHPA draws on Ant’s production experience to provide stability guarantees for unattended elastic changes: gray‑scale (batched) rollout of scaling changes, multi‑stage gray‑scale that uses the Cutoff and Standby states, and custom stability checks that trigger an automatic circuit breaker.
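A batched change guarded by a circuit breaker can be sketched as follows. This is a simplified illustration under stated assumptions, not IHPA's implementation: `plan_batches`, `gray_rollout`, and the `is_stable` callback are hypothetical names, and the check is assumed to run after each batch, with the rollout halting at the last stable replica count on the first failure.

```python
from typing import Callable, List, Tuple


def plan_batches(current: int, target: int, max_step: int) -> List[int]:
    """Intermediate replica counts, moving at most max_step per batch."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    plan = []
    replicas = current
    while replicas != target:
        if target > replicas:
            replicas = min(replicas + max_step, target)
        else:
            replicas = max(replicas - max_step, target)
        plan.append(replicas)
    return plan


def gray_rollout(current: int, target: int, max_step: int,
                 is_stable: Callable[[int], bool]) -> Tuple[int, bool]:
    """Apply batches one by one; stop and keep the last stable count if a
    stability check fails. Returns (final_replicas, breaker_tripped)."""
    last_stable = current
    for replicas in plan_batches(current, target, max_step):
        if not is_stable(replicas):
            return last_stable, True  # circuit breaker: abandon the change
        last_stable = replicas
    return last_stable, False


if __name__ == "__main__":
    # Hypothetical check: the service degrades below 6 replicas.
    print(gray_rollout(10, 4, 2, lambda replicas: replicas >= 6))
```

In a multi‑stage variant, each batch of a scale‑down could first move pods to Cutoff or Standby so that a tripped breaker can restore them quickly instead of recreating them.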

▌ Extensible Design

IHPA is modular, split into control, decision, and execution components, each replaceable. Extensibility includes custom algorithms, pod state logic, stability checks, and pod‑priority policies, allowing integration with other open‑source solutions.

Current Status and Future Roadmap

Version 0.1, an early‑stage release, provides multi‑level elasticity, gray‑scale changes, and basic timed and reactive algorithms. Version 0.2 will open‑source the predictive algorithm. Planned work includes burst detection, enhanced stability checks, richer custom metrics, standby‑based time‑slice scheduling, intelligent resource recommendation (CPU and memory, VPA), and a visual console for cost and carbon accounting.

For updates, see the roadmap at https://kapacity.netlify.app/zh-cn/docs/roadmap .

Join the Community

Kapacity aims to build an open, collaborative community. Contributions, issues, pull requests, and discussions are welcome on the GitHub repository. Community groups on WeChat and DingTalk, along with the official public account, provide further channels for engagement.
