
Kapacity V0.2 Release: AI‑Driven Traffic‑Based Replica Prediction for Cloud‑Native Autoscaling

Kapacity V0.2 introduces an AI‑powered, traffic‑driven replica prediction algorithm for cloud‑native autoscaling, featuring a Linear‑Residual model, a lightweight Swish Net time‑series forecaster, custom metric support, and open‑source tools, aiming to improve resource efficiency and reduce operational risk.

AntTech

Kapacity, built on Ant Group's large‑scale production experience, is an open‑source cloud‑native capacity solution that combines intelligent risk mitigation with cost reduction. The V0.2 milestone adds an AI algorithm that predicts replica counts from traffic, enabling production‑grade predictive autoscaling.

Background

The project aims to solve cloud‑native capacity challenges through intelligent methods and to maximize open‑source compatibility, allowing seamless reuse of existing Prometheus Adapter configurations for custom metrics.

Key Features of the New Version

Traffic‑driven replica prediction algorithm that reacts to traffic changes before resource metrics shift.

Predictive autoscaling offers earlier response, more stable resource levels, and higher precision.

Linear‑Residual Model: a linear model learns the relationship among traffic, resource utilization, and replica count, while a residual model (LightGBM) captures non‑linear effects and time‑based anomalies.
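The two-stage idea can be illustrated with a minimal sketch. It is not Kapacity's implementation: it assumes the linear stage maps traffic to total CPU cores consumed, and it swaps the LightGBM residual model for a simple per-hour mean of residuals so the example needs only the standard library. All names (`fit_linear`, `fit_hourly_residual`, `predict_replicas`, `cores_per_pod`, `target_utilization`) are hypothetical.

```python
import math
from collections import defaultdict


def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_hourly_residual(hours, residuals):
    """Stand-in for the residual model: mean residual per hour of day.

    The article uses LightGBM here; a lookup table is an assumption made
    to keep the sketch dependency-free.
    """
    buckets = defaultdict(list)
    for hour, residual in zip(hours, residuals):
        buckets[hour].append(residual)
    return {hour: sum(rs) / len(rs) for hour, rs in buckets.items()}


def predict_replicas(traffic, hour, slope, intercept, hourly_residual,
                     cores_per_pod, target_utilization):
    """Linear estimate plus residual correction, converted to a replica count."""
    total_cores = slope * traffic + intercept + hourly_residual.get(hour, 0.0)
    usable_cores_per_pod = cores_per_pod * target_utilization
    return max(1, math.ceil(total_cores / usable_cores_per_pod))


# Hypothetical history: traffic (requests/s), hour of day, total CPU cores used.
traffic = [100, 200, 300, 400]
hours = [0, 1, 0, 1]
cores = [1.5, 3.5, 3.5, 5.5]

slope, intercept = fit_linear(traffic, cores)
residuals = [c - (slope * t + intercept) for t, c in zip(traffic, cores)]
hourly = fit_hourly_residual(hours, residuals)
replicas = predict_replicas(500, 1, slope, intercept, hourly,
                            cores_per_pod=1.0, target_utilization=0.5)
```

Fitting the residual stage on what the linear stage leaves unexplained keeps the dominant traffic-to-resource relationship easy to inspect, while the second stage absorbs time-dependent deviations.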

Swish Net for Time Series Forecasting: a lightweight deep‑learning model (under 1 MiB) that forecasts traffic 12 steps (2 hours) ahead and trains in about 1 minute per epoch on a CPU.
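The article does not describe Swish Net's architecture, so the sketch below only illustrates two pieces that the description implies. It assumes the name refers to the Swish activation, `x * sigmoid(beta * x)`, and that a 12-step, 2-hour horizon means 10-minute steps. The `make_windows` helper and its `lookback` parameter are hypothetical; they show how a history series is typically cut into (input, target) pairs for a multi-step forecaster.

```python
import math

HORIZON_STEPS = 12
HORIZON_MINUTES = 120
STEP_MINUTES = HORIZON_MINUTES // HORIZON_STEPS  # 10-minute sampling interval


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x), computed without overflow."""
    z = beta * x
    if z >= 0:
        sigmoid = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        sigmoid = e / (1.0 + e)
    return x * sigmoid


def make_windows(series, lookback, horizon=HORIZON_STEPS):
    """Slice a series into (history, future) training pairs.

    Each pair holds `lookback` past points as input and the next
    `horizon` points as the forecasting target.
    """
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        history = series[start:start + lookback]
        future = series[start + lookback:start + lookback + horizon]
        samples.append((history, future))
    return samples
```

With 10-minute steps, one day of history is 144 points, so even a few weeks of traffic yields thousands of training windows.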

The table below shows the model’s performance on a production traffic dataset compared with DeepAR and N‑BEATS.

| Model | MAE | RMSE |
| --- | --- | --- |
| DeepAR | 1.734 | 31.315 |
| N‑BEATS | 1.851 | 41.681 |
| Kapacity (ours) | 1.597 | 28.732 |
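For readers unfamiliar with the two metrics in this comparison, the sketch below shows their standard definitions. The article does not state how its traffic values were scaled or normalized before scoring, so the function names and sample numbers here are purely illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Illustrative values only, not taken from the article's dataset.
actual = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 5.0]
print(mae(actual, predicted), rmse(actual, predicted))
```

RMSE is never smaller than MAE on the same data, and a wide gap between the two indicates that a small number of large errors, such as missed traffic spikes, dominate the squared term.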

Custom Metric Support

Kapacity extends the Kubernetes Metrics API with historical queries and workload‑level aggregation. The extension is fully compatible with existing Prometheus Adapter configurations, so the custom‑metric setup already used for HPA also works with predictive autoscaling.
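Workload‑level aggregation over historical data can be pictured as follows. This is a conceptual sketch, not Kapacity's API: the `(timestamp, pod_name, value)` sample shape and the `aggregate_by_workload` function are assumptions, and summing is only one possible aggregation.

```python
from collections import defaultdict


def aggregate_by_workload(samples):
    """Sum per-pod metric samples into one workload-level time series.

    `samples` is an iterable of (timestamp, pod_name, value) tuples for the
    pods of a single workload. Returns (timestamp, total) pairs sorted by time.
    """
    totals = defaultdict(float)
    for timestamp, _pod_name, value in samples:
        totals[timestamp] += value
    return sorted(totals.items())


# Hypothetical per-pod request rates sampled at t=0s and t=60s.
history = [
    (0, "web-a", 1.0),
    (0, "web-b", 2.0),
    (60, "web-a", 3.0),
]
series = aggregate_by_workload(history)
```

Aggregating at the workload level matters for prediction because the number of pods changes over time, so a per-pod series alone does not reflect the total load the workload received.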

Future Outlook

Upcoming releases will enhance risk‑identification and self‑healing during scaling events, add normal‑state capacity risk detection, and provide a visual dashboard for easier operation.

Community Involvement

The project is open‑source (GitHub: https://github.com/traas-stack/kapacity) and welcomes contributions, issues, and discussions. Join the community via WeChat, DingTalk, or the official public account for updates and collaboration.
