Kapacity V0.2 Release: AI‑Driven Traffic‑Based Replica Prediction for Cloud‑Native Autoscaling
Kapacity V0.2 introduces an AI‑powered, traffic‑driven replica prediction algorithm for cloud‑native autoscaling. It combines a Linear‑Residual model with Swish Net, a lightweight time‑series forecaster, and adds custom metric support and open‑source tooling, with the goal of improving resource efficiency and reducing operational risk.
Kapacity is an open‑source cloud‑native capacity solution built on Ant Group's large‑scale production experience; it combines intelligent risk mitigation with cost reduction. The V0.2 milestone adds an AI algorithm that predicts replica counts from traffic, enabling production‑grade predictive autoscaling.
Background
The project aims to solve cloud‑native capacity challenges through intelligent methods and to maximize open‑source compatibility, allowing seamless reuse of existing Prometheus Adapter configurations for custom metrics.
Key Features of the New Version
Traffic‑driven replica prediction algorithm that reacts to traffic changes before resource metrics shift.
Predictive autoscaling offers earlier response, more stable resource levels, and higher precision.
Linear‑Residual Model: a linear model learns the relationship among traffic, resource utilization, and replica count, while a residual model (LightGBM) captures non‑linear effects and time‑based anomalies.
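The linear-plus-residual idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Kapacity's implementation: the linear stage is an ordinary least-squares fit of total resource usage against traffic, and the residual stage, which the article says is LightGBM, is replaced here by a simple per-hour mean of the linear model's errors so the example needs no dependencies. All function names, the sample data, the per-replica capacity, and the 0.6 target utilization are hypothetical.

```python
import math
from collections import defaultdict


def fit_linear(traffic, usage):
    """Ordinary least squares for usage ~ slope * traffic + intercept."""
    n = len(traffic)
    mean_x = sum(traffic) / n
    mean_y = sum(usage) / n
    var_x = sum((x - mean_x) ** 2 for x in traffic)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(traffic, usage))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def fit_hourly_residual(hours, traffic, usage, slope, intercept):
    """Stand-in for the residual model: mean linear-model error per hour of day."""
    buckets = defaultdict(list)
    for h, x, y in zip(hours, traffic, usage):
        buckets[h].append(y - (slope * x + intercept))
    return {h: sum(errs) / len(errs) for h, errs in buckets.items()}


def predict_replicas(traffic, hour, slope, intercept, residual_by_hour,
                     per_replica_capacity, target_utilization):
    """Replicas needed to keep predicted usage at the target utilization."""
    usage = slope * traffic + intercept + residual_by_hour.get(hour, 0.0)
    return max(1, math.ceil(usage / (per_replica_capacity * target_utilization)))


# Hypothetical history: (hour of day, requests/s, total CPU cores used).
history = [(9, 100, 2.1), (9, 200, 4.1), (14, 100, 2.0), (14, 200, 4.0),
           (20, 100, 2.5), (20, 200, 4.5)]
hours, traffic, usage = zip(*history)

slope, intercept = fit_linear(traffic, usage)
residuals = fit_hourly_residual(hours, traffic, usage, slope, intercept)

# Forecast traffic of 300 requests/s at 20:00 (both values hypothetical).
replicas = predict_replicas(300, 20, slope, intercept, residuals,
                            per_replica_capacity=1.0, target_utilization=0.6)
```

In this toy data the evening samples sit consistently above the straight-line fit, so the residual stage adds a positive correction for hour 20 that the linear stage alone would miss. A gradient-boosted model such as LightGBM plays the same role but can learn from many more features than the hour of day.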
Swish Net for Time Series Forecasting: a lightweight deep‑learning model (<1 MiB) that predicts traffic 12 steps ahead (2 hours, i.e. 10 minutes per step) and trains in about 1 minute per epoch on a CPU.
The table below compares Swish Net's forecasting error on a production traffic dataset with that of DeepAR and N‑BEATS (lower is better).
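The article does not describe how Swish Net is trained, so the following is only a generic sketch of how a traffic series is commonly sliced into training samples for a 12-step-ahead forecaster. The horizon of 12 steps and the 10-minute step (2 hours / 12) come from the figures above; the function name, the 36-step lookback, and the synthetic series are assumptions made for illustration.

```python
def make_windows(series, lookback, horizon):
    """Split a series into (input window, target window) training pairs."""
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        split = start + lookback
        pairs.append((series[start:split], series[split:split + horizon]))
    return pairs


STEP_MINUTES = 10   # implied by the article: 2 hours / 12 steps
HORIZON = 12        # steps predicted ahead, from the article
LOOKBACK = 36       # hypothetical: 6 hours of history per sample

# Hypothetical traffic series: one day of 10-minute samples.
traffic = [100 + i for i in range(144)]

windows = make_windows(traffic, LOOKBACK, HORIZON)
inputs, targets = windows[0]
```

Each pair gives the forecaster `LOOKBACK` past values as input and the next `HORIZON` values as the target, so one pass over the model yields the full 2-hour forecast at once.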
Model             MAE     RMSE
DeepAR            1.734   31.315
N‑BEATS           1.851   41.681
Kapacity (ours)   1.597   28.732
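For readers unfamiliar with the two error metrics in the table, the sketch below shows their standard definitions. The sample values are made up for illustration and are unrelated to the production dataset.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: like MAE, but large errors weigh more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical forecast: three exact predictions and one large miss.
actual = [10.0, 10.0, 10.0, 10.0]
predicted = [10.0, 10.0, 10.0, 18.0]

sample_mae = mae(actual, predicted)
sample_rmse = rmse(actual, predicted)
```

Because RMSE squares each error before averaging, a single large miss pulls it well above MAE. An RMSE far larger than the MAE, as in the table, may therefore indicate that most forecasts are close while a few are far off, although the article does not say so.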
Custom Metric Support
Kapacity extends the Kubernetes Metrics API with historical queries and workload‑level aggregation. The extension is fully compatible with existing Prometheus Adapter configurations, so the custom‑metric setup already used for HPA also works with predictive autoscaling.
Future Outlook
Upcoming releases will enhance risk‑identification and self‑healing during scaling events, add normal‑state capacity risk detection, and provide a visual dashboard for easier operation.
Community Involvement
The project is open‑source (GitHub: https://github.com/traas-stack/kapacity) and welcomes contributions, issues, and discussions. Join the community via WeChat, DingTalk, or the official public account for updates and collaboration.
AntTech
Technology is the core driver of Ant's future creation.