Resource Overcommit Strategies in Vivo Container Platform: Static and Dynamic Approaches
Vivo’s container platform tackles oversized resource requests in two stages: a static, coefficient‑based overcommit applied at first deployment, followed by a dynamic recommender that continuously gathers usage metrics, builds exponential histograms under a half‑life sliding‑window model, and adjusts CPU (and optionally memory) requests. Together these improve packing efficiency, reduce billing, and raise overall CPU utilization by roughly eight percent while keeping HPA calculations accurate.
The Vivo container platform addresses the problem of overly large resource request values by applying two technical schemes—static overcommit and dynamic overcommit—to rationalize business resource requests, improve packing efficiency, and increase overall resource utilization.
Background
In Kubernetes, containers define requests (minimum guaranteed resources) and limits (maximum usable resources). Requests are critical for scheduling, while limits protect the node at runtime. Users often set requests either far below actual usage (causing hotspot issues) or far above actual usage (inflating billing and leaving idle resources).
Current Situation
Vivo’s platform categorizes resource pools into test, shared, dedicated, and mixed pools. All online workloads must specify request and limit, with request ≤ limit. In the shared pool, two typical patterns emerge:
A few cases where the request is set low but actual usage far exceeds it, leading to node hotspots.
Most cases where the request is set high (often equal to the limit) while actual usage is much lower, causing high billing and low node utilization.
Because the platform cannot directly judge the rationality of a user’s request, it cannot enforce hard limits at first deployment.
1. Static Overcommit Scheme
The static scheme reduces the user‑submitted request by a predefined coefficient that varies by cluster, data center, and environment. The caas-openapi component automatically rewrites the request values.
Advantages: Simple to implement and can be applied at the first deployment.
Disadvantages: Coefficients must be chosen conservatively, so requests may still remain oversized. Because memory is non‑compressible, only CPU is overcommitted; memory requests are left untouched to avoid OOM kills.
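As a sketch of how such a coefficient‑based rewrite might look (the coefficient table, function name, and values here are hypothetical; the actual caas-openapi logic is not shown in the source):

```python
# Coefficients vary by cluster, data center, and environment
# (illustrative values only).
STATIC_COEFFICIENTS = {
    ("cluster-a", "prod"): 0.6,
    ("cluster-a", "test"): 0.4,
}

def apply_static_overcommit(cpu_request_millicores: int,
                            cluster: str, env: str) -> int:
    """Scale down the user-submitted CPU request by the static coefficient.
    Memory is left untouched because it is non-compressible."""
    coeff = STATIC_COEFFICIENTS.get((cluster, env), 1.0)  # unknown pools pass through
    return int(cpu_request_millicores * coeff)

print(apply_static_overcommit(2000, "cluster-a", "prod"))  # 2000m -> 1200m
```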
2. Dynamic Overcommit Scheme
A caas-recommender component continuously collects real‑time resource usage from monitoring systems (metrics server or Prometheus) and adjusts requests based on a recommendation model.
2.1 Workflow
Pull actual usage metrics for each container.
Apply an algorithmic model to compute a recommended request value.
When the workload is redeployed, replace the original request with the recommended value.
2.2 Half‑Life Sliding‑Window Model
The model assigns higher weight to newer samples and lower weight to older ones. The decay factor is calculated as 2^((timestamp − referenceTimestamp) / halfLife), with a default half‑life of 24 hours: a sample taken one half‑life after the reference point carries twice the weight of one taken at the reference point. This approach mirrors Google Borg Autopilot’s moving‑window model.
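The decay factor can be written out directly from the formula above (a minimal illustration, not the component’s actual code):

```python
HALF_LIFE_SECONDS = 24 * 3600  # default half-life of 24 hours

def decay_weight(timestamp: float, reference_timestamp: float) -> float:
    """Weight = 2^((t - t_ref) / halfLife): a sample one half-life after
    the reference carries twice the weight; one half-life before, half."""
    return 2.0 ** ((timestamp - reference_timestamp) / HALF_LIFE_SECONDS)

ref = 0.0
print(decay_weight(ref + HALF_LIFE_SECONDS, ref))   # 2.0
print(decay_weight(ref - HALF_LIFE_SECONDS, ref))   # 0.5
```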
2.3 Exponential Histogram for Recommendation
During each scan (default interval: 1 minute), the recommender builds an exponential histogram of resource usage, weighting each sample by its decay factor. Bucket sizes grow exponentially (e.g., bucketSize = 0.01 × 1.05^N). The 95th percentile (P95) is extracted for CPU and the 99th percentile (P99) for memory to form the final recommendation.
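A minimal, illustrative version of such a histogram follows; bucket widths grow per the article’s bucketSize = 0.01 × 1.05^N, while the class, parameter names, and bucket count are assumptions:

```python
class ExponentialHistogram:
    """Histogram whose bucket N has width 0.01 * 1.05**N."""
    def __init__(self, first_bucket=0.01, ratio=1.05, n_buckets=200):
        # Upper boundary of bucket N = sum of widths 0..N (geometric series).
        self.bounds, upper = [], 0.0
        for n in range(n_buckets):
            upper += first_bucket * ratio ** n
            self.bounds.append(upper)
        self.weights = [0.0] * n_buckets

    def add(self, value, weight=1.0):
        """Add a sample; weight would be the sample's decay factor."""
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.weights[i] += weight
                return
        self.weights[-1] += weight  # clamp oversized values into the last bucket

    def percentile(self, p):
        """Upper bound of the bucket where cumulative weight reaches
        fraction p of the total weight."""
        total, cum = sum(self.weights), 0.0
        for i, w in enumerate(self.weights):
            cum += w
            if total and cum / total >= p:
                return self.bounds[i]
        return self.bounds[-1]

hist = ExponentialHistogram()
for cpu in [0.2, 0.25, 0.3, 0.3, 1.5]:  # CPU cores observed per scan
    hist.add(cpu)                        # in practice, weight = decay factor
print(hist.percentile(0.95))             # boundary of the bucket holding 1.5
```

Because bucket widths grow geometrically, the histogram covers a wide dynamic range with bounded memory and bounded relative error per bucket, which is what makes checkpointing it practical.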
2.4 Recommender Component Flow
Controller watches profile CRDs and creates corresponding recommendation CRDs.
On start‑up, it either recovers a checkpointed histogram or builds a new one from Prometheus data.
In a loop, it selects workloads needing management, fetches metrics, updates the histogram, computes the percentile‑based recommendation, stores the result, and performs garbage collection.
2.5 Recommendation Adjustment Rules
Final recommendation = model output × configurable scaling factor.
If recommendation < request, use the recommendation; otherwise keep the original request.
Memory adjustments are optional; limits are never modified.
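The rules above reduce to a short sketch (the scaling factor value here is illustrative; the source only says it is configurable):

```python
def final_recommendation(model_output: float, original_request: float,
                         scaling_factor: float = 1.1) -> float:
    """Apply the configurable scaling factor, then only ever shrink:
    use the scaled recommendation if it is below the original request,
    otherwise keep the original. Limits are never modified."""
    scaled = model_output * scaling_factor
    return scaled if scaled < original_request else original_request

print(final_recommendation(1.0, 2.0))  # shrinks the request to 1.1
print(final_recommendation(2.5, 2.0))  # 2.75 >= 2.0, keep original 2.0
```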
3. HPA Utilization Calculation Refactor
The native HPA uses request as the basis for utilization. After overcommit, this would break autoscaling. The platform records both request and limit in pod annotations and modifies the controller to compute utilization based on the annotated dimension (limit), thereby decoupling HPA from the overcommitted request.
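A sketch of the refactored utilization calculation, assuming a hypothetical annotation key (the source does not name the actual key):

```python
def hpa_utilization(current_usage: float, pod_annotations: dict) -> float:
    """Compute utilization against the limit recorded in pod annotations
    rather than the (overcommitted) request."""
    basis = float(pod_annotations["caas.example.com/cpu-limit"])  # assumed key
    return current_usage / basis * 100.0

annotations = {"caas.example.com/cpu-limit": "4.0",
               "caas.example.com/cpu-request": "1.2"}  # overcommitted request
print(hpa_utilization(2.0, annotations))  # 50.0% of limit, not ~167% of request
```

Basing the percentage on the limit keeps autoscaling decisions stable even as the recommender rewrites the request between deployments.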
4. Dedicated Pool Overcommit Support
Dedicated pools, managed by business teams, can optionally enable overcommit (static, dynamic, or both) with configurable coefficients and scaling factors. Automatic modification can be toggled, allowing teams to apply recommendations manually if desired.
5. Overall Deployment Process
First deployment uses static coefficients to adjust requests. After monitoring, dynamic recommendations are generated and applied in subsequent deployments. If no recommendation is available, the static coefficient remains in effect.
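The deployment-time decision can be sketched as follows (function and parameter names are assumptions, not the platform’s API):

```python
from typing import Optional

def request_at_deploy(user_request: float, static_coeff: float,
                      recommendation: Optional[float]) -> float:
    """First deployment: apply the static coefficient. Later deployments:
    use the dynamic recommendation when one exists (never raising the
    request above what the user asked for), else fall back to static."""
    if recommendation is not None:
        return min(recommendation, user_request)
    return user_request * static_coeff

print(request_at_deploy(4.0, 0.6, None))  # no recommendation yet -> 2.4
print(request_at_deploy(4.0, 0.6, 1.8))   # recommendation available -> 1.8
```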
6. Effects and Benefits
Test Cluster: After four months of dynamic overcommit, 90% of workloads were under management, the average memory request dropped from 4.07 Gi to 3.1 Gi, and the node memory packing rate fell by 10%, alleviating memory pressure.
Production Shared Pool: After three months, 60% of workloads were under management, and the average CPU request fell from 2.86 to 2.35 cores, raising overall CPU utilization by about 8%.
Conclusion & Outlook
The overcommit strategies reduce request values to realistic levels, cut costs, and improve node utilization. Future work includes extending overcommit to memory and GPU resources, and leveraging recommendation data for workload profiling and smarter scheduling.
vivo Internet Technology