
How We Built Elastic Scaling and Hybrid‑Cloud Auto‑Scaling on Kubernetes

After containerizing its entire platform, the team took on the scaling problems facing front‑line developers. It designed a custom elastic‑scaling solution that combines dual‑threshold and scheduled scaling, integrated a hybrid‑cloud ClusterAutoScale, consolidated middleware resource pools, and built a comprehensive K8s observability stack. The result: more than 30% additional compute capacity, a scaling success rate above 90%, and 100% delivery of scaling events.

dbaplus Community

Elastic Scaling Practices in a Fully Containerized Environment

After the entire platform was containerized, front‑line developers ran into problems of timing, capacity, efficiency, and cost, which made elastic scaling the natural choice for cloud‑native workloads.

1. Problems Faced by Front‑Line Development After Full‑Scale Containerization

Developers struggled with when to scale, how much capacity to allocate, and how to balance efficiency against cost.

2. Issues with Native HPA

Early attempts with the native Horizontal Pod Autoscaler (HPA) exposed several limitations: no support for the team's custom metrics, no scheduled scaling, utilization measured relative to resources.requests, and a single‑goroutine execution model in the controller. Business constraints, such as job instances that must not be interrupted and downstream databases that must stay available, complicated its use further.
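For reference, the Kubernetes documentation gives the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change made when the ratio is within a tolerance (0.1 by default). The simplified sketch below ignores the controller's special handling of not‑ready pods and missing metrics; it only illustrates why request‑based utilization is awkward: the same real load produces different replica counts depending on what resources.requests says.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         pod_usage_millicores: list,
                         request_millicores: int,
                         target_utilization_pct: float,
                         tolerance: float = 0.1) -> int:
    """Simplified version of the documented HPA rule for a CPU Utilization target."""
    # Utilization is usage as a percentage of resources.requests,
    # not of node capacity or of the container limit.
    avg_usage = sum(pod_usage_millicores) / len(pod_usage_millicores)
    current_utilization_pct = 100.0 * avg_usage / request_millicores
    ratio = current_utilization_pct / target_utilization_pct
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Same load (4 pods at 400m each, 60% target), different requests:
print(hpa_desired_replicas(4, [400] * 4, 500, 60))   # 80% utilization
print(hpa_desired_replicas(4, [400] * 4, 1000, 60))  # 40% utilization
```

With a 500m request the pods look 80% utilized and the HPA scales out to 6 replicas; with a 1000m request they look 40% utilized and it scales in to 3, although nothing about the real load changed.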

3. Elastic Capability Based on Real‑Time Instance Watermarks

The team built an elastic mechanism driven by each instance's actual watermark and effective load:

Dual‑threshold control: separate upper and lower thresholds keep fluctuating workloads stable.

Ceiling‑based scale‑out and floor‑based scale‑in, following a “max‑available” principle.

Data denoising: filter out instances that are not ready, instances with strong business relationships, startup spikes, and instances with missing metrics.

Performance: namespace‑level watches and concurrency control speed up processing.
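A minimal sketch of the watermark loop above, assuming one CPU watermark per instance (observed usage divided by the instance's capacity, 0.0 to 1.0). The thresholds, the 120‑second startup grace period, the `pinned` flag standing in for strong business‑relationship instances, and the replica bounds are all hypothetical. We read “ceiling‑based scale‑out and floor‑based scale‑in” as rounding the replica count up when growing and down when shrinking; the guard that backs off a scale‑in whose projected watermark would cross the upper threshold is our own addition in the spirit of the “max‑available” principle.

```python
import math
from dataclasses import dataclass
from typing import Optional

STARTUP_GRACE_SECONDS = 120  # hypothetical: ignore samples from pods this young


@dataclass
class Instance:
    name: str
    ready: bool
    age_seconds: float
    cpu_watermark: Optional[float]  # usage / capacity in [0, 1]; None = metric missing
    pinned: bool = False  # hypothetical stand-in for "strong business relationship"


def usable_watermarks(instances):
    """Denoise: keep only ready, unpinned, warmed-up instances that report a metric."""
    return [
        i.cpu_watermark
        for i in instances
        if i.ready
        and not i.pinned
        and i.age_seconds >= STARTUP_GRACE_SECONDS
        and i.cpu_watermark is not None
    ]


def decide_replicas(instances, current, high=0.7, low=0.3, target=0.5,
                    min_replicas=2, max_replicas=50):
    """Dual-threshold decision; all threshold values here are assumptions."""
    samples = usable_watermarks(instances)
    if not samples:
        return current  # no trustworthy data: leave the replica count alone
    avg = sum(samples) / len(samples)
    if avg > high:
        # Scale out: round up so the new replica set is never short of capacity.
        desired = math.ceil(current * avg / target)
    elif avg < low:
        # Scale in: round down, then add replicas back while the projected
        # watermark (same total load over fewer instances) would exceed `high`.
        desired = max(1, math.floor(current * avg / target))
        while desired < current and current * avg / desired > high:
            desired += 1
    else:
        desired = current  # between the two thresholds: hold steady
    return max(min_replicas, min(max_replicas, desired))
```

Four healthy instances at a 0.9 watermark scale out to 8, and not‑ready, pinned, just‑started, or metric‑less instances do not drag the average down because they are filtered before it is computed.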

4. Fusion of Watermark‑Based and Scheduled Scaling

Watermark‑based and scheduled scaling were combined so that a single application can use both at once. The rule: scale out to the larger of the two targets, and never scale in below the scheduled replica count.
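The fusion rule reduces to taking a maximum. This sketch assumes scheduled rules are written as daily wall‑clock windows of the form (start, end, replicas), which may wrap past midnight; that format, and the fallback count used outside any window, are hypothetical. The watermark target is assumed to come from the watermark loop.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end); a window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def scheduled_replicas(now: time, windows, default: int) -> int:
    """Replica count from the first matching (start, end, replicas) window."""
    for start, end, replicas in windows:
        if in_window(now, start, end):
            return replicas
    return default  # hypothetical fallback outside all windows


def fused_replicas(watermark_target: int, scheduled_target: int) -> int:
    # Scale out to whichever target is larger. Because the result is never
    # below scheduled_target, scale-in also stops at the scheduled count.
    return max(watermark_target, scheduled_target)


# Hypothetical rules: a morning peak and an overnight batch window.
windows = [(time(9), time(12), 20), (time(22), time(2), 8)]
print(fused_replicas(12, scheduled_replicas(time(10), windows, default=4)))
```

At 10:00 the schedule asks for 20 replicas, so a watermark target of 12 is overridden; if load pushed the watermark target to 30, the application would grow to 30.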

5. Production Impact After Applying Elastic Scaling

After rollout, the team observed significant improvements:

Code pre‑warming ensures readiness before traffic peaks.

Reduced periodic fluctuations for bursty applications.

Lowered application redundancy, enabling early‑stage prototypes.

Improved handling of sudden traffic spikes and overall stability.

Better alerting and controllable application state.

Overall scaling success rate exceeded 90%, scaling event delivery reached 100%, and compute capacity grew by more than 30%.

6. Hybrid‑Cloud ClusterAutoScale

Freeing capacity inside a cluster does not by itself reduce total spend, because idle nodes are still paid for. This prompted the design of a hybrid‑cloud ClusterAutoScale covering images as a service, CloudProvider adapters, node initialization, and node reclamation.

Scale‑out is triggered in two ways: by Unschedulable pod events and by resource‑pool watermark thresholds. Challenges solved include integrating the private cloud's APIs through a CloudProvider adapter, announcing pod CIDR routes, evaluating capacity, scattering resources, and reclaiming nodes gracefully.
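The two triggers can be folded into a single “how many nodes?” decision. This is our own simplification, not the team's algorithm: it looks at CPU only, assumes every new node adds the same allocatable CPU, and uses hypothetical threshold (80%) and target (70%) watermarks. Pending Unschedulable pods count toward demand, and at least one node is added whenever either trigger fires, since pending pods can be blocked by fragmentation even when the pool has headroom in aggregate.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolUsage:
    requested_cpu: float    # cores requested by pods already running in the pool
    allocatable_cpu: float  # allocatable cores across the pool's current nodes
    pending_cpu: float      # cores requested by pods stuck as Unschedulable


def nodes_to_add(pool: PoolUsage, node_cpu: float,
                 threshold: float = 0.8, target: float = 0.7) -> int:
    """Nodes to request from the CloudProvider; 0 means no trigger fired."""
    watermark = pool.requested_cpu / pool.allocatable_cpu
    unschedulable_trigger = pool.pending_cpu > 0
    watermark_trigger = watermark > threshold
    if not (unschedulable_trigger or watermark_trigger):
        return 0
    # Size the pool so running plus pending demand sits at the target watermark.
    demand = pool.requested_cpu + pool.pending_cpu
    shortfall = demand / target - pool.allocatable_cpu
    # Add at least one node: pending pods may be blocked by fragmentation
    # even when the pool has headroom in aggregate.
    return max(1, math.ceil(shortfall / node_cpu))
```

For example, a pool with 85 of 100 cores requested is above the 80% threshold; reaching 70% needs about 121 allocatable cores, which is two more 16‑core nodes.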

7. Practical Considerations

When moving to production, pay attention to:

Resource‑pool capacity.

Per‑application standards for acceptable instance volatility.

The distinction between liveness and readiness probes.

Whether metric thresholds and elastic rules are reasonable.

Keeping filtering to a minimum.

Avoiding heavy reliance on external platforms.

Middleware Containerization and Hybrid Pooling

1. Business‑Specific Pools and Hybrid Expectations

Redis and Flink resource pools were merged so the two workloads could time‑share capacity. This removed the problems of isolated pools: resource fragmentation, duplicated external APIs, and inconsistent server specifications.

2. Overall Hybrid‑Pooling Strategy

The approach abstracts the problem into three layers (application tiering, hybrid scheduling, and resource QoS) and defines service‑level resource guarantee policies and resource allocation algorithms.

Applications are labeled S1‑S4 in a CMDB and mapped to Kubernetes priority labels. Low‑priority workloads are scattered across multiple pools.
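A sketch of how CMDB tiers might flow into pod specs and pool placement. The PriorityClass names, the `app-tier` label key, the pool names, and the choice to keep S1/S2 replicas in their home pool are all hypothetical; the matching PriorityClass objects would have to exist in the cluster for `priorityClassName` to be accepted.

```python
# Hypothetical mapping from CMDB tier to a Kubernetes PriorityClass name.
TIER_TO_PRIORITY_CLASS = {
    "S1": "tier-s1",
    "S2": "tier-s2",
    "S3": "tier-s3",
    "S4": "tier-s4",
}
LOW_PRIORITY_TIERS = {"S3", "S4"}


def pod_overrides(tier: str) -> dict:
    """Fields to merge into a pod template for an application of this tier."""
    return {
        "metadata": {"labels": {"app-tier": tier}},
        "spec": {"priorityClassName": TIER_TO_PRIORITY_CLASS[tier]},
    }


def assign_pools(tier: str, replicas: int, home_pool: str, pools: list) -> dict:
    """Replica count per pool: low tiers round-robin across all pools."""
    if tier not in LOW_PRIORITY_TIERS:
        return {home_pool: replicas}  # assumption: high tiers stay at home
    counts = {}
    for i in range(replicas):
        pool = pools[i % len(pools)]
        counts[pool] = counts.get(pool, 0) + 1
    return counts


print(assign_pools("S4", 5, "redis-pool", ["redis-pool", "flink-pool"]))
```

Scattering the low tiers means no single pool absorbs all of their replicas, so they can soak up idle capacity wherever it appears.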

3. Resource Allocation Decisions

Request recommendations take the P95 of a VPA‑style usage histogram, multiply it by a watermark factor, and combine the result with elasticity and a cool‑health state machine that smooths out spikes.
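A rough sketch of the recommendation step. It swaps in a plain nearest‑rank P95 over raw samples for VPA's decaying histogram, uses a hypothetical watermark factor of 1.25, and replaces the smoothing state machine (whose details the source does not give) with a simple 10% dead band.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of raw samples (pct given as 0-100)."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_request(cpu_samples_millicores, watermark_factor=1.25, pct=95):
    """P95 usage times a hypothetical watermark factor, rounded up to whole millicores."""
    return math.ceil(percentile(cpu_samples_millicores, pct) * watermark_factor)


def next_request(current_request, recommendation, dead_band=0.1):
    """Stand-in smoothing: ignore recommendations within +/-10% of the current request."""
    if abs(recommendation - current_request) <= dead_band * current_request:
        return current_request
    return recommendation


samples = list(range(1, 101))  # toy usage samples in millicores
print(recommend_request(samples))
```

A factor of 1.25 means the P95 usage would sit at 80% of the request (1 / 1.25 = 0.8); the dead band keeps small fluctuations from triggering a stream of pod restarts.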

Scheduling on actual load uses ideal‑value weighting and bin‑packing scoring: it filters out high‑watermark nodes, steers pods toward nodes near the ideal watermark, forecasts each node's watermark after placement, and respects per‑node pod limits.
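A minimal scoring sketch for actual‑load scheduling, assuming CPU is the only dimension, usage figures come from monitoring rather than from requests, and the 80% high watermark and 60% ideal watermark are hypothetical. Nodes are filtered on pod count, current watermark, and forecast watermark, then scored by how close the forecast lands to the ideal value.

```python
from dataclasses import dataclass

HIGH_WATERMARK = 0.8   # hypothetical: nodes at or above this are filtered out
IDEAL_WATERMARK = 0.6  # hypothetical: where we want nodes to sit after placement


@dataclass
class Node:
    name: str
    cpu_capacity: float  # cores
    cpu_used: float      # actual usage from monitoring, cores
    pod_count: int
    max_pods: int


def score_nodes(nodes, pod_cpu_estimate):
    """Return (score, node name) pairs, best first; higher is better."""
    scored = []
    for node in nodes:
        if node.pod_count >= node.max_pods:
            continue  # respect the per-node pod limit
        if node.cpu_used / node.cpu_capacity >= HIGH_WATERMARK:
            continue  # node is already running hot
        forecast = (node.cpu_used + pod_cpu_estimate) / node.cpu_capacity
        if forecast > HIGH_WATERMARK:
            continue  # placing the pod would push the node over the line
        # Closer to the ideal watermark scores higher, which packs nodes
        # toward the ideal level without overshooting it.
        score = 100 * (1 - abs(forecast - IDEAL_WATERMARK))
        scored.append((score, node.name))
    return sorted(scored, reverse=True)
```

A half‑loaded 32‑core node receiving a 3‑core pod forecasts to about 59%, almost exactly the ideal, so it outranks a lightly loaded node that would end up near 34%.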

Resource scattering spreads replicas at host, zone, and MDU (Maximum Disruption Unit) level, so that a single failure affects as few replicas of an application as possible.
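A sketch of multi‑level scattering. We assume an MDU is a failure domain nested inside a zone (a rack or power domain, say) and therefore rank zone spread first, then MDU, then host; both that ordering and the `Location` type are assumptions rather than the team's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    host: str
    zone: str
    mdu: str  # Maximum Disruption Unit; assumed to be nested inside a zone


def spread_key(candidate: Location, placed: list) -> tuple:
    """Lower is better: fewest existing replicas in the same zone, then MDU, then host."""
    zone = sum(1 for p in placed if p.zone == candidate.zone)
    mdu = sum(1 for p in placed if p.mdu == candidate.mdu)
    host = sum(1 for p in placed if p.host == candidate.host)
    return (zone, mdu, host)


def pick_location(candidates: list, placed: list) -> Location:
    """Choose the candidate that keeps the application's replicas most spread out."""
    return min(candidates, key=lambda c: spread_key(c, placed))


placed = [Location("h1", "z1", "r1"), Location("h2", "z1", "r2")]
candidates = [Location("h3", "z1", "r3"), Location("h4", "z2", "r4")]
print(pick_location(candidates, placed).host)
```

Comparing tuples lexicographically makes the coarser failure domain dominate: a new zone always wins, and within a zone a fresh MDU beats a fresh host.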

4. Outcomes and Remaining Challenges

Resource utilization improved markedly and costs fell, but denser packing widened the blast radius of physical‑machine failures, raising stability concerns and complicating root‑cause analysis.

Kubernetes Observability and Stability

1. Prometheus‑Based Monitoring Platform

The monitoring stack consists of Thanos + Prometheus for persistent storage, Vertex Exporter for metric collection, SentryD for configuration, CheckD for alert detection, and an Alerts system.

2. Advanced Monitoring Mode

A custom Vertex collector enables fast metric generation and user‑defined metric names, and a single collector instance limited to 2 CPUs and 4 GB covers an entire physical machine.

3. Event Stream Persistence

An event collector watches all Kubernetes resources, logs every event, and prints response information for special probes.
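A minimal event‑persistence sketch using the official Kubernetes Python client (`kubernetes` package). It is narrower than the collector described above: it watches only core/v1 Event objects rather than all resources. The JSON field selection and printing to stdout for a log shipper to pick up are assumptions.

```python
import json


def event_to_log_line(event_type: str, ev: dict) -> str:
    """Flatten one watched Event (as returned by V1Event.to_dict()) into JSON."""
    involved = ev.get("involved_object") or {}
    return json.dumps({
        "type": event_type,  # ADDED, MODIFIED or DELETED
        "namespace": (ev.get("metadata") or {}).get("namespace"),
        "kind": involved.get("kind"),
        "name": involved.get("name"),
        "reason": ev.get("reason"),
        "message": ev.get("message"),
    }, sort_keys=True)


def stream_events() -> None:
    """Watch Events in all namespaces and print one JSON line per change."""
    # Imported here so event_to_log_line() works without the client installed.
    from kubernetes import client, config, watch
    config.load_kube_config()  # inside a pod, use config.load_incluster_config()
    v1 = client.CoreV1Api()
    for item in watch.Watch().stream(v1.list_event_for_all_namespaces):
        print(event_to_log_line(item["type"], item["object"].to_dict()), flush=True)
```

Calling `stream_events()` runs until the watch ends; a production collector would also resume from the last seen resourceVersion so that no events are lost between watches.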

4. Log Platform

System and business logs are collected, sent to Kafka, and aggregated into a unified platform.

5. Tracing

Trace IDs enable tag‑based filtering, error‑only topology views, and sampled link analysis.
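A toy sketch of the three trace features over in‑memory spans. The `Span` fields are hypothetical, and hash‑based sampling on the trace ID is an assumption: it is one common way to keep or drop every span of a trace together, not necessarily the team's method.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    service: str
    parent_service: Optional[str]  # None for the root span
    error: bool
    tags: dict = field(default_factory=dict)


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministic sampling: all spans sharing a trace ID get the same verdict."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return bucket < sample_rate


def error_topology(spans, **tag_filters):
    """Caller -> callee edges built only from error spans matching every tag filter."""
    edges = set()
    for s in spans:
        if not s.error or s.parent_service is None:
            continue
        if all(s.tags.get(k) == v for k, v in tag_filters.items()):
            edges.add((s.parent_service, s.service))
    return edges
```

Filtering on error spans first keeps the resulting topology small enough to read during an incident, while tag filters (for example `env="prod"`) narrow it further.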

6. Key Stability Indicators

Stability metrics are grouped into five categories: native component availability, cluster capacity watermarks, cluster resource load, abnormal business instances, and cloud‑platform availability.

7. Stability Dashboard

The dashboard visualizes component health, enabling rapid verification of whether a given atomic unit is operating normally, which is crucial for fault analysis and root‑cause localization.

Future Roadmap

Deep hybrid pooling and tuning for large‑scale compute workloads.

Containerization of databases and NoSQL services using cgroup isolation and K8s‑like orchestration.

Exploration of serverless scenarios for algorithm models and data‑job pipelines.

AI‑ops and observability‑driven fault prediction using time‑series forecasting to reduce false alarms and improve detection speed.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
