Scaling Cloud‑Native Containers at DeWu: Multi‑Cluster Management and Cost Optimization
This article details DeWu's cloud‑native transformation since August 2021, covering multi‑cluster federation, application profiling, custom scheduling plugins, resource pre‑reservation, co‑location of online and offline workloads, cost‑saving hardware choices, multi‑cloud strategy, and the development of the KubeAI platform for AI scenarios.
Introduction
DeWu App's rapid growth required an efficient cloud‑native infrastructure. Starting in August 2021, the team pursued high availability, observability, and operational efficiency while keeping costs under control. This article summarizes the solutions and practices applied during that transformation.
Cloud‑Native Application Management
Management Model
The team adopted an OAM‑style abstraction: an “application cluster” maps to a Kruise CloneSet, each Pod is an instance, and “application routing” maps to Ingress/Service. Configuration and feature layers are rendered with Helm to produce Kubernetes resources, which simplifies CI/CD and middleware management.
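The mapping from an application cluster to Kubernetes objects can be illustrated with a small sketch. It is not DeWu's Helm chart: the rendering is done in plain Python for brevity, and the `AppCluster` type, its fields, and the `app` label key are hypothetical. Only the CloneSet `apiVersion` and `kind` come from OpenKruise itself.

```python
from dataclasses import dataclass


@dataclass
class AppCluster:
    """Hypothetical description of an "application cluster"."""
    name: str
    image: str
    instances: int  # each instance corresponds to one Pod


def render_cloneset(app: AppCluster) -> dict:
    """Render the application cluster as a Kruise CloneSet manifest (a dict)."""
    labels = {"app": app.name}  # label key is an assumption
    return {
        "apiVersion": "apps.kruise.io/v1alpha1",
        "kind": "CloneSet",
        "metadata": {"name": app.name, "labels": dict(labels)},
        "spec": {
            "replicas": app.instances,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": app.name, "image": app.image}]},
            },
        },
    }


def render_service(app: AppCluster, port: int = 80) -> dict:
    """Render the "application routing" side as a plain Service."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": app.name},
        "spec": {
            "selector": {"app": app.name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }
```

In a real pipeline the same values would feed a Helm template rather than Python dictionaries; the point is only that one application description yields both the workload and its routing objects.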
Sidecar containers also handle permission management, mirroring ECS user login rights.
Multi‑Cluster Management
To avoid single‑cluster failure, the team implemented cluster federation (Karmada/KubeAdmiral). Host clusters use PropagationPolicies and OverridePolicies to control workload distribution, while member clusters run a custom MCS‑Controller and MCS‑Validator to keep Service/Endpoint objects consistent across clusters.
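As a rough illustration of how a propagation rule ties a workload to member clusters, the sketch below builds a manifest that follows Karmada's PropagationPolicy schema. The policy name, workload name, and cluster names are hypothetical, and the source does not say which fields DeWu actually sets. KubeAdmiral uses its own policy objects, which are not shown.

```python
def propagation_policy(name: str, workload: dict, clusters: list) -> dict:
    """Build a Karmada-style PropagationPolicy that sends one workload
    to an explicit list of member clusters."""
    return {
        "apiVersion": "policy.karmada.io/v1alpha1",
        "kind": "PropagationPolicy",
        "metadata": {"name": name},
        "spec": {
            # Which resource in the host cluster this policy applies to.
            "resourceSelectors": [
                {
                    "apiVersion": workload["apiVersion"],
                    "kind": workload["kind"],
                    "name": workload["name"],
                }
            ],
            # Where the selected resource should be propagated.
            "placement": {"clusterAffinity": {"clusterNames": list(clusters)}},
        },
    }


# Hypothetical workload and cluster names, for illustration only.
policy = propagation_policy(
    "demo-policy",
    {"apiVersion": "apps.kruise.io/v1alpha1", "kind": "CloneSet", "name": "demo"},
    ["member-a", "member-b"],
)
```

Per‑cluster differences (for example a different image registry in one cluster) would go into a separate OverridePolicy rather than into this object.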
Container Scheduling Optimization and Co‑Location
Application Profiling
Historical resource usage is collected via Prometheus. A custom KubeRM service computes a profile value (Pod Request = utilization / safety water‑mark) for CPU, memory, and GPU. These values guide resource specifications for new workloads.
Profiles are applied automatically to P3/P4 services.
For other services, profiles are offered as recommendations that users can choose to accept.
Different resource pools can enforce distinct activation strategies.
GPU memory profiles are only recommended, not auto‑applied.
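The profile formula and the activation rules above can be sketched as follows. The formula and the P3/P4 and GPU‑memory rules come from the text. The function names, the resource and priority strings, and the example numbers are assumptions, and the source does not say which utilization statistic (peak, percentile, average) KubeRM uses.

```python
def profile_request(utilization: float, safety_watermark: float) -> float:
    """Pod Request = utilization / safety water-mark.

    `utilization` is the observed usage in resource units (e.g. CPU cores);
    `safety_watermark` is the target fraction of the request that usage
    should occupy, in the range (0, 1].
    """
    if not 0 < safety_watermark <= 1:
        raise ValueError("safety_watermark must be in (0, 1]")
    return utilization / safety_watermark


def activation_mode(priority: str, resource: str) -> str:
    """Return "auto" when the profile is applied automatically,
    "recommend" when the user has to accept it."""
    if resource == "gpu_memory":
        return "recommend"  # GPU memory profiles are never auto-applied
    if priority in ("P3", "P4"):
        return "auto"
    return "recommend"
```

For example, a service that uses 2.0 cores with a safety water‑mark of 0.5 gets a request of `profile_request(2.0, 0.5)`, that is 4.0 cores. Per‑pool activation strategies would add a pool argument to `activation_mode`, which is omitted here.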
Pricing differentiates between profile‑driven Request billing and non‑profile Limit billing, encouraging users to adopt the recommended values.
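A minimal sketch of this billing rule, assuming a single per‑unit price (the `unit_price` parameter and the function names are hypothetical; the source gives no actual prices):

```python
def billable_quantity(request: float, limit: float, profile_adopted: bool) -> float:
    """Workloads that adopt the profile are billed on Request;
    all others are billed on Limit."""
    return request if profile_adopted else limit


def cost(request: float, limit: float, profile_adopted: bool, unit_price: float) -> float:
    """Charge for one resource dimension at a hypothetical unit price."""
    return billable_quantity(request, limit, profile_adopted) * unit_price
```

Because Limit is normally at least as large as Request, a workload that keeps its own specification never pays less than one that accepts the recommended values, which is the incentive the pricing is meant to create.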
Resource Pre‑Reservation
A custom scheduler plugin defines reservation intents via CRDs, preventing high‑priority pods from being blocked by frequent updates or burst scaling.
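The effect of a reservation on a scheduling filter can be sketched as below. This is an illustrative model, not the plugin's real CRD or code: the `Reservation` fields, the owner‑matching rule, and the assumption that reserved capacity is tracked separately from allocated capacity are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical reservation intent: CPU held on a node for one owner."""
    owner: str   # application the capacity is reserved for
    node: str
    cpu: float   # reserved cores, not yet counted as allocated


def free_cpu_for(owner: str, node: str, allocatable: float,
                 allocated: float, reservations: list) -> float:
    """CPU on `node` that a pod of `owner` may use: unallocated capacity
    minus whatever is reserved on that node for other owners."""
    held_for_others = sum(
        r.cpu for r in reservations if r.node == node and r.owner != owner
    )
    return allocatable - allocated - held_for_others


def fits(owner: str, pod_cpu: float, node: str, allocatable: float,
         allocated: float, reservations: list) -> bool:
    """Filter step: does the pod fit once reservations are respected?"""
    return pod_cpu <= free_cpu_for(owner, node, allocatable, allocated, reservations)
```

With 16 allocatable cores, 8 allocated, and 6 reserved for a high‑priority application, another application's 4‑core pod is rejected while the reservation owner's 4‑core pod still fits, so routine updates elsewhere cannot consume the reserved headroom.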
Balanced Scheduling
Four scheduler plugins were implemented:
CoolDownHotNode: lowers the priority of nodes that recently had pods scheduled onto them, to avoid hot spots.
HybridUnschedulable: blocks pods that use elastic resources from being scheduled on certain nodes.
NodeBalance: balances each node’s CPU request against its profile value.
NodeInfoRt: incorporates real‑time scoring data into scheduling decisions.
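To make the scoring idea concrete, here is a hedged sketch of how two of these plugins might score nodes. The formulas, the 60‑second cool‑down window, the weights, and the function names are all assumptions; the source describes only the intent of each plugin, not its implementation.

```python
def cooldown_score(seconds_since_last_bind: float, window: float = 60.0,
                   max_score: float = 100.0) -> float:
    """CoolDownHotNode idea: a node that just received a pod scores low and
    recovers linearly to `max_score` over `window` seconds."""
    if seconds_since_last_bind >= window:
        return max_score
    return max_score * max(seconds_since_last_bind, 0.0) / window


def node_balance_score(cpu_requested: float, cpu_profile: float,
                       max_score: float = 100.0) -> float:
    """NodeBalance idea: the further the node's summed CPU requests are
    below its profile value, the higher the score."""
    if cpu_profile <= 0:
        return 0.0
    ratio = min(cpu_requested / cpu_profile, 1.0)
    return max_score * (1.0 - ratio)


def pick_node(nodes: dict, w_cooldown: float = 0.5, w_balance: float = 0.5) -> str:
    """Pick the node with the highest weighted score.

    `nodes` maps node name -> (seconds_since_last_bind, cpu_requested, cpu_profile).
    """
    def total(item):
        _, (idle, requested, profile) = item
        return (w_cooldown * cooldown_score(idle)
                + w_balance * node_balance_score(requested, profile))
    return max(nodes.items(), key=total)[0]
```

In the Kubernetes scheduling framework such functions would live in Score plugins, while HybridUnschedulable behaves like a Filter plugin that removes nodes before scoring.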
Real‑Time Co‑Location
Online services are co‑located with Flink offline tasks using dedicated BE‑CPU/BE‑Memory resources and CPU binding strategies (LSX, LSR, LS, BE). The binding table defines four application types and their CPU core allocation policies.
Offline Co‑Location (Phase 2)
The second phase introduced “OT” resources to over‑commit BE resources for AI training and data‑processing tasks. Safeguards include a host safety water‑mark, CPU‑group priority (offline tasks always rank lower than online tasks), isolated disks, and night‑time auto‑scaling to free memory for offline workloads.
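One way a host safety water‑mark can bound over‑commitment is sketched below. The formula is an assumption made for illustration: the source names the safeguard but does not define how the offline budget is computed, and the function names are hypothetical.

```python
def offline_allocatable(capacity: float, safety_watermark: float,
                        online_usage: float) -> float:
    """Resources that may be handed to offline tasks: whatever remains below
    the host safety water-mark after online usage is subtracted."""
    return max(0.0, capacity * safety_watermark - online_usage)


def must_throttle_offline(total_usage: float, capacity: float,
                          safety_watermark: float) -> bool:
    """True when host usage has crossed the water-mark, so offline tasks
    should be throttled or evicted before online services are affected."""
    return total_usage > capacity * safety_watermark
```

For a 64‑core host with a 0.5 water‑mark, 20 cores of online usage leave 12 cores for offline work, and 40 cores of online usage leave none.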
Elastic Scaling
The KubeAutoScaler component was developed to unify HPA, VPA, and scheduled scaling policies. It works with the profiling system to scale down low‑traffic services at night, releasing resources for offline tasks. GPU services use a Queue‑Proxy sidecar to trigger scaling based on traffic thresholds, with an Activator handling cold‑start scaling.
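How a metric‑driven policy and a scheduled policy might be combined is sketched below. The replica formula is the standard Kubernetes HPA calculation; the night window, the replica bounds, and the rule that the schedule simply clamps the HPA result are assumptions rather than KubeAutoScaler's documented behaviour.

```python
import math
from datetime import time


def hpa_desired(current_replicas: int, current_metric: float,
                target_metric: float) -> int:
    """Standard HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)


def is_night(now: time, start: time = time(23, 0), end: time = time(7, 0)) -> bool:
    """True inside the (hypothetical) night window, which wraps past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, now: time,
                     day_bounds: tuple, night_bounds: tuple) -> int:
    """Clamp the HPA result to the (min, max) bounds of the current period."""
    low, high = night_bounds if is_night(now) else day_bounds
    wanted = hpa_desired(current_replicas, current_metric, target_metric)
    return max(low, min(high, wanted))
```

With bounds of (2, 10) by day and (1, 3) by night, a service whose metric asks for 6 replicas runs 6 at 14:00 but is capped at 3 at 02:00, and the difference becomes capacity for offline tasks.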
Resource and Cost Governance
Machine Model Replacement
Inference was moved from V100 GPUs to more cost‑effective A10 GPUs, cutting inference cost by ~20% and improving CPU performance. CPU‑intensive services were migrated from Intel to AMD CPUs, reducing CPU cost by ~14%.
Resource Pool Management
The team sized redundancy according to release cycles, merged clusters by region and purpose, consolidated similar resource pools, and cleaned up fragmentation through pod re‑scheduling and host re‑allocation.
Workload Specification Governance
Resource specifications were standardized: predefined CPU‑memory ratios for CPU workloads and CU units for GPU workloads, with differential billing to align cost with actual usage.
Self‑Built Products
The team built the KubeAI platform to host model training, reducing reliance on external cloud services and enabling unified management of AI workloads.
Multi‑Cloud Strategy
A multi‑cloud approach was adopted to mitigate GPU shortages, improve bargaining power, and meet compliance requirements. Considerations include cross‑region service access, middleware dependencies, and data‑transfer costs.
Cloud‑Native AI Scenario
KubeAI provides end‑to‑end model development, training, inference, and version management, and now offers AIGC/GPT services to accelerate business outcomes.
Outlook
Future work includes further containerizing middleware, refining co‑location and elastic capacity solutions, enhancing Kubernetes stability, and expanding multi‑cloud capabilities to keep the infrastructure flexible and robust as the business scales.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
