
Achieving 50% Cost Cut with Cloud‑Native Architecture: A Flexible Workforce Platform Case

Facing poor observability, high resource waste, and unstable releases, QingTuan’s flexible‑workforce platform transformed its monolithic and SOA systems into a cloud‑native micro‑service architecture using Alibaba Cloud ACK, MSE, ARMS, and Prometheus, achieving higher availability, elastic scaling, and up to 50% infrastructure cost reduction.

Alibaba Cloud Native

Architecture Evolution

The system started as a monolithic application (2014), migrated to a Service‑Oriented Architecture (SOA), and finally adopted a Spring Cloud‑based micro‑service architecture. Early micro‑services used Eureka for service discovery and Spring Cloud Gateway as the API gateway.

Evolution from monolith to SOA to micro‑services

Cloud‑Native Migration

In 2021 the platform was re‑architected on Alibaba Cloud Container Service for Kubernetes (ACK) Serverless. Key migration steps:

Replace Eureka with MSE‑based Nacos for service registration and configuration.

Deploy ACK clusters across multiple Availability Zones (AZs) and use node‑pool isolation to separate business lines.

Leverage CSI plugins (cloud‑disk, OSS, NAS) for stateful workloads such as databases and Redis.

Enable VPC‑direct networking via the Terway plugin, allowing pods to communicate with existing VPC resources.

Integrate Alibaba Cloud ARMS for Java application performance tracing and MSE for traffic governance.

ACK Serverless architecture diagram

Scheduling, Elasticity and Resource Isolation

Three complementary strategies are used:

Multi‑AZ deployment with dedicated node pools for each business line, providing physical‑like isolation.

Horizontal Pod Autoscaling (HPA) combined with custom elastic‑scale policies (e.g., KEDA) to balance cost and stability.

CSI‑based storage for stateful services, ensuring persistent volumes for databases, Redis, etc.
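The first strategy, node‑pool isolation with multi‑AZ spreading, can be sketched as a toy placement routine. This is a minimal illustration, not the Kubernetes scheduler: the node fields (`name`, `pool`, `zone`) and the function name `place_replicas` are hypothetical, and real clusters would express the same intent through node selectors and topology spread settings.

```python
from collections import Counter


def place_replicas(nodes, business_line, replicas):
    """Assign replicas to nodes of one business line, balancing across zones.

    `nodes` is a list of dicts with hypothetical keys "name", "pool", "zone".
    Each replica goes to the eligible zone holding the fewest replicas so far.
    """
    # Node-pool isolation: only nodes in this business line's pool are eligible.
    eligible = [n for n in nodes if n["pool"] == business_line]
    if not eligible:
        raise ValueError(f"no nodes in pool {business_line!r}")
    per_zone = Counter({n["zone"]: 0 for n in eligible})
    per_node = Counter({n["name"]: 0 for n in eligible})
    placement = []
    for _ in range(replicas):
        # Pick the least-loaded zone first, then the least-loaded node in it.
        zone = min(per_zone, key=lambda z: (per_zone[z], z))
        node = min((n for n in eligible if n["zone"] == zone),
                   key=lambda n: (per_node[n["name"]], n["name"]))
        per_zone[zone] += 1
        per_node[node["name"]] += 1
        placement.append((node["name"], zone))
    return placement
```

With two eligible nodes in different zones, four replicas end up two per zone, and nodes belonging to another business line's pool are never selected.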

Case 1 – Zero‑downtime rolling update across zones: Services are deployed in Hangzhou H and K zones. Node‑pool labels guide the scheduler to spread replicas. Kubernetes rolling updates with readiness probes guarantee that at least one replica remains healthy while the other is upgraded.
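The availability guarantee in Case 1 can be illustrated with a small simulation. It assumes a surge‑first rollout (a replacement starts and becomes ready before the old pod is removed), which is one possible rolling‑update configuration and not necessarily the one the platform uses. The `Pod` class, the service name, and the zone identifiers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    zone: str
    version: str
    ready: bool = True


def rolling_update(pods, new_version, min_ready=1):
    """Replace pods one at a time, recording the ready count after each step.

    A replacement is started and must report ready (a stand-in for a passing
    readiness probe) before the old pod is removed.
    """
    pods = list(pods)
    ready_history = []
    for old in list(pods):
        # Surge: start the replacement in the same zone as the pod it replaces.
        new = Pod(f"{old.name}-{new_version}", old.zone, new_version, ready=False)
        pods.append(new)
        ready_history.append(sum(p.ready for p in pods))
        # Readiness probe passes; only now is the old pod retired.
        new.ready = True
        pods.remove(old)
        ready_count = sum(p.ready for p in pods)
        ready_history.append(ready_count)
        if ready_count < min_ready:
            raise RuntimeError("rollout would drop below the availability floor")
    return pods, ready_history


# Two replicas, one per zone (zone names are illustrative).
replicas = [Pod("order-0", "cn-hangzhou-h", "v1"), Pod("order-1", "cn-hangzhou-k", "v1")]
updated, history = rolling_update(replicas, "v2")
```

In this sketch the number of ready pods never falls below the starting replica count, and each zone keeps one replica after the rollout.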

Case 2 – Metric‑driven scaling with KEDA: Business scenarios such as event tracking, ad delivery and peak‑activity campaigns emit Prometheus metrics (e.g., request rate, queue depth). KEDA watches these metrics and drives Kubernetes autoscaling so that pods are added or removed automatically.
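The replica count behind this kind of metric‑driven scaling follows the Horizontal Pod Autoscaler rule, desired = ceil(current replicas × current value / target value). The sketch below applies that rule to a single metric. The bounds and the 10% tolerance band are example defaults, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the Horizontal Pod Autoscaler scaling rule to one metric.

    `current_value` and `target_value` are per-pod averages (for example,
    requests per second per pod). No change is made while the ratio stays
    within the tolerance band; the result is clamped to the given bounds.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 150 requests per second against a target of 100 scale out to 6, while a reading of 105 stays inside the tolerance band and leaves the count at 4.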

Elastic deployment across zones
KEDA‑driven scaling workflow

Traffic Management

Traffic is controlled at three layers:

Ingress and request routing via an Apache APISIX gateway.

Service‑level traffic governance, gray releases, and region‑aware routing using MSE micro‑service engine.

Asynchronous processing and delayed messaging with Kafka and RocketMQ.

A Backend‑For‑Frontend (BFF) layer adapts data formats for consumer‑facing (C‑end), business‑facing (B‑end), Android, and iOS clients before invoking backend services.
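As a rough illustration of what a BFF adapter does, the sketch below shapes one backend record differently for a consumer‑facing and a business‑facing client. The field names, client labels, and payload shapes are all hypothetical; the source does not describe the platform's actual data model.

```python
def adapt_job_for_client(job, client):
    """Shape one backend job record for a given client type.

    `job` uses hypothetical backend field names; `client` is either
    "c_end" (consumer-facing) or "b_end" (business-facing).
    """
    if client == "c_end":
        # Consumer apps get a compact, display-ready payload.
        return {
            "id": job["job_id"],
            "title": job["title"],
            "pay": f'{job["hourly_wage_cents"] / 100:.2f}/h',
        }
    if client == "b_end":
        # Business consoles keep raw numbers plus management fields.
        return {
            "id": job["job_id"],
            "title": job["title"],
            "hourlyWageCents": job["hourly_wage_cents"],
            "applicants": job["applicant_count"],
        }
    raise ValueError(f"unknown client type: {client}")
```

Keeping this shaping in one layer means backend services can expose a single representation while each client receives only the fields it needs.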

Traffic management architecture

Observability and Monitoring

The observability stack combines:

ARMS – Java‑level tracing, latency breakdown, and alerting.

Prometheus – Scrapes custom business metrics exposed by Java clients.

Grafana – Dashboards for visualizing Prometheus data.

Cloud Monitor – Infrastructure metrics for ECS, PolarDB, and message queues.

Typical workflow: a slow third‑party API appears as increased latency in ARMS traces; the corresponding metric spikes in Prometheus trigger an alert; Grafana dashboards help pinpoint the affected service; Cloud Monitor shows whether CPU or memory saturation contributed.
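The alerting step in this workflow amounts to comparing a latency percentile against a threshold. The sketch below is a plain‑Python stand‑in for such a rule, not the ARMS or Prometheus configuration the platform uses. The nearest‑rank percentile method, the 95th percentile default, and the threshold values are assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks are computed exactly.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_alert(samples_ms, threshold_ms, pct=95):
    """Return an alert dict if the chosen percentile exceeds the threshold, else None."""
    observed = percentile(samples_ms, pct)
    if observed > threshold_ms:
        return {"percentile": pct, "observed_ms": observed, "threshold_ms": threshold_ms}
    return None
```

A window in which a quarter of the calls to a third‑party API take 900 ms would trip a 500 ms threshold, whereas a single slow outlier among twenty samples would not.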

Observability stack diagram

Release Practices

Gray deployment and graceful shutdown are implemented through MSE agents:

During a rolling update, new pods are started first. The MSE agent registers a preStop hook on the old pods; when the hook runs, MSE notifies callers to stop sending traffic to the retiring instance.

For a new version, MSE exposes a health‑check endpoint. Only after the health check passes does MSE gradually shift a small percentage of traffic (e.g., 0.1%) to the new pods, allowing warm‑up before full traffic is routed.
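The warm‑up and graceful‑shutdown behavior described above can be modeled as weighted routing. The sketch below assumes a linear ramp from 0.1% of an instance's normal share up to its full share over a fixed window. The ramp shape, the 120‑second window, and all names are assumptions made for illustration, not MSE's actual algorithm.

```python
def warmup_weight(elapsed_s, warmup_s, start_fraction=0.001):
    """Fraction of its normal traffic share a new instance receives during warm-up.

    Ramps linearly from start_fraction to 1.0 over warmup_s seconds.
    """
    if warmup_s <= 0 or elapsed_s >= warmup_s:
        return 1.0
    progress = max(0.0, elapsed_s) / warmup_s
    return start_fraction + (1.0 - start_fraction) * progress


def traffic_shares(instances, now_s, warmup_s=120):
    """Split traffic across healthy instances, down-weighting ones still warming up.

    `instances` maps a name to {"healthy": bool, "started_at": seconds}.
    Instances marked unhealthy (failed health check or draining after a
    preStop notification) receive no traffic.
    """
    weights = {
        name: warmup_weight(now_s - info["started_at"], warmup_s)
        for name, info in instances.items()
        if info["healthy"]
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()} if total else {}
```

Immediately after its health check passes, a new instance next to one fully warmed instance receives roughly 0.1% of requests; once the window elapses the two split traffic evenly.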

Results

Infrastructure cost was reduced by about 50% by replacing sparsely utilized ECS instances with densely packed container workloads.

High availability and elastic scheduling support a user base of more than 73 million.

Comprehensive monitoring shortens MTTR and enables safe gray releases and graceful shutdowns.

Future Directions

Adopt a service mesh (MSE or open‑source) to provide language‑agnostic traffic governance for services written in Java, Python, Go, and other languages.

Explore GraalVM native images to further lower memory footprint and improve cold‑start latency.

Introduce chaos engineering experiments to proactively improve system stability.

Summary and outlook graphic